...

Info
Disk space
  • Your home directory is on Weka (/sdf/home/<first letter of your userid>/<your userid>) with 30 GB of space. This space is backed up and is where code, etc., should go.
  • We have group space at /sdf/group/fermi/:
    • Some directories are under /sdf/data/fermi/, but we provide links into the group directory tree for easier access.
    • Shared software lives here, including conda envs for Fermitools and containers for running rhel6 executables.
    • Fermi-supplied user space (i.e., in addition to your home directory):
      • You can find it in /sdf/group/fermi/u/<you>. There is a symlink to it, called "fermi-user", in your home directory for convenience.
      • After GPFS is retired in late 2023, this is where your larger user space will be.
    • Group space is in /sdf/group/fermi/g/; a one-time copy of all the gpfs g/ directories (under /nfs/farm/g/glast/g/) has been done.
    • All of the GLAST AFS space has been copied to /sdf/group/fermi/a/.
    • The NFS u<xx> partitions were copied to /sdf/group/fermi/n/ (including u52, which contains the GlastRelease rhel6 builds).
  • Your user/group space on the old clusters is not directly accessible from S3DF; it currently needs to be copied over (see the copy sketch after this list). This access policy may be reversed soon.
    • We are still providing additional user space from the old cluster, available on request via the slac-helplist mailing list. It is not backed up, and the space is natively GPFS. User directories are available under /gpfs/slac/fermi/fs2/u/<your_dir>.
    • During the transition, read-only mounts of AFS and GPFS are available on the interactive nodes (not batch!).
      • AFS is just the normal AFS path, e.g. your home directory (/afs/slac/u/ ...); you may need to issue "aklog" to get an AFS token.
      • GPFS is at /fs/gpfs/slac/fermi/fs2/u/ ...
  • Scratch space:
    • /sdf/scratch/<username_initial>/<username>: quota of 100 GB per user. The space is visible on all interactive and batch nodes. Old data will be purged when overall space is needed, even if your usage is under the quota.
    • /lscratch: a local space on each batch node, shared by all users. You are encouraged to create your own sub-directory when running your job, and to clean up your space (to zero) at the end of your job. Debris left behind by jobs will be purged periodically.
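As a concrete example of copying data over, the following is a minimal sketch run on an interactive node (where the read-only GPFS mount is visible); <your_dir> is a placeholder for your own directory name, and the destination uses the "fermi-user" symlink mentioned above:

# Run on an interactive node - the read-only /fs/gpfs mount is not available on batch nodes.
# <your_dir> is a placeholder for your own directory name.
rsync -av /fs/gpfs/slac/fermi/fs2/u/<your_dir>/ ~/fermi-user/<your_dir>/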
Info
Handy URLs

...

Info
Slurm Batch Usage

For generic advice on running in batch (copying data to local scratch, etc.), see Running on SLAC Central Linux. Note that the actual batch system has changed and we have not updated that page to reflect it.

  • LSB_JOBID -> SLURM_JOB_ID
  • scratch space during job execution:
    • at job start, a directory is automatically created in the worker node's local scratch: ${LSCRATCH} = /lscratch/${USER}/slurm_job_id_${SLURM_JOB_ID} (a short usage sketch follows this list)
    • once all of a user's jobs on a node have completed/exited, their corresponding LSCRATCH directory on that host is deleted.
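As a minimal sketch of how ${LSCRATCH} might be used inside a batch script (the executable step and output names below are placeholders, not a real Fermi workflow):

# Work in the per-job local scratch area, then copy results somewhere permanent
# before the job ends; ${LSCRATCH} is removed automatically once your jobs on
# the node have finished.
cd "${LSCRATCH}"
mkdir -p work
cd work
# ... run your executable here, writing its output locally ...
cp -r output/ ~/fermi-user/output_${SLURM_JOB_ID}/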

You need to specify an account and "repo" on your Slurm submissions. The repos allow subdivision of our allocation among different uses. There are 4 repos available under the fermi account. The format is "--account fermi:<repo>", where repo is one of:

  • default (jobs are pre-emptible - if "paying jobs" need slots, pre-emptible jobs will be killed)
  • L1
  • other-pipelines
  • users 

L1 and other-pipelines are restricted to known pipelines. Non-default repos have quality of service (qos) defaulting to normal (non-pre-emptible).
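For example (the job script name is a placeholder):

# Non-pre-emptible submission charged to the "users" repo:
sbatch --account=fermi:users myjob.sh

# Pre-emptible submission (may be killed if non-pre-emptible jobs need the slots):
sbatch --account=fermi:default myjob.sh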

At time of writing, there is no accounting yet. When that is enabled, we'll have to decide how to split up our allocation into the various repos.

S3DF Slurm organizes the different hardware resource types under Slurm partitions; Slurm does not have the concept of a batch queue. Users can specify the resources their job needs (because, for example, a 12-core CPU request can be satisfied by different types of CPUs). The following is an example script that submits a job to Slurm:

#!/bin/bash
#SBATCH --account=fermi:users
##SBATCH --partition=ampere
#SBATCH --job-name=my_first_job
#SBATCH --output=output-%j.txt
#SBATCH --error=output-%j.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=4g
#SBATCH --time=0-00:10:00
#SBATCH --gpus a100:1
hostname

Note that specifying the "--gpus a100:1" option is preferred over specifying "--partition=ampere" (the latter is not needed). If a GPU is not requested, your job will not have access to a GPU even if it lands on an ampere node.
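To submit the example script above and check on it afterwards, something like the following should work (the script file name and job id are placeholders):

sbatch my_first_job.sh          # prints "Submitted batch job <jobid>"
squeue -u $USER                 # list your pending/running jobs
scancel <jobid>                 # cancel the job if needed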



Info
Using cron

You can run cronjobs in S3DF. Users don't have to worry about token expiration like on AFS. Select one of the iana interactive nodes (and remember which one!) to run on.

Note: crontab does NOT inherit your environment. You'll need to set that up yourself.
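A minimal sketch of a crontab entry that gets its environment set up by running through a login shell (the schedule, script path, and log file are placeholders):

# Run a script nightly at 02:30 via a login shell so your usual environment is loaded.
30 2 * * * /bin/bash -l -c "$HOME/fermi-user/scripts/nightly_task.sh" >> $HOME/cron_nightly.log 2>&1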

Since crontab is per host (no trscrontab), if the node is reinstalled or removed, the crontab will be lost. It's probably best to save your crontab as a file in your home directory so that you can re-add your cronjobs if this happens:

crontab -l > ~/crontab.backup

Then to re-add the jobs back in:

crontab ~/crontab.backup

...