...

  • CVMFS (/cvmfs/atlas.cern.ch, /cvmfs/sft.cern.ch, etc.)
  • Access to SLAC networked storage (/nfs, /gpfs, etc.)
  • Large /scratch space for temporary use (please clean up your files from there after your job finishes).
  • Outbound network connectivity.
  • Singularity containers on all batch nodes that run the CentOS 7 operating system (see below for more info).
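The /scratch cleanup request above can be sketched as a pair of helpers for a job script. This is a minimal sketch, not a SLAC-provided tool: the function names and the SCRATCH_BASE variable are hypothetical, and it assumes /scratch is writable by your job.

```shell
#!/bin/sh
# Assumed location of the scratch area; SCRATCH_BASE is a hypothetical
# override that is handy for testing the helpers somewhere else.
SCRATCH_BASE="${SCRATCH_BASE:-/scratch}"

make_workdir() {
    # mktemp -d creates a uniquely named directory and prints its path.
    mktemp -d "${SCRATCH_BASE}/${USER:-job}.XXXXXX"
}

cleanup_workdir() {
    # Only remove the path if it is non-empty and really is a directory.
    if [ -n "$1" ] && [ -d "$1" ]; then
        rm -rf -- "$1"
    fi
    return 0
}

# Typical use inside a job script (left commented out here):
#   WORKDIR=$(make_workdir) || exit 1
#   trap 'cleanup_workdir "$WORKDIR"' EXIT
#   cd "$WORKDIR" && ./run_my_analysis   # hypothetical payload
```

Using a trap on EXIT means the directory is removed even when the payload fails partway through.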

The following are examples of using LSF: 

...

SLAC batch resources consist of several generations of hardware. They are listed on the shareholder's priority page. Some of the batch nodes run the RHEL 6 operating system, while others run the CentOS 7 operating system. Singularity container technology is available on the CentOS 7 batch nodes.

  • To run your job on a RHEL 6 batch node only, use: bsub -R "select[rhel6]" ...

  • To run your job on a CentOS 7 batch node only, use: bsub -R "select[centos7]" ...
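As an illustration of the two options above, the following dry-run helper prints the bsub command for a chosen operating system instead of submitting anything. The function name and ./myjob.sh are hypothetical; only the two select keywords come from the text above.

```shell
#!/bin/sh
# Print (do not run) a bsub command that pins the job to one OS.
# $1 is the OS keyword; the remaining arguments are the job command.
bsub_cmd_for_os() {
    os="$1"
    shift
    case "$os" in
        rhel6|centos7) ;;   # the two keywords shown above
        *) echo "unknown OS keyword: $os" >&2; return 1 ;;
    esac
    printf 'bsub -R "select[%s]"' "$os"
    if [ "$#" -gt 0 ]; then
        printf ' %s' "$@"   # append each word of the job command
    fi
    printf '\n'
}

# A job that needs Singularity has to land on a CentOS 7 node:
bsub_cmd_for_os centos7 ./myjob.sh
```

Printing the command first makes it easy to check the quoting before pasting it into a terminal.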

...

  • bsub -n 4 -R 'span[hosts=1]' ... will submit a job requesting 4 cores and (4 x 4GB =) 16GB of RAM, and allocate all 4 cores on one machine (this is what "span[hosts=1]" is for).
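The slot-to-memory arithmetic above can be written out as a small helper. This sketch assumes 4GB of RAM per slot, which is inferred from the "4 x 4GB" figure above; the variable and function names are hypothetical.

```shell
#!/bin/sh
# Assumption: each batch slot comes with 4GB of RAM, as implied by the
# "4 x 4GB = 16GB" arithmetic above. Adjust if your queue differs.
MEM_PER_SLOT_GB=4

total_mem_gb() {
    # $1 = number of slots requested with "bsub -n"
    echo $(( $1 * MEM_PER_SLOT_GB ))
}

# Print the implied RAM for a few slot counts.
for slots in 1 2 4 8; do
    echo "bsub -n $slots -R 'span[hosts=1]' ... implies $(total_mem_gb "$slots")GB of RAM"
done
```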

Of course, the more resources you ask for, the harder it is to schedule the job, and hence the longer the pending time will be.

Below is a long example of resource selection in LSF, for you to pick and choose from:

  • -R "select[ ! preempt && rhel60 && cvmfs && inet && bullet && hname != bullet0030] rusage[scratch=5.0:duration=1440:decay=1, mem=2000:decay=0] span[hosts=1]"

It requests that the job be dispatched to a machine as follows:

  1. Your job won't be preempted by someone else's higher-priority job.
  2. The machine runs the RHEL 6 operating system (we have "rhel60" and "centos7").
  3. The machine has CVMFS.
  4. The machine has an outbound internet connection (the "inet" keyword above).
  5. The machine is part of the "bullet" cluster (other clusters we have: fell, hequ, dole, kiso, deft and bubble; all run "rhel60" except the last two, which run "centos7") and is not host bullet0030.
  6. Reserve 5GB of free space under /scratch. Your job holds the reservation for 1440 minutes, and the reserved amount decays linearly from 100% to 0 during this period.
  7. Reserve 2000MB of RAM, with no decay (the default).
  8. span[hosts=1] means that if you request more than one batch slot (the -n option above), all of them are scheduled on one machine.
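To make it easier to pick and choose from the items above, the requirement string can be assembled piece by piece. This is a sketch only: the function name is hypothetical, the keywords are the ones from the example above, and the select conditions are joined with "&&".

```shell
#!/bin/sh
# Build the -R resource requirement string one condition at a time.
# Comment out or edit a line to drop or change that condition.
build_resreq() {
    sel='! preempt'                    # 1. not preemptable
    sel="$sel && rhel60"               # 2. RHEL 6 node
    sel="$sel && cvmfs"                # 3. CVMFS available
    sel="$sel && inet"                 # 4. outbound network
    sel="$sel && bullet"               # 5. bullet cluster ...
    sel="$sel && hname != bullet0030"  #    ... but not this host

    use='scratch=5.0:duration=1440:decay=1'  # 6. 5GB of /scratch, decaying
    use="$use, mem=2000:decay=0"             # 7. 2000MB of RAM, no decay

    # 8. keep all requested slots on one machine
    printf 'select[ %s] rusage[%s] span[hosts=1]\n' "$sel" "$use"
}

# Dry run: show the command instead of submitting it.
echo "bsub -R \"$(build_resreq)\" ..."
```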

Note:

  • a. For 6 or 7 to work, the machine must have that amount of the resource available at the time the job is dispatched.
  • b. The CVMFS cache is usually stored under /scratch/cvmfs2_cache. (Reserving /scratch space is a way to make sure that there is free space for the CVMFS cache, so your job won't get errors when accessing CVMFS.)
  • c. Most SLAC batch users don't use 6 or 7, and even those who do can use more than they reserved (because the amounts specified in 6 or 7 are "reserved", not maximums). So after your job starts on a machine, something bad can still happen (running out of memory, /scratch, etc.) due to other activity on that machine.
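Because a reservation is not a guarantee, a job script may want to check the free space itself before writing large files. The helpers below are a hypothetical sketch built on the standard df and awk tools; the 5000MB threshold in the usage comment is only an example value.

```shell
#!/bin/sh
# Print the free space, in MB, of the filesystem holding directory $1.
free_mb() {
    # df -P gives one POSIX-format line per filesystem; -k reports KB.
    # Field 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

# Succeed if directory $1 has at least $2 MB free.
has_free_mb() {
    avail=$(free_mb "$1") || return 1
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Typical use inside a job script (left commented out here):
#   has_free_mb /scratch 5000 || { echo "not enough /scratch" >&2; exit 1; }
```

A check like this only reports the state at that moment; other jobs on the same machine can still fill the disk afterwards.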

 

Please refer to the LSF documentation to get familiar with the basic usage of LSF.

...