...
SLAC batch resources consist of several generations of hardware. They are listed on the stakeholder priority page. Some of the batch nodes run the RHEL 6 operating system, while others run the CentOS 7 operating system. Singularity container technology is available on the CentOS 7 batch nodes.
- To run your job on a RHEL 6 batch node only, use: bsub -R "select[rhel6]" ...
- To run your job on a CentOS 7 batch node only, use: bsub -R "select[centos7]" ...
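If you submit jobs from a script, a minimal Python sketch like the following can build the OS-pinned bsub command shown above. The function name `bsub_command` and the job script `./myjob.sh` are hypothetical, and the select keywords are copied from the two commands above (check with LSF whether your cluster uses "rhel6" or "rhel60"). The sketch only builds the argument list; it does not submit anything.

```python
import shlex

# Select strings copied from the commands above (hypothetical helper table).
OS_SELECT = {"rhel6": "select[rhel6]", "centos7": "select[centos7]"}


def bsub_command(os_name: str, job_args: list) -> list:
    """Build (but do not run) a bsub argument list pinned to one OS."""
    if os_name not in OS_SELECT:
        raise ValueError(f"unknown OS {os_name!r}; expected one of {sorted(OS_SELECT)}")
    return ["bsub", "-R", OS_SELECT[os_name], *job_args]


cmd = bsub_command("centos7", ["./myjob.sh"])
print(shlex.join(cmd))  # bsub -R 'select[centos7]' ./myjob.sh
```

Building an argument list instead of a single shell string avoids quoting mistakes with the square brackets; on a host where LSF is installed, the list could be passed to `subprocess.run(cmd, check=True)`.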
...
Of course, the more resources you request, the harder your job is to schedule, and the longer it will stay pending.
Here is a more complex example of a resource requirement string for you to pick and choose from:
- -R "select[ ! preempt && rhel60 && cvmfs && inet && bullet] rusage[scratch=5.0:duration=1440:decay=1, mem=2000:decay=0] span[hosts=1]"
It requests that the job be dispatched to a machine where:
- Your job won't be preempted by someone else's higher-priority job ("! preempt")
- The machine runs the RHEL 6 operating system ("rhel60"; the other choice we have is "centos7")
- The machine has CVMFS ("cvmfs")
- The machine has an outbound internet connection ("inet")
- The machine is part of the "bullet" cluster (other clusters we have are fell, hequ, dole, kiso, deft and bubble; all run "rhel60" except the last two, which run "centos7")
- 5 GB of free space under /scratch is reserved for your job for 1440 minutes (24 hours), and the reserved amount decays linearly from 100% to 0 over that period ("duration=1440:decay=1")
- 2000 MB of RAM is reserved, with no decay ("decay=0", which is also the default)
- span[hosts=1]: if you request more than one batch slot (with the bsub -n option), all of them are scheduled on the same machine.
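To see how the pieces of the example map onto the final -R string, here is a minimal Python sketch that assembles it from its select, rusage and span sections. The helper name `resource_requirement` is hypothetical, and its joining rules (select terms joined with "&&", rusage entries separated by ", ", options appended with ":") simply mirror the example string; it is not a complete implementation of LSF's resource requirement grammar.

```python
from __future__ import annotations


def resource_requirement(select_terms: list[str],
                         rusage: list[tuple[str, float, dict]],
                         span_hosts: int | None = None) -> str:
    """Assemble an LSF -R string in the same shape as the example above."""
    # All select terms must hold, so they are joined with logical AND.
    parts = [f"select[{' && '.join(select_terms)}]"]
    if rusage:
        entries = []
        for name, amount, options in rusage:
            # e.g. ("scratch", 5.0, {"duration": 1440, "decay": 1})
            #   -> "scratch=5.0:duration=1440:decay=1"
            entry = f"{name}={amount}"
            for key, value in options.items():
                entry += f":{key}={value}"
            entries.append(entry)
        parts.append(f"rusage[{', '.join(entries)}]")
    if span_hosts is not None:
        parts.append(f"span[hosts={span_hosts}]")
    return " ".join(parts)


req = resource_requirement(
    ["! preempt", "rhel60", "cvmfs", "inet", "bullet"],
    [("scratch", 5.0, {"duration": 1440, "decay": 1}),
     ("mem", 2000, {"decay": 0})],
    span_hosts=1,
)
print(req)
# select[! preempt && rhel60 && cvmfs && inet && bullet] rusage[scratch=5.0:duration=1440:decay=1, mem=2000:decay=0] span[hosts=1]
```

Dropping an entry from either list is an easy way to "pick and choose" only the requirements you need, e.g. `resource_requirement(["centos7"], [])` gives just `select[centos7]`.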
Notes:
- a. For the /scratch and memory reservations to work, the machine must have that amount of the resource available at the time the job is dispatched.
- b. The CVMFS cache is usually stored under /scratch/cvmfs2_cache, so reserving /scratch space also helps make sure there is free space for the CVMFS cache and that your job won't get errors when accessing CVMFS.
- c. Most SLAC batch users don't make these /scratch or memory reservations, and even those who do can use more than they reserved, because the amounts are reservations, not limits. So after your job starts on a machine, something bad can still happen (running out of memory, /scratch space, etc.) due to other activity on that machine.
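To make the decaying /scratch reservation concrete, the small Python model below computes how much of a reservation is still held a given number of minutes after dispatch, following the linear decay described in the example (scratch=5.0:duration=1440:decay=1). The function name `reserved_amount` is hypothetical, and treating a missing duration as "reserved for the whole job" is an assumption of this sketch. It models only the scheduler's bookkeeping; as note c says, actual usage on the machine can exceed what is reserved.

```python
def reserved_amount(amount, elapsed, duration=None, decay=0):
    """Amount still reserved `elapsed` minutes after the job is dispatched.

    decay=1: the reservation falls linearly from 100% to 0 over `duration`.
    decay=0: the full amount stays reserved until `duration` ends.
    duration=None: assumed here to mean reserved for the job's lifetime.
    """
    if duration is None:
        return float(amount)
    if elapsed >= duration:
        return 0.0
    if decay == 1:
        return amount * (1 - elapsed / duration)
    return float(amount)


# 5 GB of /scratch, decaying over 1440 minutes: half is left after 12 hours.
print(reserved_amount(5.0, 720, duration=1440, decay=1))  # 2.5
# 2000 MB of RAM with no decay and no duration stays fully reserved.
print(reserved_amount(2000, 720))  # 2000.0
```

A linear decay suits resources like /scratch that a job tends to consume early (for example, staging input files), so the reservation is gradually handed back to the scheduler for other jobs.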
Please refer to the LSF documentation to get familiar with the basic usage of LSF.
...