Overview

The batch system at SLAC uses the IBM Platform Load Sharing Facility (LSF) and is made up of

  • a general farm of batch servers that is open to all SLAC users
  • a RHEL6 MPI farm, which requires that you request access (send email to unix-admin) and runs only MPI jobs
  • various other farms of machines that belong to individual scientific computing groups.

Various scientific groups at SLAC contribute to the purchase of the general farm batch systems, and jobs submitted to the general farm are scheduled according to a fairshare priority structure.  If you are not part of a computing group that has a fairshare allocation, your jobs will run with the priority of the group called AllUsers.  More information about fairshare scheduling at SLAC can be found here: https://confluence.slac.stanford.edu/display/SCSPub/Fairshare+Scheduling
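
You can inspect the fairshare configuration directly with standard LSF commands (a minimal sketch; which queues carry fairshare information at SLAC may vary, so check the page above):

bqueues -l SHORT   # -l prints the full queue configuration, including fairshare shares if defined
bhpart             # lists fairshare host partitions and current dynamic user priorities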

Submitting Jobs to the General Farm

To submit jobs to the general farm you will need to

  1. get a SLAC unix account: https://slacprod.service-now.com/it_services?id=sc_cat_item&sys_id=17176b676ff12100aae0c6012e3ee4f7&sysparm_category=d65827c46fd921009c4235af1e3ee434
  2. login to a SLAC public machine using ssh, for example as shown below.  The public login hosts are described on the Getting Started page.
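
For example, assuming your SLAC username is <username> and using centos7.slac.stanford.edu as the public login host (an assumption; consult the Getting Started page for the current host names):

ssh <username>@centos7.slac.stanford.edu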

The simplest command to submit a job to the general farm is:

bsub <your job>

This command will submit your job to the first available general farm machine, where it will be allowed to run for up to 20 minutes of wall clock time (the default run limit).  It is recommended to specify a wall clock time when you submit: LSF uses that information to backfill your job into windows of opportunity on systems that are accumulating cores for MPI jobs that have not yet started, so your jobs will be scheduled more quickly.  To specify a wall clock time:

bsub -W <time in minutes> <your job>
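
For example, to run a script with a one hour limit and capture its output (the -o option, which writes the job's stdout to a file, is standard bsub; the script name is just a placeholder):

bsub -W 60 -o myjob.%J.out ./my_analysis.sh   # %J is replaced by the job ID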


bsub argument               | requirement | description
-q <queue_name>             | optional    | Specifies a job submission queue. Not required for running jobs in the general shared clusters.
-R "rhel60" or -R "centos7" | optional    | The general queues are a mix of RHEL6 and CentOS7 hosts. Use one of these -R options to restrict your job to the indicated OS.
-W <[hours:]minutes>        | required    | Wall clock run limit, not normalized for CPU differences. Essential for efficient job scheduling. Jobs will be terminated if they exceed this limit.
-We <[hours:]minutes>       | optional    | Estimated wall clock run time, not normalized for CPU differences. The scheduler considers this estimate when placing jobs. Jobs may not be terminated immediately if they exceed this limit.
-c <[hours:]minutes>        | optional    | CPU time limit, normalized by the CPU factor of the assigned core. Intended to stop runaway jobs; not used to schedule time on cores.
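
Putting these options together, a submission restricted to CentOS7 hosts with a two hour wall clock limit might look like the following (the queue and script names are illustrative; run bqueues to list the queues available to you):

bsub -q MEDIUM -R "centos7" -W 2:00 ./my_job.sh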


General Queues

Queue             | Default Runlimit | Maximum Runlimit | Priority
EXPRESS           | 4 min            | 4 min            | 200
SHORT             | 20 min           | 60 min           | 185
MEDIUM            | 30 min           | 2 days           | 180
LONG              | 30 min           | 5 days           | 175
IDLE              | 12 hours         | 12 hours         | 5
bulletmpi *       | 15 min           | 7 days           | 187
bulletmpi-large * | 15 min           | 1 day            | 187

*The bulletmpi and bulletmpi-large queues are for MPI jobs only.  Access can be requested by email to unix-admin.
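
To confirm the current limits and priorities on the live system (bqueues is a standard LSF command):

bqueues            # one-line summary of every queue visible to you
bqueues -l LONG    # full configuration of a single queue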


General farm batch hosts

Pool/Resource Name | No. in Pool | OS       | Hardware                                                                     | Comment
bullet             | 302         | rhel6-64 | Dell PowerEdge M620, dual 8-core 2.2GHz Intel Xeon E5-2660, 64GB memory      | bullet0001 and bullet0002 are login nodes
kiso               | 68          | centos7  | Dell R410, dual hexa-core 2.66GHz Intel Xeon X5650 CPUs, 48GB memory         |
deft               | 28          | centos7  | Dell PowerEdge M630, quad hexa-core 2.30GHz Xeon E5-2670 CPUs, 132GB memory  |
bubble             | 19          | centos7  | Dell PowerEdge C6420, dual 18-core 2.70GHz Xeon Gold 6150 CPUs, 196GB memory | bubble0001 is a login node; 7 are available to the general farm, and 12 more are available to the general farm for jobs running 10 minutes or less

*The pool or resource name can be used to run on a specific set of hosts, e.g. "bsub -R kiso <job>".
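
Before submitting, you can check the status of the hosts in a pool (bhosts is a standard LSF command; the -R form selects hosts by resource requirement):

bhosts -R "kiso"   # job slot usage and status of every host carrying the kiso resource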

Fermi/GLAST and batch

Information about using batch at SLAC for Fermi/GLAST users can be found here: http://www.slac.stanford.edu/exp/glast/wb/prod/pages/installingOfflineSW/usingSlacBatchFarm.htm

Documentation

Man pages are available for the various batch commands, and full documentation for LSF is available at https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_welcome/lsf_welcome.html
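
For example, to read the manual page for the job submission command:

man bsub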


