
under construction ...

SLAC uses the IBM LSF (Load Sharing Facility) batch system. Please refer to the LSF documentation to become familiar with the basic usage of LSF.

LSF resources available to SLAC ATLAS users:

SLAC ATLAS users have their own dedicated LSF queue and resources. They can also use the "general fairshare" queues available to everyone at SLAC.

Dedicated LSF resources for ATLAS users

SLAC ATLAS users can run jobs in the dedicated LSF queue "atlas-t3". The following command shows who can use the dedicated LSF resource, and who can add or remove users of that resource.

$ ypgroup exam -group atlas-t3
Group 'atlas-t3':
	GID:     3104
	Comment: 
	Last modified at Oct 14 00:22:52 2015 by yangw
	Owners:  sch, sudong, young, zengq 
	Members: acukierm, bpn7, laurenat, makagan, osgatlas01, rubbo, zengq, zihaoj

	This is a secondary group.

The above shows the UNIX group "atlas-t3". People on the "Owners" line can add and remove members of this group. People on the "Members" line can run jobs in the dedicated queue. (Owners are not automatically members.)
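If you are unsure whether your account is already in the group, a quick check with the standard UNIX `id` command (not an LSF command) is:

```shell
# List the current user's UNIX groups and look for atlas-t3.
# grep -qw matches the whole word and suppresses output; the exit
# status tells us whether the group was found.
if id -nG | grep -qw atlas-t3; then
    echo "member of atlas-t3: you can submit to the dedicated queue"
else
    echo "not a member: ask one of the owners listed above to add you"
fi
```

Note that group changes take effect only in new login sessions, so log in again after being added.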

The following is an example job script for users to submit jobs to the atlas-t3 queue:

$ cat job-script.sh 
#!/bin/sh
# run in LSF queue atlas-t3 and run up to 120 minutes (wall time)
#BSUB -q atlas-t3
#BSUB -W 120
#BSUB -R "select[rhel60 && cvmfs && inet] rusage[scratch=5.0, mem=1000:decay=0]"

# create a working directory on batch node's /scratch space
myworkdir=/scratch/`id -un`$$
mkdir $myworkdir
cd $myworkdir

# run payload
task1 < input_of_task1 > output_of_task1 2>&1 &
task2 < input_of_task2 > output_of_task2 2>&1 &
wait  # wait for the tasks to finish 

# save the output to storage, use either "cp" to copy to NFS spaces, or "xrdcp" to copy to the xrootd spaces
cp myoutput_file /nfs/slac/g/atlas/u02/myoutput_file  
xrdcp myoutput_file root://atlprf01:11094//atlas/local/myoutput_file

# clean up
cd ..
rm -rf $myworkdir

$ bsub < job-script.sh  # submit the job

In the above script, the first two #BSUB directives tell LSF that the batch queue is "atlas-t3" and the wall-time limit is 120 minutes. Always specify a wall time; otherwise your jobs will be killed if they exceed the default 30-minute wall-time limit. The third #BSUB directive tells LSF that the job wants to run on the RHEL6 platform ("rhel60") with CVMFS ("cvmfs") and outbound internet connectivity ("inet"), and that the job needs up to 5 GB of space under /scratch and 1000 MB of RAM (these are hints to the LSF scheduler, not caps or limits).
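The per-job scratch directory built from the username and the shell PID ($$) in the script can also be created with the standard mktemp utility, which guarantees a collision-free name. A minimal sketch of the same pattern, using /tmp in place of a batch node's /scratch so it can run anywhere:

```shell
# Sketch of the per-job working-directory pattern using mktemp.
# mktemp -d creates a fresh directory with a unique name from the
# template (the X's are replaced with random characters).
myworkdir=$(mktemp -d /tmp/lsfjob.XXXXXX)
cd "$myworkdir"
echo "working in $myworkdir"

# ... the job payload would run here ...

# Clean up: leave the directory before removing it.
cd /
rm -rf "$myworkdir"
```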

With the "&" at the end of the two task lines (task1 and task2), the two tasks run simultaneously. If you want them to run sequentially, remove the two "&".
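The effect of the "&" can be seen with a toy example in which `sleep` stands in for the payload tasks: two 2-second tasks started in the background finish together in roughly 2 seconds rather than 4.

```shell
# Two "tasks" (sleep stands in for task1/task2) started in the background.
start=$(date +%s)
sleep 2 &
sleep 2 &
wait   # block until both background tasks have finished
end=$(date +%s)
echo "elapsed: $((end - start))s"   # roughly 2s, not 4s, because they overlap
```

Without the "&" the second `sleep` would not start until the first finished, and the elapsed time would be about 4 seconds.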
