Batch Nodes

Depending on your data access needs, you may have to submit jobs to a specific farm by choosing the appropriate LSF batch queue; refer to the table below. Jobs for the current experiment should be submitted to the high priority queues psnehq and psfehq, which run against the Fast Feedback storage layer (FFB). Simulation jobs should be submitted to the low priority queues with "idle" in the name. CPU intensive jobs that do not demand high data throughput should be submitted to the psanacsq queue. When in doubt, use psanaq.

| Location    | Queue        | Nodes                              | Data                   | Comments                               | Throughput (Gbit/s) | Cores |
|-------------|--------------|------------------------------------|------------------------|----------------------------------------|---------------------|-------|
| Building 50 | psanaq       | psana11xx, psana14xx               | ALL (no FFB)           | Primary psana queue                    | 40                  | 480   |
|             | psanaidleq   | psana11xx, psana14xx               |                        | Simulations, preemptable, low priority |                     | 480   |
| NEH         | psnehq       | psana12xx                          | FFB for AMO, SXR, XPP  | Current experiment on FFB              | 40                  | 240   |
|             | psnehidleq   | psana12xx                          |                        | Simulations, preemptable, low priority |                     | 240   |
| FEH         | psfehq       | psana13xx                          | FFB for XCS, CXI, MEC  | Current experiment on FFB              | 40                  | 240   |
|             | psfehidleq   | psana13xx                          |                        | Simulations, preemptable, low priority |                     | 240   |
| NEH         | psanacsq     | psanacs001-048 and psanacs065-128  | ALL                    | CPU intensive, limited data throughput | 1                   | 1792  |
|             | psanacsidleq | psanacs001-048                     |                        | Simulations, preemptable, low priority |                     | 768   |
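
The queue selection guidance in this section can be sketched as a small shell function. This is only an illustration: `choose_queue` and its `ffb`/`sim`/`cpu` job kinds are hypothetical names, not part of LSF or any PCDS tool, and sending every simulation to psanaidleq is an assumption (any of the idle queues fits the guidance).

```shell
#!/bin/sh
# Sketch: map a job kind (and, for FFB jobs, an instrument) to a queue name.
# choose_queue is a hypothetical helper introduced for illustration.
choose_queue() {
    kind=$1        # ffb, sim, cpu, or anything else for the default queue
    instrument=$2  # only consulted when kind is "ffb"
    case "$kind" in
        ffb)
            case "$instrument" in
                AMO|SXR|XPP) echo psnehq ;;   # NEH instruments
                XCS|CXI|MEC) echo psfehq ;;   # FEH instruments
                *) echo "unknown instrument: $instrument" >&2; return 1 ;;
            esac ;;
        sim) echo psanaidleq ;;               # assumption: one of the idle queues
        cpu) echo psanacsq ;;                 # CPU intensive, low data throughput
        *)   echo psanaq ;;                   # "when in doubt"
    esac
}
```

For example, `bsub -q "$(choose_queue ffb XPP)" -o ~/output/job.out my_program` would submit to psnehq.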

Submitting Batch Jobs

LSF (Load Sharing Facility) is the job scheduler used at SLAC to execute user batch jobs on the various batch farms. LSF commands can be run from a number of SLAC servers, but it is best to use the interactive psana farm. Log in first to pslogin and then to psana. From there you can submit a job with the following command:

bsub -q psnehq -o <output file name> <job_script_command>

For example:

bsub -q psnehq -o ~/output/job.out my_program

This will submit a job (my_program) to the queue psnehq and write its output to a file named ~/output/job.out. You may check on the status of your jobs using the bjobs command.

Similarly, the command:

bsub -q psfehq -o ~/output/log.out "ls -l"

will execute the command line "ls -l" in the batch queue psfehq and write its output to a file named ~/output/log.out.
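
When scripting submissions it can help to capture the job ID for later use with bjobs or bkill. The sketch below assumes bsub prints its usual confirmation line of the form `Job <ID> is submitted to queue <name>.`; `job_id_from_bsub` is a hypothetical helper name.

```shell
#!/bin/sh
# Read bsub's output on stdin and print the numeric job ID, if any.
# Assumes the confirmation line looks like:
#   Job <12345> is submitted to queue <psnehq>.
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted to .*/\1/p'
}

# Typical use (not run here):
#   id=$(bsub -q psnehq -o ~/output/job.out my_program | job_id_from_bsub)
#   bjobs "$id"
```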

Resource requirements can be specified using the "-R" option. For example, to make sure that a job is run on a node with 1 GB (or more) of available memory, use the following:

bsub -q psnehq -R "rusage[mem=1024]" my_program
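
As a small convenience, the resource string can be generated from a memory size in GB. This sketch assumes memory is expressed in MB, as in the 1 GB = 1024 example above; `rusage_mem_gb` is a hypothetical helper name.

```shell
#!/bin/sh
# Print an LSF resource-requirement string reserving the given number of GB.
# Assumes the memory value is interpreted in MB (1 GB = 1024).
rusage_mem_gb() {
    gb=$1
    echo "rusage[mem=$((gb * 1024))]"
}

# Typical use (not run here):
#   bsub -q psnehq -R "$(rusage_mem_gb 4)" my_program
```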

Submitting OpenMPI Batch Jobs

The RedHat supplied OpenMPI packages are installed on pslogin, psexport and all of the psana batch servers. The system default has been set to the current version as supplied by RedHat.

$ mpi-selector --query
default:openmpi-1.4-gcc-x86_64
level:system

Your environment should be set up to use this version (unless you have used RedHat's mpi-selector script, or your login scripts, to override the default). You can check to see if your PATH is correct by issuing the command which mpirun. Currently, this should return /usr/lib64/openmpi/1.4-gcc/bin/mpirun. Future updates to the MPI version may change the exact details of this path.

In addition, your LD_LIBRARY_PATH should include /usr/lib64/openmpi/1.4-gcc/lib (or something similar).
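
The two environment checks described above can be combined in a short script. This is a sketch only: the function names are hypothetical, and the library path is the one quoted in the text, which may differ after an MPI update.

```shell
#!/bin/sh
# Succeed if directory $2 is an entry of the colon-separated list $1.
path_list_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report which mpirun is found and whether LD_LIBRARY_PATH includes the
# OpenMPI library directory quoted in the text above.
check_openmpi_env() {
    mpirun_path=$(command -v mpirun) || { echo "mpirun not found in PATH"; return 1; }
    echo "mpirun: $mpirun_path"
    if path_list_contains "${LD_LIBRARY_PATH:-}" /usr/lib64/openmpi/1.4-gcc/lib; then
        echo "LD_LIBRARY_PATH includes /usr/lib64/openmpi/1.4-gcc/lib"
    else
        echo "LD_LIBRARY_PATH does not include /usr/lib64/openmpi/1.4-gcc/lib"
    fi
}
```

Running `check_openmpi_env` on a psana node prints the mpirun location and the result of the library path check.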

For notes on compiling examples, please see:

http://www.slac.stanford.edu/comp/unix/farm/mpi.html 

The following are examples of how to submit OpenMPI jobs to the PCDS psanaq batch queue:

bsub -q psanaq -a mympi -n 32 -o ~/output/%J.out ~/bin/hello

This submits an OpenMPI job (-a mympi) requesting 32 processors (-n 32) to the psanaq batch queue (-q psanaq).

bsub -q psanaq -a mympi -n 16 -R "span[ptile=1]" -o ~/output/%J.out ~/bin/hello

This submits an OpenMPI job (-a mympi) requesting 16 processors (-n 16), spread as one processor per host (-R "span[ptile=1]"), to the psanaq batch queue (-q psanaq).

bsub -q psanaq -a mympi -n 12 -R "span[hosts=1]" -o ~/output/%J.out ~/bin/hello

This submits an OpenMPI job (-a mympi) requesting 12 processors (-n 12), all on one host (-R "span[hosts=1]"), to the psanaq batch queue (-q psanaq).
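
The three variants above differ only in the processor count and the span option, so they can be produced by one helper. The sketch below prints the command rather than running it; `mpi_bsub_cmd` and its layout names (`any`, `spread`, `single`) are hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Print (without running) a bsub command line for an OpenMPI job on psanaq.
# layout: "spread" = one processor per host, "single" = all on one host,
#         anything else = no span requirement.
mpi_bsub_cmd() {
    nproc=$1; layout=$2; program=$3
    case "$layout" in
        spread) span=' -R "span[ptile=1]"' ;;
        single) span=' -R "span[hosts=1]"' ;;
        *)      span='' ;;
    esac
    echo "bsub -q psanaq -a mympi -n $nproc$span -o ~/output/%J.out $program"
}
```

For example, `mpi_bsub_cmd 12 single '~/bin/hello'` prints the third command shown above, which can be reviewed before it is run.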

Common LSF Commands

Report status of all jobs (running, pending, finished, etc) submitted by the current user:

bjobs -w -a

Report only running or pending jobs submitted by user "radmer":

bjobs -w -u radmer

Report running or pending jobs for all users in the psnehq queue:

bjobs -w -u all -q psnehq

Kill a specific batch job by its job ID number (use bjobs to find the job ID; only batch administrators can kill jobs belonging to other users):

bkill JOB_ID

Report current node usage on the two NEH batch farms:

bhosts -w ps11farm ps12farm
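
The bjobs and bkill commands above can be combined in scripts. The following sketch filters `bjobs -w` output by job state; it assumes the default column layout (JOBID in column 1, STAT in column 3, one header line), and `job_ids_in_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `bjobs -w` output on stdin and print the IDs of jobs whose STAT column
# matches the given state (for example PEND or RUN). The header line is skipped.
job_ids_in_state() {
    awk -v state="$1" 'NR > 1 && $3 == state { print $1 }'
}

# Dry run (not executed here): list the bkill commands for all pending jobs.
#   bjobs -w | job_ids_in_state PEND | sed 's/^/bkill /'
```

Printing the bkill commands first, rather than piping them straight to a shell, leaves a chance to review which jobs would be killed.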

The following links give more detailed LSF usage information:

PowerPoint presentation describing LSF for LCLS users at SLAC

Batch system in a nutshell

Overview of LSF at SLAC
