Batch Nodes

Depending on your data access you may need to submit jobs to a specific farm. This is accomplished by submitting to the appropriate LSF batch queue. Refer to the table below. Jobs for the current experiment should be submitted to the high priority queues psnehprioq and psfehprioq running against the Fast Feedback storage layer (FFB) located at /reg/d/ffb/<hutch>/<experiment>. Only psnehprioq/psfehprioq should access the FFB.  When in doubt, use psanaq.

Location    | Queue      | Nodes                                      | Data                  | Comments                           | Throughput (Gbit/s) | Cores
Building 50 | psanaq     | psana11xx, psana12xx, psana13xx, psana14xx | ALL (no FFB)          | Primary psana queue                | 40                  | 960
            | psanaidleq | psana11xx, psana12xx, psana13xx, psana14xx |                       | Jobs preemptable by psanaq         | 40                  | 960
NEH         | psnehprioq | psana15xx                                  | FFB for AMO, SXR, XPP | Current NEH experiment on FFB ONLY | 40                  | 288
            | psnehq     | psana15xx                                  |                       | Jobs preemptable by psnehprioq     | 10                  | 288
FEH         | psfehprioq | psana16xx                                  | FFB for XCS, CXI, MEC | Current FEH experiment on FFB ONLY | 40                  | 288
            | psfehq     | psana16xx                                  |                       | Jobs preemptable by psfehprioq     | 10                  | 288

Submitting Batch Jobs

LSF (Load Sharing Facility) is the job scheduler used at SLAC to execute user batch jobs on the various batch farms. LSF commands can be run from a number of SLAC servers, but it is best to use the interactive psana farm. Log in first to pslogin and then to psana. From there you can submit a job with the following command:

bsub -q psnehq -o <output file name> <job_script_command>

For example:

bsub -q psnehq -o ~/output/job.out my_program

This will submit a job (my_program) to the queue psnehq and write its output to a file named ~/output/job.out. NOTE: the LSF job will inherit whatever environment (PATH, PYTHONPATH, LD_LIBRARY_PATH) you currently have. This can be useful to avoid writing "wrapper scripts" to set up the environment.

You may check on the status of your jobs using the bjobs command.

Similarly, the command

bsub -q psfehq -o ~/output/log.out "ls -l"

executes the command line "ls -l" in the batch queue psfehq and writes its output to a file named ~/output/log.out.
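The submission commands above can also be assembled from a script. The following is a minimal sketch, not an official LCLS tool: the helper name build_bsub_command and its parameters are hypothetical, and it only uses the bsub options that appear on this page (-q, -o, -R, -n). It builds the argument list without running anything.

```python
import os
import shlex


def build_bsub_command(queue, command, output=None, resource=None, ncores=None):
    """Return the argv list for a bsub submission (nothing is executed).

    Hypothetical helper: only the bsub options shown on this page are used.
    """
    argv = ["bsub", "-q", queue]
    if output is not None:
        # No shell is involved when argv is passed to subprocess, so expand
        # a leading "~" here instead of relying on the shell to do it.
        argv += ["-o", os.path.expanduser(output)]
    if resource is not None:
        argv += ["-R", resource]
    if ncores is not None:
        argv += ["-n", str(ncores)]
    # The job command goes last, as a single argument (like the quoted "ls -l").
    argv.append(command)
    return argv


if __name__ == "__main__":
    argv = build_bsub_command("psfehq", "ls -l", output="~/output/log.out")
    # Print a shell-quoted form for inspection; to actually submit, one could
    # pass argv to subprocess.run(argv, check=True) on a host where bsub exists.
    print(shlex.join(argv))
```

Because the job inherits the environment of the submitting process, a script like this would need to be run from a shell that already has the desired PATH and related variables set.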

Resource requirements can be specified using the "-R" option. For example, to make sure that a job is run on a node with 1 GB (or more) of available memory, use the following:

bsub -q psnehq -R "rusage[mem=1024]" my_program

Submitting OpenMPI Batch Jobs

NOTE: you need to have an "mpirun" command in your PATH before issuing the bsub command to submit an MPI job. At LCLS we typically do that with:

source /reg/g/psdm/etc/ana_env.csh    (for tcsh/csh) OR...
source /reg/g/psdm/etc/ana_env.sh     (for bash)

The following are examples of how to submit OpenMPI jobs to the PCDS psanaq batch queue:

bsub -q psanaq -a mympi -n 32 -o ~/output/%J.out ~/bin/hello

This submits an OpenMPI job (-a mympi) requesting 32 processors (-n 32) to the psanaq batch queue (-q psanaq).

bsub -q psanaq -a mympi -n 16 -R "span[ptile=1]" -o ~/output/%J.out ~/bin/hello

This submits an OpenMPI job (-a mympi) requesting 16 processors (-n 16) spanned as one processor per host (-R "span[ptile=1]") to the psanaq batch queue (-q psanaq).

bsub -q psanaq -a mympi -n 12 -R "span[hosts=1]" -o ~/output/%J.out ~/bin/hello

This submits an OpenMPI job (-a mympi) requesting 12 processors (-n 12) spanned all on one host (-R "span[hosts=1]") to the psanaq batch queue (-q psanaq).

When no ptile is specified in the resource string, the batch system will add "span[ptile=12]". Running MPI jobs on as few hosts as possible helps optimize MPI communication between ranks and minimizes job failures due to an error with a host. However, it does mean more ranks sharing per-host resources, such as memory and I/O. Care is required when managing host resources for your job by specifying your own ptile. If jobs from different users (or the same user) have different ptile settings, the batch system will not run these jobs on the same host, which may lead to under-utilization of the batch queue. For instance, if one user specifies -R "span[ptile=4]" -n 2, taking two ranks on hostA, the system will not put ranks from other user jobs on hostA unless they also specify span[ptile=4] (in particular, the default resource string of span[ptile=12] excludes other jobs from hostA).
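To make the ptile arithmetic concrete, here is a small sketch that estimates how many hosts a job occupies for a given -n and ptile. It assumes ranks are packed up to ptile per host, which gives the minimum host count; the actual placement is decided by the batch system. The function names are hypothetical.

```python
import math

DEFAULT_PTILE = 12  # the span[ptile=12] default described in the text above


def hosts_needed(nranks, ptile=DEFAULT_PTILE):
    """Minimum number of hosts for nranks ranks with at most ptile per host."""
    if nranks < 1 or ptile < 1:
        raise ValueError("nranks and ptile must be positive")
    return math.ceil(nranks / ptile)


def ranks_per_host(nranks, ptile=DEFAULT_PTILE):
    """Rank counts per host when hosts are filled up to ptile in turn."""
    full, rest = divmod(nranks, ptile)
    return [ptile] * full + ([rest] if rest else [])


if __name__ == "__main__":
    # -n 32 with the default ptile: three hosts holding 12, 12 and 8 ranks.
    print(hosts_needed(32), ranks_per_host(32))
    # -n 16 -R "span[ptile=1]": one rank on each of sixteen hosts.
    print(hosts_needed(16, ptile=1))
```

Under this packing assumption, the -n 32 example with the default ptile leaves four cores unused on the third host, and those cores are only available to other jobs that use the same ptile setting.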

OpenMPI Environment

If you're running psana with MPI, you will get the OpenMPI version associated with the psana release.  If you're not running psana, the RedHat supplied OpenMPI packages are installed on pslogin, psexport and all of the psana batch servers.  The system default has been set to the current version as supplied by RedHat.

$ mpi-selector --query
default:openmpi-1.4-gcc-x86_64
level:system

Your environment should be set up to use this version (unless you have used RedHat's mpi-selector script, or your login scripts, to override the default). You can check to see if your PATH is correct by issuing the command which mpirun. Currently, this should return /usr/lib64/openmpi/1.4-gcc/bin/mpirun. Future updates to the MPI version may change the exact details of this path.

In addition, your LD_LIBRARY_PATH should include /usr/lib64/openmpi/1.4-gcc/lib (or something similar).
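The two checks above (which mpirun, and the contents of LD_LIBRARY_PATH) can be scripted before submitting a job. This is a minimal sketch: the function name check_mpi_environment is hypothetical, and matching on the substring "openmpi" is an assumption based on the paths quoted above, so it may need adjusting for other installations.

```python
import os
import shutil


def check_mpi_environment(environ=None):
    """Report where mpirun resolves and which library dirs look like OpenMPI."""
    env = os.environ if environ is None else environ
    # shutil.which searches the given PATH string, like the shell's `which`.
    mpirun = shutil.which("mpirun", path=env.get("PATH", ""))
    lib_dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    # Assumption: OpenMPI library directories contain "openmpi" in their path.
    openmpi_lib_dirs = [d for d in lib_dirs if "openmpi" in d]
    return {"mpirun": mpirun, "openmpi_lib_dirs": openmpi_lib_dirs}


if __name__ == "__main__":
    report = check_mpi_environment()
    print("mpirun:", report["mpirun"] or "not found in PATH")
    print("OpenMPI library dirs:", report["openmpi_lib_dirs"] or "none found")
```

Since the batch job inherits the submitting environment, running this on the interactive node before bsub gives a reasonable indication of what the job will see.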

For notes on compiling examples, please see:

http://www.slac.stanford.edu/comp/unix/farm/mpi.html

Non-MPI Parallel Jobs

Two common categories of non-MPI parallel jobs are "embarrassingly parallel" and multi-threaded programs. An embarrassingly parallel program is best managed with the LSF job arrays feature. SLAC's copy of the LSF documentation on this feature is here: SLAC Platform documentation: jobarrays. For example, one could do:

bsub -q psnehq -J "myArray[1-10]" -o myjobs-%I.out python myscript.py

Note the use of %I to create separate output files for each of the slots in the job array. Embarrassingly parallel programs need to know which part of the problem they will work on. If you read through the LSF documentation on job arrays, you'll see examples that show how to do this by constructing separate stdin input files for each job array slot: Handling Input and Output Files, which makes use of the %I expansion for job array slots. The page on Passing Arguments on the Command Line discusses how to make use of LSF environment variables that identify the job index. However, this is tricky, and the example which uses a backslash and passes \$LSB_JOBINDEX does not work when I submit jobs under the bash shell. These environment variables are not defined until the job is launched on the remote host. The most robust way to access them seems to be to read the environment variables LSB_JOBINDEX and LSB_JOBINDEX_END from within your program rather than to try to construct a command line (however, I had some success by enclosing the whole command line in "").
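As an illustration of reading the job index inside the program, the sketch below shows what a script such as myscript.py might do. It assumes the array was submitted with indices starting at 1 and a step of 1 (as in "myArray[1-10]"), so that LSB_JOBINDEX_END equals the number of slots, and it assumes the index reads as 0 or is absent outside a job array. The function names and the round-robin split are illustrative choices, not an LCLS convention.

```python
import os


def job_array_slot(environ=None):
    """Return (index, end) from the LSF job array environment variables."""
    env = os.environ if environ is None else environ
    # Assumption: missing variables (or 0) mean we are not in a job array.
    index = int(env.get("LSB_JOBINDEX", "0"))
    end = int(env.get("LSB_JOBINDEX_END", "0"))
    return index, end


def items_for_slot(items, index, end):
    """Items handled by 1-based slot `index` out of `end` slots (round-robin)."""
    items = list(items)
    if end < 1 or not 1 <= index <= end:
        return items  # not running as an array slot: process everything
    return items[index - 1::end]


if __name__ == "__main__":
    index, end = job_array_slot()
    # Hypothetical work list; a real script might split run numbers or files.
    work = items_for_slot(range(25), index, end)
    print("slot", index, "of", end, "handles", work)
```

With ten slots and 25 work items, slot 3 would handle items 2, 12 and 22. A round-robin split keeps the slots balanced even when the number of items is not a multiple of the number of slots.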

For a multi-threaded program, you can reserve some number of cores with the "-n <numcores>" bsub option. This way the batch system knows not to schedule other jobs on those cores. Typically numcores would be set to 12 (psanaq) or 16 (all other queues). The default behavior when launching jobs is to stack the cores on the same host, so one should expect all the reserved cores to be on the same host for your multi-threaded application (one could add the -x option for exclusive use of hosts to be sure). Launching non-MPI parallel jobs over multiple compute hosts is possible using the LSF batch system; documentation starts here: How LSF runs Parallel Jobs. However, our efforts at LCLS are focused on MPI. Efforts to get other frameworks working at LCLS will probably need help from staff here (email pcds-ana-l@slac.stanford.edu).

Common LSF Commands

The first command shows the status of the LCLS batch queues (i.e. which queues have available cores). The second command shows the titles of the columns that are output by the first command:

bqueues | grep ps
bqueues | head -1

Report status of all jobs (running, pending, finished, etc) submitted by the current user:

bjobs -w -a

"Long" format job listing output:

bjobs -l

Report only running or pending jobs submitted by user "radmer":

bjobs -w -u radmer

Report running or pending jobs for all users in the psnehq queue:

bjobs -w -u all -q psnehq

Kill a specific batch job based on its job ID number, where the "bjobs" command can be used to find the appropriate job ID (note that only batch administrators can kill jobs belonging to other users).

bkill JOB_ID

Report current node usage on the two NEH batch farms:

bhosts -w ps11farm ps12farm

Additional LSF References

The following links give more detailed LSF usage information:

PowerPoint presentation describing LSF for LCLS users at SLAC

Batch system in a nutshell

Overview of LSF at SLAC