In 2021 LCLS switched to the SLURM batch system.

Information on submitting jobs to the SLURM system at LCLS can be found on this page: Submitting SLURM Batch Jobs

Information on the Automatic Run Processing system (ARP) can be found on this page: Automatic Run Processing (ARP). This is also usable at sites like NERSC and SDF.

A "cheat sheet" showing similar commands on LSF and SLURM can be found here: https://slurm.schedmd.com/rosetta.pdf
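As a quick illustration of the kind of correspondence that cheat sheet covers, the sketch below maps a few of the LSF commands used on this page to their usual SLURM counterparts. The function name lsf_to_slurm is hypothetical, and the mapping is deliberately coarse (options differ between the two systems), so treat it as a memory aid rather than a translator.

```shell
# Hypothetical helper: print the closest SLURM command for an LSF command.
lsf_to_slurm() {
    case "$1" in
        bsub)    echo "sbatch" ;;    # submit a job
        bjobs)   echo "squeue" ;;    # list jobs
        bkill)   echo "scancel" ;;   # cancel a job
        bqueues) echo "sinfo" ;;     # queue/partition status
        bhosts)  echo "sinfo -N" ;;  # per-node status
        *)       echo "no mapping known for: $1" >&2; return 1 ;;
    esac
}

# Print the mapping for the commands that appear on this page.
for cmd in bsub bjobs bkill bqueues bhosts; do
    printf '%-8s -> %s\n' "$cmd" "$(lsf_to_slurm "$cmd")"
done
```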

Refer to the table below for the batch resources available in psana.

Batch Nodes/Queues

Depending on your data access you may need to submit jobs to a specific farm. This is accomplished by submitting to the appropriate LSF batch queue. Refer to the table below. Jobs for the current experiment should be submitted to the high priority queues psnehhiprioq and psfehhiprioq running against the Fast Feedback storage layer (FFB) located at /reg/d/ffb/<hutch>/<experiment> as shown HERE. Jobs for the off-shift experiment should be submitted to psnehprioq and psfehprioq. Only psneh(hi)prioq/psfeh(hi)prioq should access the FFB. When in doubt, use psanaq.
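The queue-selection rule described above can be sketched as a small shell function. This is only an illustration: the function name pick_queue and its "current"/"offshift" arguments are hypothetical, and the hutch-to-farm mapping is taken from the queue table on this page.

```shell
# Hypothetical helper: choose an LSF queue from a hutch name and shift kind.
#   $1 = hutch (AMO, SXR, XPP, XCS, CXI, MEC)
#   $2 = "current" (experiment on shift), "offshift", or anything else
pick_queue() {
    pq_hutch=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$pq_hutch" in
        AMO|SXR|XPP) pq_farm=psneh ;;   # NEH hutches
        XCS|CXI|MEC) pq_farm=psfeh ;;   # FEH hutches
        *)           echo psanaq; return 0 ;;
    esac
    case "$2" in
        current)  echo "${pq_farm}hiprioq" ;;
        offshift) echo "${pq_farm}prioq" ;;
        *)        echo psanaq ;;        # when in doubt, use psanaq
    esac
}

pick_queue xpp current    # prints psnehhiprioq
pick_queue CXI offshift   # prints psfehprioq
pick_queue XPP            # prints psanaq
```

The result could then be passed to bsub, for example bsub -q "$(pick_queue xpp current)" followed by the usual options.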

Submit your job from an interactive node (where you land after doing ssh psana). LSF will run the submitted job on the specified queue using nodes listed in the table below. All nodes in the queues listed below run RHEL7. By submitting from an interactive node (also running RHEL7), you will ensure that your job inherits a RHEL7 environment.

Location	Queue	Nodes	Data	Comments	Throughput (Gbit/s)	Cores	Cores/Node	RAM (GB/node)	Default Time Limit
Building 50	psanaq	psana11xx, psana12xx,psana13xx, psana14xx	ALL (no FFB)	Primary psana queue	40	960	12	24	48hrs
	psdebugq	same as psanaq	same as psanaq	SHORT DEBUGGING ONLY (preempts psanaq jobs)	40	24	12	24	10min
	psanaidleq	psana11xx, psana12xx,psana13xx, psana14xx		Jobs preemptable by psanaq	40	960	12	24	48hrs
NEH	psnehhiprioq	psana15xx	FFB for AMO, SXR, XPP	Current NEH experiment on FFB ONLY	40	288	16	128	24hrs
	psnehprioq	psana15xx	FFB for AMO, SXR, XPP	Off-shift NEH experiment on FFB ONLY	40	288	16	128	24hrs
	psnehq	psana15xx		Jobs preemptable by psneh(hi)prioq	10	288	16	128	48hrs
FEH	psfehhiprioq	psana16xx	FFB for XCS, CXI, MEC	Current FEH experiment on FFB ONLY	40	288	16	128	24hrs
	psfehprioq	psana16xx	FFB for XCS, CXI, MEC	Off-shift FEH experiment on FFB ONLY	40	288	16	128	24hrs
	psfehq	psana16xx		Jobs preemptable by psfeh(hi)prioq	10	288	16	128	48hrs

Submitting Batch Jobs

LSF (Load Sharing Facility) is the job scheduler used at SLAC to execute user batch jobs on the various batch farms. LSF commands can be run from a number of SLAC servers, but it is best to use the interactive psana farm. Log in first to pslogin and then to psana. From there you can submit a job with the following command:

No Format
bsub -q psnehq -o <output file name> <job_script_command>

For example:

No Format
bsub -q psnehq -o ~/output/job.out my_program

This will submit a job (my_program) to the queue psnehq and write its output to a file named ~/output/job.out. NOTE: the LSF job will inherit whatever environment (PATH, PYTHONPATH, LD_LIBRARY_PATH) you currently have. This can be useful to avoid writing "wrapper scripts" to set up the environment.

You may check on the status of your jobs using the bjobs command.

Similarly, the command:

No Format
bsub -q psfehq -o ~/output/log.out "ls -l"

will execute the command line "ls -l" in the batch queue psfehq and write its output to a file named ~/output/log.out.

Resource requirements can be specified using the "-R" option. For example, to make sure that a job is run on a node with 1 GB (or more) of available memory, use the following:

No Format
bsub -q psnehq -R "rusage[mem=1024]" my_program

Change the default job time limit:

No Format
bsub -W <[hour:]minute> my_program
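The options shown above (-q, -o, -R and -W) can be combined on one command line. The sketch below only prints the resulting command instead of running it, so it can be inspected first; the function name bsub_cmd and its argument order are hypothetical.

```shell
# Hypothetical helper: print (do not run) a bsub command line.
#   $1 = queue, $2 = output file, $3 = memory to reserve in MB,
#   $4 = time limit as [hour:]minute, remaining args = program to run
bsub_cmd() {
    bc_queue=$1; bc_out=$2; bc_mem=$3; bc_limit=$4
    shift 4
    printf 'bsub -q %s -o %s -R "rusage[mem=%s]" -W %s %s\n' \
        "$bc_queue" "$bc_out" "$bc_mem" "$bc_limit" "$*"
}

# 1 GB of memory and a 2-hour limit on psnehq.
bsub_cmd psnehq ~/output/job.out 1024 2:00 my_program
```

Once the printed line looks right, it can be pasted into the shell on a psana node.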

Submitting Parallel MPI Batch Jobs

NOTE: you need to have an "mpirun" command in your PATH before issuing the bsub command to submit an MPI job. At LCLS we typically do that with:

Code Block
source /reg/g/psdm/etc/psconda.sh  # requires the bash shell

The recommended way to submit MPI batch jobs is

Code Block
bsub -q psanaq -n 24 -o ~/output/%J.out mpirun ~/bin/hello

This will submit a parallel MPI job requesting 24 processors (-n 24) to the psanaq batch queue (-q psanaq).

For advanced users, you can also control how your cores get distributed across computers with the "span" option:

No Format
bsub -q psanaq -n 12 -R "span[ptile=1]" -o ~/output/%J.out mpirun ~/bin/hello

This will submit an MPI job requesting 12 processors (-n 12) spanned as one processor per host (-R "span[ptile=1]") to the psanaq batch queue (-q psanaq).

No Format
bsub -q psanaq -n 12 -R "span[hosts=1]" -o ~/output/%J.out mpirun ~/bin/hello

This will submit an MPI job requesting 12 processors (-n 12), all on one host (-R "span[hosts=1]"), to the psanaq batch queue (-q psanaq).

No Format
bsub -m "psana1503 psana1509" -q psnehq -n 12 -o ~/output/%J.out mpirun ~/bin/hello

This will submit an MPI job requesting 12 processors (-n 12) on two nodes (psana1503 and psana1509) to the psnehq batch queue (-q psnehq).

When no ptile is specified in the resource string, the batch system will default to packing your jobs onto as few nodes as possible. This helps optimize MPI communication between ranks and minimizes job failures due to an error with a host. However, it does mean more ranks sharing per-host resources, such as memory and I/O. Care is required when managing host resources for your job by specifying your own ptile. If jobs from different users (or the same user) have different ptile settings, the batch system will not run these jobs on the same host, which may lead to under-utilization of the batch queue. For instance, if one user specifies -R "span[ptile=4]" -n 2, taking two ranks on hostA, the system will not put ranks from other user jobs on hostA unless they also specify span[ptile=4] (in particular, the default resource string of [ptile=12] excludes other jobs from hostA).
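To get a feel for how a ptile setting spreads a job, the minimum number of hosts is the rank count divided by ptile, rounded up. The sketch below does that arithmetic; the function name hosts_needed is hypothetical, and the result is a lower bound that assumes every host can actually take ptile ranks.

```shell
# Hypothetical helper: minimum number of hosts for a job with
#   $1 = number of ranks (bsub -n) and $2 = ranks per host (span[ptile=...]).
hosts_needed() {
    hn_ranks=$1; hn_ptile=$2
    # Integer ceiling division: ceil(ranks / ptile)
    echo $(( (hn_ranks + hn_ptile - 1) / hn_ptile ))
}

hosts_needed 12 1    # prints 12: one rank per host
hosts_needed 24 12   # prints 2: two fully packed 12-core psanaq nodes
hosts_needed 2 4     # prints 1: the two-rank example above fits on one host
```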

Non-MPI Parallel Jobs

Two common categories of non-MPI parallel jobs are "embarrassingly parallel" and multi-threaded programs. An embarrassingly parallel program is best managed using the LSF job arrays feature; SLAC's copy of the LSF documentation on this feature is here: SLAC Platform documentation: jobarrays. For example, one could do:

Code Block
bsub -q psnehq -J "myArray[1-10]" -o myjobs-%I.out python myscript.py

Note the use of %I to create a separate output file for each slot in the job array. Embarrassingly parallel programs need to know which part of the problem they will work on. The LSF documentation on job arrays shows how to do this by constructing separate stdin input files for each job array slot (see Handling Input and Output Files, which makes use of the %I expansion for job array slots). The page on Passing Arguments on the Command Line discusses how to use the LSF environment variables that identify the job index. However, this is tricky, and the example that uses a backslash and passes \$LSB_JOBINDEX does not work when I submit jobs under the bash shell. These environment variables are not defined until the job is launched on the remote host. The most robust way to access them seems to be to read the environment variables LSB_JOBINDEX and LSB_JOBINDEX_END from within your program rather than trying to construct a command line (however, I had some success by enclosing the whole command line in "").
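Reading the index from within the job, as suggested above, could look like the following sketch of a wrapper script. It assumes the work is a contiguous range of run numbers split evenly across slots; the function name chunk_for_slot and the values 100 and 5 are hypothetical. Outside a job array (LSB_JOBINDEX unset or 0) it falls back to the first slot.

```shell
# Hypothetical helper: print "first last" run numbers for this array slot.
#   $1 = first run number overall, $2 = number of runs per slot
chunk_for_slot() {
    cs_first=$1; cs_per_slot=$2
    # LSB_JOBINDEX is set by LSF on the execution host (1, 2, ... for arrays).
    cs_index=${LSB_JOBINDEX:-1}
    [ "$cs_index" -ge 1 ] || cs_index=1
    cs_start=$(( cs_first + (cs_index - 1) * cs_per_slot ))
    cs_end=$(( cs_start + cs_per_slot - 1 ))
    echo "$cs_start $cs_end"
}

# Each slot reports the range it would process.
echo "slot ${LSB_JOBINDEX:-1} handles runs: $(chunk_for_slot 100 5)"
```

Saved as an executable script, this would be submitted in place of the python command in the job array example, so that each slot computes its own range after it starts on the remote host.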

For a multi-threaded program, you can reserve some number of cores with the "-n <numcores>" bsub option. This way the batch system knows not to schedule other jobs on those cores. Typically numcores would be set to 12 (psanaq) or 16 (all other queues). By default, jobs are launched with their cores stacked on the same host, so you should expect all the reserved cores to be on the same host for your multi-threaded application (you could add the -x option for exclusive use of hosts to be sure). Launching non-MPI parallel jobs over multiple compute hosts is possible using the LSF batch system (documentation starts here: How LSF runs Parallel Jobs); however, our efforts at LCLS are focused on MPI. Efforts to get other frameworks working at LCLS will probably need help from staff here (email pcds-ana-l@slac.stanford.edu).

Common LSF Commands

First command shows the status of the LCLS batch queues (i.e. which queues have available cores). Second command shows the titles of the columns that are output by the first command:

Code Block
bqueues | grep ps
bqueues | head -1

Report status of all jobs (running, pending, finished, etc) submitted by the current user:

Code Block
bjobs -w -a

"Long" format job listing output:

Code Block
bjobs -l

Report only running or pending jobs submitted by user "radmer":

Code Block
bjobs -w -u radmer

Report running or pending jobs for all users in the psnehq queue:

Code Block
bjobs -w -u all -q psnehq

Kill a specific batch job based on its job ID number, where the "bjobs" command can be used to find the appropriate job ID (note that only batch administrators can kill jobs belonging to other users). See below for additional information about hard-to-kill batch jobs:

Code Block
bkill JOB_ID

Report current node usage on the two NEH batch farms:

Code Block
bhosts -w ps11farm ps12farm

See list of recently completed ("done") jobs, typically the last 12 hours:

Code Block
bjobs -d
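When many jobs are in flight, it can help to summarize the bjobs listing by job state. The sketch below counts jobs per value of the STAT column, assuming the usual bjobs -w layout in which STAT is the third column after a one-line header. The function name count_by_status is hypothetical, and the sample rows are made up purely to show the shape of the input.

```shell
# Hypothetical helper: count jobs per status from "bjobs -w" output on stdin.
count_by_status() {
    # Skip the header line, tally column 3 (STAT), print "STATUS count".
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Real use on a psana node would be:  bjobs -w -a | count_by_status
# Illustrative (made-up) input:
count_by_status <<'EOF'
JOBID   USER     STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
1001    someone  RUN   psanaq  psana1101  psana1201  jobA      Jan  1 10:00
1002    someone  PEND  psanaq  psana1101  -          jobB      Jan  1 10:01
1003    someone  RUN   psanaq  psana1101  psana1202  jobC      Jan  1 10:02
EOF
```

For the illustrative input this prints "PEND 1" and "RUN 2" on separate lines.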

Getting A High-Priority Interactive Session (When You Have Beam)

NOTE: This is only permitted for the experiment that currently has beam. You can get an interactive session using one of the nodes in psnehhiprioq/psfehhiprioq by executing the following from a psana node:

...

RHEL7 environment.

Note 1: Jobs for the current experiment can be submitted to fast feedback (FFB) queues, which allocate resources for the most recent experiments. The FFB queues in the tables below are for LCLS-II experiments (TMO, RIX and UED). The FEH experiments (LCLS-I, including XPP) can submit FFB jobs to the new Fast Feedback System.

Warning
As of February 2023, the offline compute resources have been consolidated into the psanaq queue. The priority queues have been removed.

Queue name	Node names on SLURM queues	Number of Nodes	Comments	Throughput [Gbit/s]	Cores/Node	RAM [GB/node]	Time limit
psanaq	psana15xx psana16xx	34	Primary psana queue	40	16	128	48hrs
psanagpuq	psanagpu113-psanagpu118	6	GPU nodes	10	16	128	48hrs

Remember to log out of all sessions when you are done with them (e.g. when you don't have beam).

Batch Job Priorities

LSF has a "fairshare" feature which remembers how much CPU time a particular user has used. This is used to compute a priority which is used to decide which job in the queue is scheduled next. So your job may run first in a queue, even if it was submitted later. You can see your priority number (and those of other users) using "bqueues -r <queuename>" where <queuename> is psanaq, or one of the other LCLS queues.

Troubleshooting Batch Job Problems

Guidance for this can be found here.

Additional LSF References

The following links give more detailed LSF usage information:

PowerPoint presentation describing LSF for LCLS users at SLAC

Batch system in a nutshell

...
