Batch Nodes

Depending on your data access, you may need to submit jobs to a specific farm. This is accomplished by submitting to the appropriate LSF batch queue; refer to the table below. Jobs for the current experiment should be submitted to the high-priority queues psnehprioq and psfehprioq, which run against the Fast Feedback storage layer (FFB) located at /reg/d/ffb/<hutch>/<experiment>. Only psnehprioq/psfehprioq should access the FFB. When in doubt, use psanaq.

...

http://www.slac.stanford.edu/comp/unix/farm/mpi.html

Non-MPI Parallel Jobs

Two common categories of non-MPI parallel jobs are "embarrassingly parallel" and multi-threaded programs. An embarrassingly parallel program is best managed with the LSF job arrays feature; SLAC's copy of the LSF documentation on this feature is here: SLAC Platform documentation: jobarrays. For example, one could do:

Code Block
bsub -q psnehq -J "myArray[1-10]" -o myjobs-%I.out python myscript.py

Note the use of %I to create a separate output file for each slot in the job array. Embarrassingly parallel programs need to know which part of the problem they will work on. If you read through the LSF documentation on job arrays, you'll see examples showing how to do this by constructing separate stdin input files for each job array slot (Handling Input and Output Files), which makes use of the %I expansion for job array slots. The page on Passing Arguments on the Command Line discusses how to use LSF environment variables that identify the job index; however, this is tricky, and the example that uses a backslash and passes \$LSB_JOB_INDEX does not work when I submit jobs under the bash shell. These environment variables are not defined until the job is launched on the remote host. The most robust approach seems to be to read the environment variables LSB_JOBINDEX and LSB_JOBINDEX_END from within your program rather than to try to construct a command line (though I had some success by enclosing the whole command line in "").
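As a minimal sketch of that approach, a script submitted as a job array can read its slot from the environment at run time. The striding scheme here is a hypothetical way to partition work; any scheme derived from the slot index and slot count would do:

```python
import os

def my_slice(items):
    # LSF sets these variables on the remote host once the job array
    # slot starts; default to a single-slot run when submitted outside
    # a job array.
    index = int(os.environ.get("LSB_JOBINDEX", "1"))
    total = int(os.environ.get("LSB_JOBINDEX_END", "1"))
    # Hypothetical partitioning: slot i (1-based) takes every total-th item.
    return items[index - 1::total]

if __name__ == "__main__":
    print(my_slice(list(range(10))))
```

Submitted as `bsub -J "myArray[1-10]" ... python myscript.py`, each of the ten slots would then process a disjoint tenth of the input.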

For a multi-threaded program, or any parallel job that runs without MPI, you must still tell the batch system how many cores you are using with the "-n <numcores>" bsub option. This way the batch system knows not to schedule other jobs on those cores. Typically numcores would be set to 12 (psanaq) or 16 (all other queues), since non-MPI parallelization tends to only work inside of one node. The default behavior when launching jobs is to stack the cores on the same host, so one should expect all the reserved cores to be on the same host for your multi-threaded application (one could add -x for exclusive use of hosts to be sure). Launching non-MPI parallel jobs over multiple compute hosts is possible using the LSF batch system (documentation starts here: How LSF runs Parallel Jobs), but our efforts at LCLS are focused on MPI. Efforts to get other frameworks working at LCLS will probably need help from staff here (email pcds-ana-l@slac.stanford.edu).
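One way to keep the thread count consistent with the "-n" reservation is to read it back inside the program. This is a sketch: it assumes LSF exports LSB_DJOB_NUMPROC (its count of allocated processors) in this setup, and the fallback default is arbitrary:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def reserved_cores(default=1):
    # LSF exports LSB_DJOB_NUMPROC with the core count requested via -n
    # (assumption: available in this LSF configuration); fall back to
    # `default` when running outside the batch system.
    return int(os.environ.get("LSB_DJOB_NUMPROC", default))

def parallel_map(func, items):
    # Size the thread pool to match the batch reservation so the job
    # does not oversubscribe the cores LSF set aside for it.
    with ThreadPoolExecutor(max_workers=reserved_cores()) as pool:
        return list(pool.map(func, items))
```

Matching the pool size to the reservation keeps a 16-core submission from silently running 64 threads on shared hardware.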

Common LSF Commands

The first command shows the status of the LCLS batch queues (i.e. which queues have available cores). The second command shows the titles of the columns output by the first command:

...