Depending on your data access you may need to submit jobs to a specific farm. This is accomplished by submitting to the appropriate LSF batch queue. Refer to the table below. Jobs for the current experiment should be submitted to the high priority queues psnehhiprioq and psfehhiprioq running against the Fast Feedback storage layer (FFB) located at /reg/d/ffb/<hutch>/<experiment>. Jobs for the off-shift experiment should be submitted to psnehprioq and psfehprioq. Only psneh(hi)prioq/psfeh(hi)prioq should access the FFB. When in doubt, use psanaq.
...
Location | Queue | Nodes | Data | Comments | Throughput | Cores | RAM (GB/core) | Default |
---|---|---|---|---|---|---|---|---|
Building 50 | psanaq | psana11xx, psana12xx, psana13xx, psana14xx | ALL (no FFB) | Primary psana queue | 40 | 960 | 24 | 48hrs |
Building 50 | psdebugq | same as psanaq | same as psanaq | SHORT DEBUGGING ONLY (preempts psanaq jobs) | 40 | 24 | 24 | 10min |
Building 50 | psanaidleq | psana11xx, psana12xx, psana13xx, psana14xx | | Jobs preemptable by psanaq | 40 | 960 | 24 | 48hrs |
NEH | psnehhiprioq | psana15xx | FFB for AMO, SXR, XPP | Current NEH experiment on FFB ONLY | 40 | 288 | 128 | 24hrs |
NEH | psnehprioq | psana15xx | FFB for AMO, SXR, XPP | Off-shift NEH experiment on FFB ONLY | 40 | 288 | 128 | 24hrs |
NEH | psnehq | psana15xx | | Jobs preemptable by psneh(hi)prioq | 10 | 288 | 128 | 48hrs |
FEH | psfehhiprioq | psana16xx | FFB for XCS, CXI, MEC | Current FEH experiment on FFB ONLY | 40 | 288 | 128 | 24hrs |
FEH | psfehprioq | psana16xx | FFB for XCS, CXI, MEC | Off-shift FEH experiment on FFB ONLY | 40 | 288 | 128 | 24hrs |
FEH | psfehq | psana16xx | | Jobs preemptable by psfeh(hi)prioq | 10 | 288 | 128 | 48hrs |
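The queue choice described above can be sketched as a small shell function. This is only an illustration: pick_queue and its "current"/"offshift" arguments are hypothetical names, not an LCLS-provided tool, and the mapping simply mirrors the table (AMO, SXR, XPP in the NEH; XCS, CXI, MEC in the FEH; psanaq when in doubt).

```shell
#!/bin/sh
# pick_queue is a hypothetical helper (not an LCLS tool).
# $1 = hutch name, $2 = "current", "offshift", or anything else.
pick_queue() {
    hutch=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    shift_status=$2
    case "$hutch" in
        AMO|SXR|XPP) hall=neh ;;
        XCS|CXI|MEC) hall=feh ;;
        *) echo psanaq; return 0 ;;      # unknown hutch: when in doubt, psanaq
    esac
    case "$shift_status" in
        current)  echo "ps${hall}hiprioq" ;;   # experiment currently on shift
        offshift) echo "ps${hall}prioq" ;;     # off-shift experiment
        *)        echo psanaq ;;               # everything else
    esac
}

pick_queue xpp current    # prints psnehhiprioq
pick_queue cxi offshift   # prints psfehprioq
pick_queue amo archived   # prints psanaq
```

The result could then be passed to bsub with -q, for example -q "$(pick_queue xpp current)".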
LSF (Load Sharing Facility) is the job scheduler used at SLAC to execute user batch jobs on the various batch farms. LSF commands can be run from a number of SLAC servers, but it is best to use the interactive psana farm. Log in first to pslogin and then to psana. From there you can submit a job with the following command:
...
No Format |
---|
bsub -W <[hour:]minute> my_program |
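The -W run limit takes the form [hour:]minute. As a minimal sketch, the helper below converts a total number of minutes into that form; format_runlimit is a hypothetical name introduced here for illustration, and the script only prints the resulting command instead of submitting it.

```shell
#!/bin/sh
# format_runlimit is a hypothetical helper: it converts a total number of
# minutes into the [hour:]minute form accepted by bsub -W.
format_runlimit() {
    total=$1
    hours=$((total / 60))
    minutes=$((total % 60))
    if [ "$hours" -gt 0 ]; then
        printf '%d:%02d\n' "$hours" "$minutes"
    else
        printf '%d\n' "$minutes"
    fi
}

# Print (rather than run) the command; my_program is the placeholder used above.
echo "bsub -W $(format_runlimit 90) my_program"   # prints: bsub -W 1:30 my_program
```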
...
NOTE: you need to have an "mpirun" command in your PATH before issuing the bsub command to submit an MPI job. At LCLS we typically do that with:
...
Code Block |
---|
bsub -q psanaq -n 24 -o ~/output/%J.out mpirun ~/bin/hello |
This will submit a parallel OpenMPI job requesting 24 processors (-n 24) to the psanaq batch queue (-q psanaq). The environment variable SIT_RELEASE shows you the ana release number. You can still use the -a mympi option for ana 0.18.0 and later; however, the recommended method with mpirun should make batch job management more robust (it uses the official LSF batch systems modules).
For advanced users, you can also control how your cores get distributed across computers with the "span" option:
No Format |
---|
bsub -q psanaq -n 12 -R "span[ptile=1]" -o ~/output/%J.out mpirun ~/bin/hello |
This submits an OpenMPI job requesting 12 processors (-n 12), spanned as one processor per host (-R "span[ptile=1]"), to the psanaq batch queue (-q psanaq).
No Format |
---|
bsub -q psanaq -n 12 -R "span[hosts=1]" -o ~/output/%J.out mpirun ~/bin/hello |
This submits an OpenMPI job requesting 12 processors (-n 12), all on one host (-R "span[hosts=1]"), to the psanaq batch queue (-q psanaq).
No Format |
---|
bsub -m "psana1503 psana1509" -q psnehq -n 12 -o ~/output/%J.out mpirun ~/bin/hello |
This submits an OpenMPI job requesting 12 processors (-n 12) on two nodes (psana1503 and psana1509) to the psnehq batch queue (-q psnehq).
When no ptile is specified in the resource string, the batch system adds "span[ptile=12]". By default, then, the batch system packs your job onto as few nodes as possible. This helps optimize MPI communication between ranks and minimizes job failures due to an error with a host. However, it does mean more ranks share per-host resources such as memory and I/O. Take care when managing host resources for your job by specifying your own ptile. If jobs from different users (or the same user) have different ptile settings, the batch system will not run these jobs on the same host, which may lead to under-utilization of the batch queue. For instance, if one user specifies -R "span[ptile=4]" -n 2, taking two ranks on hostA, the system will not put ranks from other jobs on hostA unless they also specify span[ptile=4] (in particular, the default resource string of span[ptile=12] excludes other jobs from hostA).
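To see how -n and ptile interact, a rough calculation of the number of hosts a job occupies can help. The sketch below assumes the batch system places ptile ranks on each host and any remainder on one more host; hosts_needed is a hypothetical helper name, not an LSF command.

```shell
#!/bin/sh
# hosts_needed is a hypothetical helper: given the number of ranks (-n) and
# the ptile value, print the number of hosts, i.e. ceil(ranks / ptile).
hosts_needed() {
    ranks=$1
    ptile=$2
    echo $(( (ranks + ptile - 1) / ptile ))
}

hosts_needed 12 1    # -n 12 with span[ptile=1]:  prints 12
hosts_needed 24 12   # -n 24 with the default 12: prints 2
hosts_needed 2 4     # -n 2 with span[ptile=4]:   prints 1
```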
If you're running psana with MPI, you will get the OpenMPI version associated with the psana release. If you're not running psana, the RedHat supplied OpenMPI packages are installed on pslogin, psexport and all of the psana batch servers. The system default has been set to the current version as supplied by RedHat.
...
Your environment should be set up to use this version (unless you have used RedHat's mpi-selector script, or your login scripts, to override the default). You can check to see if your PATH is correct by issuing the command "which mpirun". Currently, this should return /usr/lib64/openmpi/1.4-gcc/bin/mpirun. Future updates to the MPI version may change the exact details of this path.
In addition, your LD_LIBRARY_PATH should include /usr/lib64/openmpi/1.4-gcc/lib (or something similar).
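Both checks can be combined in a short script. This is a sketch only: path_contains is a hypothetical helper, and the library directory is the one quoted above, which may differ after an MPI update.

```shell
#!/bin/sh
# path_contains is a hypothetical helper: succeed if directory $2 is one of
# the colon-separated entries of the path list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

mpi_lib=/usr/lib64/openmpi/1.4-gcc/lib   # path quoted above; may change

# Report where mpirun is found, if anywhere.
if command -v mpirun >/dev/null 2>&1; then
    echo "mpirun found at: $(command -v mpirun)"
else
    echo "mpirun is not in PATH"
fi

# Report whether the MPI library directory is in LD_LIBRARY_PATH.
if path_contains "${LD_LIBRARY_PATH:-}" "$mpi_lib"; then
    echo "LD_LIBRARY_PATH includes $mpi_lib"
else
    echo "LD_LIBRARY_PATH does not include $mpi_lib"
fi
```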
For notes on compiling examples, please see:
http://www.slac.stanford.edu/comp/unix/farm/mpi.html
Two common categories of non-MPI parallel jobs are "embarrassingly parallel" and multi-threaded programs. An embarrassingly parallel program is best managed using the LSF job arrays feature; SLAC's copy of the LSF documentation on this feature is here: SLAC Platform documentation: jobarrays. For example, one could do:
...
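As a generic illustration of job arrays, the worker script below lets each array element choose its own run number from LSB_JOBINDEX, the index LSF sets for every element. The script name worker.sh, the array name myjobs, the first run number 100, and the helper run_for_index are all made up for this sketch.

```shell
#!/bin/sh
# Worker script sketch. It could be submitted as a 10-element array with:
#   bsub -q psanaq -J "myjobs[1-10]" -o ~/output/%J_%I.out ./worker.sh
# (%J expands to the job ID and %I to the array index in the output file name.)

# run_for_index is a hypothetical helper: map an array index (starting at 1)
# to a run number, given the first run number.
run_for_index() {
    first_run=$1
    index=$2
    echo $((first_run + index - 1))
}

index=${LSB_JOBINDEX:-1}            # default to 1 when run outside LSF
run=$(run_for_index 100 "$index")   # 100 is a made-up first run number
echo "array element $index would process run $run"
```

Run outside the batch system, the script falls back to index 1, which makes it easy to try on an interactive node before submitting the array.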
For a multi-threaded program, you can reserve some number of cores with the "-n <numcores>" bsub option. This way the batch system knows not to schedule other jobs on those cores. Typically numcores would be set to 12 (psanaq) or 16 (all other queues). By default, jobs are launched with their cores stacked on the same host, so you should expect all the reserved cores for your multi-threaded application to be on one host (you can add -x for exclusive use of hosts to be sure). Launching non-MPI parallel jobs over multiple compute hosts is possible using the LSF batch system; documentation starts here: How LSF runs Parallel Jobs. However, our efforts at LCLS are focused on MPI. Efforts to get other frameworks working at LCLS will probably need help from staff here (email pcds-ana-l@slac.stanford.edu).
The first command shows the status of the LCLS batch queues (i.e. which queues have available cores). The second command shows the titles of the columns output by the first command:
...
See list of recently completed ("done") jobs, typically the last 12 hours:
Code Block |
---|
bjobs -d |
NOTE: This is only permitted for the experiment that currently has beam. You can get an interactive session using one of the nodes in psnehhiprioq/psfehhiprioq by executing the following from a psana node:
...
Remember to log out of all sessions when you are done with them (e.g. when you don't have beam).
LSF has a "fairshare" feature which remembers how much CPU time a particular user has used. This is used to compute a priority which is used to decide which job in the queue is scheduled next. So your job may run first in a queue, even if it was submitted later. You can see your priority number (and those of other users) using "bqueues -r <queuename>" where <queuename> is psanaq, or one of the other LCLS queues.
Guidance for this can be found here.
The following links give more detailed LSF usage information:
...