...
Depending on your data access you may need to submit jobs to a specific farm. This is accomplished by submitting to the appropriate LSF batch queue; refer to the table below. Jobs for the currently running experiment should be submitted to the high priority queues psnehq and psfehq, which run against the Fast Feedback storage layer (FFB). Multi-core OpenMPI jobs should be run in either the psnehmpiq or psfehmpiq batch queue; see the following section on "Submitting OpenMPI Batch Jobs". Simulation jobs should be submitted to the low priority queues psnehidle and psfehidle (the ones with idle in the name). CPU intensive jobs which don't demand high data throughput should be submitted to the psanacsq queue.
Location | Queue | Nodes | Data | Comments | Throughput (Gbit/s) | Cores
---|---|---|---|---|---|---
NEH | psnehq | psana11xx | ana01, ana02 | FFB for AMO, SXR, XPP; current experiment, jobs <= 6 cores | 40 | 240
NEH | psnehmpiq | psana11xx, psana12xx | ana01, ana02 | OpenMPI jobs > 6 cores, preemptable | 40 |
NEH | psnehidle | psana12xx | ana01, ana02 | Simulations, preemptable, low priority | 40 | 240
FEH | psfehq | psana13xx | ana11, ana12 | FFB for XCS, CXI, MEC; current experiment, jobs <= 6 cores | 40 | 240
FEH | psfehmpiq | psana13xx, psana14xx | ana11, ana12 | OpenMPI jobs > 6 cores, preemptable | 40 |
FEH | psfehidle | psana14xx | ana11, ana12 | Simulations, preemptable, low priority | 40 | 240
NEH | psanacsq | psanacs0xx | ALL | CPU intensive, limited data throughput | 1 | 1536
NEH | psanacsidleq | psanacs0xx | ALL | Simulations, preemptable, low priority | 1 |
Building 50 | psanaq | psana12xx, psana14xx | ALL (no FFB) | Primary psana queue | 40 | 480
Building 50 | psanaidleq | psana12xx, psana14xx | ALL (no FFB) | Simulations, preemptable, low priority | 40 |
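The queue choice above can be encoded in a small helper script. The queue names below come from the table; the helper itself and its job-type labels are an illustrative sketch, not a PCDS-supplied tool:

```shell
# pick_queue: suggest an LSF queue for a given job type, per the table above.
# Usage: pick_queue <ffb-neh|ffb-feh|mpi-neh|mpi-feh|simulation|cpu|general>
pick_queue() {
    case "$1" in
        ffb-neh)    echo psnehq ;;      # current NEH experiment on FFB, <= 6 cores
        ffb-feh)    echo psfehq ;;      # current FEH experiment on FFB, <= 6 cores
        mpi-neh)    echo psnehmpiq ;;   # OpenMPI jobs > 6 cores, NEH FFB
        mpi-feh)    echo psfehmpiq ;;   # OpenMPI jobs > 6 cores, FEH FFB
        simulation) echo psnehidle ;;   # low priority, preemptable (or psfehidle/psanaidleq)
        cpu)        echo psanacsq ;;    # CPU intensive, limited data throughput
        general)    echo psanaq ;;      # primary psana queue, no FFB access
        *)          echo "unknown job type: $1" >&2; return 1 ;;
    esac
}

pick_queue cpu        # prints psanacsq
pick_queue general    # prints psanaq
```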
LSF (Load Sharing Facility) is the job scheduler used at SLAC to execute user batch jobs on the various batch farms. LSF commands can be run from a number of SLAC servers, but it is best to use the interactive psana farm. Log in first to pslogin (from SLAC) or to psexport (from anywhere), and then to psana. From there you can submit a job with the following command:
...
The RedHat supplied OpenMPI packages are installed on pslogin, psexport and all of the psana batch servers. The system default has been set to the current version as supplied by RedHat.
No Format
$ mpi-selector --query
default:openmpi-1.4-gcc-x86_64
level:system
Your environment should be set up to use this version (unless you have used RedHat's mpi-selector script, or your login scripts, to override the default). You can check that your PATH is correct by issuing the command which mpirun. Currently, this should return /usr/lib64/openmpi/1.4-gcc/bin/mpirun. Future updates to the MPI version may change the exact details of this path.
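A quick way to script this check is sketched below. The check_tool helper name is ours, and the expected mpirun path is simply the one quoted above; adjust it if the installed version changes:

```shell
# check_tool: report whether NAME currently resolves to EXPECTED in your PATH.
# On the psana nodes you would run, e.g.:
#   check_tool mpirun /usr/lib64/openmpi/1.4-gcc/bin/mpirun
check_tool() {
    name=$1
    expected=$2
    actual=$(command -v "$name" || true)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $name -> $actual"
    else
        echo "MISMATCH: $name -> ${actual:-<not found>} (expected $expected)"
    fi
}
```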
...
The following are examples of how to submit OpenMPI jobs to the PCDS psanaq batch queue:
No Format
bsub -q psanaq -a mympi -n 32 -o ~/output/%J.out ~/bin/hello
This will submit an OpenMPI job (-a mympi) requesting 32 processors (-n 32) to the psanaq batch queue (-q psanaq).
No Format
bsub -q psanaq -a mympi -n 16 -R "span[ptile=1]" -o ~/output/%J.out ~/bin/hello
This will submit an OpenMPI job (-a mympi) requesting 16 processors (-n 16), spanned as one processor per host (-R "span[ptile=1]"), to the psanaq batch queue (-q psanaq).
No Format
bsub -q psanaq -a mympi -n 12 -R "span[hosts=1]" -o ~/output/%J.out ~/bin/hello
This will submit an OpenMPI job (-a mympi) requesting 12 processors (-n 12), all placed on one host (-R "span[hosts=1]"), to the psanaq batch queue (-q psanaq).
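The three invocations above differ only in queue, core count, and span string, so the pattern can be captured in a small wrapper. This is our own sketch (the wrapper name is hypothetical, and it prints the bsub command line rather than running it); the flags are the ones shown in the examples:

```shell
# mpi_bsub_cmd: print (not run) the bsub command line for an OpenMPI job.
# Usage: mpi_bsub_cmd QUEUE NCORES BINARY [SPAN]
#   SPAN may be empty, "ptile=1" (one process per host), or "hosts=1" (all on one host).
mpi_bsub_cmd() {
    queue=$1; ncores=$2; binary=$3; span=$4
    cmd="bsub -q $queue -a mympi -n $ncores"
    if [ -n "$span" ]; then
        cmd="$cmd -R \"span[$span]\""
    fi
    echo "$cmd -o ~/output/%J.out $binary"
}

mpi_bsub_cmd psanaq 32 '~/bin/hello'
mpi_bsub_cmd psanaq 16 '~/bin/hello' ptile=1
mpi_bsub_cmd psanaq 12 '~/bin/hello' hosts=1
```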
Report the status of all jobs (running, pending, finished, etc.) submitted by the current user:
...
Report running or pending jobs for all users in the psnehq queue:
Code Block
bjobs -w -u all -q psnehq
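Since bjobs output is plain columnar text, it is easy to post-process, for example to count jobs per state. The sample lines below are made up for illustration, but the STAT column is the third field of standard bjobs output:

```shell
# Count jobs per state from `bjobs -w` style output.
# 'sample' stands in for the real output of: bjobs -w -u all -q psnehq
sample='JOBID USER  STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
101   alice RUN  psnehq psana1101 psana1102 job1   Jul 20 10:00
102   bob   PEND psnehq psana1101 -         job2   Jul 20 10:05
103   alice RUN  psnehq psana1101 psana1103 job3   Jul 20 10:07'

# Skip the header line, tally the STAT column (field 3), print one line per state.
echo "$sample" | awk 'NR > 1 { count[$3]++ } END { for (s in count) print s, count[s] }'
```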
...
Jobs submitted to a high priority queue (e.g. psnehq) will automatically suspend jobs running on the same hardware in the lower priority queues (e.g. psnehmpiq and psnehidle, the ones with idle in the name). This suspension can take from a few seconds up to a few minutes, and when many small high priority jobs are submitted in rapid succession, the time taken to suspend lower priority jobs degrades the performance of the high priority jobs. For this reason there is a mechanism to suspend the lower priority queues for any specified high priority queue in advance. The command, which can be issued by LCLS scientists and PCDS staff, is:
...
No Format
/reg/g/psdm/qcntrl/psniceq psnehq 30     # Suspend idle queues conflicting with psnehq for 30 minutes
/reg/g/psdm/qcntrl/psniceq psfehq 40     # Suspend idle queues conflicting with psfehq for 40 minutes
/reg/g/psdm/qcntrl/psniceq psanacsq 8h   # Suspend idle queues conflicting with psanacsq for 8 hours