...
There are a number of batch farms (i.e. collections of compute nodes) located in the NEH and FEH. Depending on which data you need to access, you may need to submit jobs to a specific farm. This is done by submitting to the appropriate LSF batch queue; refer to the table below. Multi-core OpenMPI jobs should be run in either the psnehmpiq or psfehmpiq batch queue (see the following section, "Submitting OpenMPI Batch Jobs"). Simulation jobs should be submitted to the low-priority queues psnehidle and psfehidle.
Experimental Hall | Queue | Nodes | Data | Comments |
---|---|---|---|---|
NEH | psnehq | psana11xx,psana12xx | ana01, ana02 | Jobs <= 6 cores |
| psnehmpiq | psana11xx,psana12xx | ana01, ana02 | OpenMPI jobs > 6 cores, preemptable |
| psnehidle | psana11xx,psana12xx | | Simulations, preemptable, low priority |
FEH | psfehq | psana13xx,psana14xx | ana11, ana12 | Jobs <= 6 cores |
| psfehmpiq | psana13xx,psana14xx | ana11, ana12 | OpenMPI jobs > 6 cores, preemptable |
| psfehidle | psana13xx,psana14xx | | Simulations, preemptable, low priority |
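The queue choice in the table above can be sketched as a small shell helper. This is only an illustration: `pick_queue` and its arguments are hypothetical names introduced here, not part of LSF or of the SLAC setup, and the 6-core threshold is taken directly from the table.

```shell
# Hypothetical helper (not part of LSF): print the queue name from the
# table for a given hall, core count, and optional "sim" flag.
# Usage: pick_queue HALL CORES [sim]
pick_queue() {
    hall=$1
    cores=$2
    kind=${3:-}

    # Map the experimental hall to the queue-name prefix.
    case "$hall" in
        NEH|neh) prefix=psneh ;;
        FEH|feh) prefix=psfeh ;;
        *) echo "unknown hall: $hall" >&2; return 1 ;;
    esac

    if [ "$kind" = "sim" ]; then
        echo "${prefix}idle"    # simulations: low-priority, preemptable queue
    elif [ "$cores" -gt 6 ]; then
        echo "${prefix}mpiq"    # OpenMPI jobs using more than 6 cores
    else
        echo "${prefix}q"       # jobs using 6 cores or fewer
    fi
}
```

The result could then be passed to the `-q` option of `bsub`, for example `bsub -q "$(pick_queue NEH 4)" ./my_job`, where `./my_job` is a placeholder for your own executable.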
Instructions describing how to submit jobs to the batch farms listed above can be found on the Submitting Batch Job page. The batch farms currently consist of eighty nodes with the following general specifications:
LSF (Load Sharing Facility) is a job scheduler provided by Platform Computing. It is used at SLAC to execute user batch jobs on the various batch farms. LSF commands can be run from a number of SLAC servers, but it is best to use psexport or pslogin. A short list of example LSF status commands follows (see the next section for submitting jobs):
Report status of all jobs (running, pending, finished, etc) submitted by the current user:
```shell
bjobs -w -a
```
Report only running or pending jobs submitted by user "radmer":
```shell
bjobs -w -u radmer
```
Report running or pending jobs for all users in the psnehq queue:
```shell
bjobs -w -u all -q psnehq
```
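When a queue is busy, the `bjobs` listing can be long. As a rough sketch, the output can be condensed into a count of jobs per state. The helper name `count_by_state` is hypothetical, and the sketch assumes the usual `bjobs -w` layout in which the first line is a header and the third column (STAT) holds the job state.

```shell
# Hypothetical helper: read `bjobs -w` output on stdin and print one
# "STATE COUNT" line per job state (e.g. RUN, PEND), sorted by state name.
# Assumes line 1 is the header and column 3 is STAT.
count_by_state() {
    awk 'NR > 1 && NF >= 3 { n[$3]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

It would be used as a filter, for example `bjobs -w -u all -q psnehq | count_by_state`.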
Kill a specific batch job by its job ID number; the "bjobs" command can be used to find the appropriate job ID (note that only batch administrators can kill jobs belonging to other users):
```shell
bkill JOB_ID
```
Report current node usage on the two NEH batch farms:
```shell
bhosts -w ps11farm ps12farm
```
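The per-host listing from `bhosts` can be reduced to a single usage figure. The following is a minimal sketch, with `farm_usage` as a hypothetical helper name; it assumes the standard `bhosts -w` column order (HOST_NAME, STATUS, JL/U, MAX, NJOBS, RUN, ...) and skips hosts whose MAX column is not a number.

```shell
# Hypothetical helper: read `bhosts -w` output on stdin and print the
# total number of running job slots against the total slot limit.
# Assumes line 1 is the header, column 4 is MAX and column 6 is RUN.
farm_usage() {
    awk 'NR > 1 && $4 ~ /^[0-9]+$/ { max += $4; run += $6 }
         END { printf "%d/%d slots running\n", run, max }'
}
```

For example: `bhosts -w ps11farm ps12farm | farm_usage`.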
The following links give more detailed LSF usage information:
PowerPoint presentation describing LSF for LCLS users at SLAC
...