...
LSF (Load Sharing Facility) is the job scheduler used at SLAC to execute user batch jobs on the various batch farms. LSF commands can be run from a number of SLAC servers, but it is best to use psexport or pslogin. Log in first to pslogin (from SLAC) or to psexport (from anywhere). From there you can submit a job with the following command:
No Format |
---|
bsub -q psnehq -o <output file name> <job_script_command>
|
For example:
No Format |
---|
bsub -q psnehq -o ~/output/job.out my_program
|
This will submit a job (my_program) to the queue psnehq and write its output to a file named ~/output/job.out. You may check on the status of your jobs using the bjobs command.
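When a script needs to follow up on a job it has just submitted, the job ID can be taken from the confirmation message that bsub prints. The sketch below assumes that message has the usual form `Job <12345> is submitted to queue <psnehq>.`; the helper name `job_id_from_bsub` is hypothetical and is not part of LSF.

```shell
#!/bin/sh
# Hypothetical helper: read bsub's confirmation line on stdin and print
# only the numeric job ID. Assumes the line looks like:
#   Job <12345> is submitted to queue <psnehq>.
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Usage on a host where LSF is available (my_program is a placeholder):
#   id=$(bsub -q psnehq -o ~/output/job.out my_program | job_id_from_bsub)
#   bjobs -w "$id"
```

If the confirmation line is missing or has a different form, the helper prints nothing, so a script should check for an empty result before calling bjobs.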
Similarly, the command:
No Format |
---|
bsub -q psfehq -o ~/output/log.out "ls -l"
|
will execute the command line "ls -l" in the batch queue psfehq and write its output to a file named ~/output/log.out.
Resource requirements can be specified using the "-R" option. For example, to make sure that a job is run on a node with 1 GB (or more) of available memory, use the following:
No Format |
---|
bsub -q psnehq -R "rusage[mem=1024]" my_program
|
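The memory value in the resource string is a plain number, so a request expressed in GB has to be converted first. The following is a minimal sketch, assuming the cluster interprets `mem` in MB (the common LSF default, but a site setting); the helper name `mem_req` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a memory size in whole GB into an LSF
# resource requirement string.
# Assumption: the cluster reads rusage[mem=...] in MB.
mem_req() {
    gb=$1
    echo "rusage[mem=$((gb * 1024))]"
}

# Usage on a host where LSF is available (my_program is a placeholder):
#   bsub -q psnehq -R "$(mem_req 1)" my_program
```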
...
The system default has been set to the current version as supplied by RedHat.
No Format |
---|
$ mpi-selector --query
default:openmpi-1.4-gcc-x86_64
level:system
|
...
The following are examples of how to submit OpenMPI jobs to the PCDS psnehmpiq batch queue:
No Format |
---|
bsub -q psnehmpiq -a mympi -n 32 -o ~/output/%J.out ~/bin/hello
|
This submits an OpenMPI job (-a mympi) requesting 32 processors (-n 32) to the psnehmpiq batch queue (-q psnehmpiq).
No Format |
---|
bsub -q psfehmpiq -a mympi -n 16 -R "span[ptile=1]" -o ~/output/%J.out ~/bin/hello
|
This submits an OpenMPI job (-a mympi) requesting 16 processors (-n 16), placed one processor per host (-R "span[ptile=1]"), to the psfehmpiq batch queue (-q psfehmpiq).
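Because ptile caps the number of processors placed on each host, the processor count and the ptile value together determine how many hosts a job needs at minimum. The arithmetic can be sketched as follows; `hosts_needed` is a hypothetical helper, not an LSF command.

```shell
#!/bin/sh
# Hypothetical helper: minimum number of hosts needed for a job that
# requests N processors (-n N) with at most P processors per host
# (-R "span[ptile=P]"). This is a ceiling division: (N + P - 1) / P.
hosts_needed() {
    n=$1
    ptile=$2
    echo $(( (n + ptile - 1) / ptile ))
}

hosts_needed 16 1   # the example above: one processor on each of 16 hosts
hosts_needed 16 4   # four processors per host: 4 hosts
```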
No Format |
---|
bsub -q psfehmpiq -a mympi -n 12 -R "span[hosts=1]" -o ~/output/%J.out ~/bin/hello
|
...
Report the status of all jobs (running, pending, finished, etc.) submitted by the current user:
Code Block |
---|
bjobs -w -a
|
Report only running or pending jobs submitted by user "radmer":
Code Block |
---|
bjobs -w -u radmer
|
Report running or pending jobs for all users in the psnehq queue:
Code Block |
---|
bjobs -w -u all -q psnehq
|
Kill a specific batch job by its job ID number; use the "bjobs" command to find the appropriate job ID (note that only batch administrators can kill jobs belonging to other users):
Code Block |
---|
bkill JOB_ID
|
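The two commands can be combined to act on a group of jobs, for example to cancel everything that is still pending. The sketch below assumes the default bjobs column layout (JOBID, USER, STAT, QUEUE, ...) with the job state in the third column; `pending_ids` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read `bjobs -w` output on stdin and print the
# job IDs of jobs whose state is PEND.
# Assumption: first line is the header, and columns start with
#   JOBID USER STAT QUEUE ...
pending_ids() {
    awk 'NR > 1 && $3 == "PEND" { print $1 }'
}

# Usage on a host where LSF is available:
#   for id in $(bjobs -w | pending_ids); do
#       bkill "$id"
#   done
```

The loop runs bkill once per job ID, so it does nothing at all when no jobs are pending.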
Report current node usage on the two NEH batch farms:
Code Block |
---|
bhosts -w ps11farm ps12farm
|
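The bhosts listing can be condensed into a single number, such as the count of free job slots. This is a rough sketch, assuming the usual bhosts columns (HOST_NAME, STATUS, JL/U, MAX, NJOBS, ...) and counting only hosts whose status is "ok"; `free_slots` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read `bhosts -w` output on stdin and print the
# total number of free job slots (MAX - NJOBS) on hosts with status "ok".
# Assumption: columns are HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV.
# Hosts whose MAX is not a number (e.g. "-") are skipped.
free_slots() {
    awk 'NR > 1 && $2 == "ok" && $4 ~ /^[0-9]+$/ { free += $4 - $5 }
         END { print free + 0 }'
}

# Usage on a host where LSF is available:
#   bhosts -w ps11farm ps12farm | free_slots
```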
...