Login Environment

To use the batch job submission commands, add the following 2 lines to your .login file:

source /afs/slac/g/suncat/gpaw/setupenv
setenv PATH ${PATH}:/afs/slac/g/suncat/bin:/usr/local/bin

The first line sets up a default interactive "gpaw-friendly" environment, discarding any earlier environment settings. You could use a similar line to pick up a default "jacapo-friendly" environment, if you prefer. The second line adds the directories containing some necessary interactive commands (e.g. for submitting batch jobs) to your PATH.

If you want to use a particular version (e.g. 27) of GPAW instead of the "default" above, use something like this instead:

source /nfs/slac/g/suncatfs/sw/gpawv27/setupenv

Note that the contents of .login do NOT affect the environment of batch jobs submitted with the various job submission commands described below (e.g. gpaw-bsub, jacapo-bsub, etc.).

Queues

| Queue Name | Comment | Wallclock Duration |
| --- | --- | --- |
| suncat-test | 16 cores, for quick "does-it-crash" test | 10 minutes |
| suncat-short | | 2 hours |
| suncat-medium | | 20 hours |
| suncat-long | | 50 hours |
| suncat-xlong | Requires JensN/Frank/Felix permission. May have to limit time with -W flag | 10 days |

There are similar queue names for the suncat2/suncat3 farms.
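
As a rough illustration of how the queue durations can guide queue choice, the sketch below picks the shortest queue that covers an estimated run time. It assumes the durations listed above are upper limits on wallclock time; the function name pick_queue and the table constant are hypothetical helpers, not part of the suncat tools. Note that suncat-test is intended for quick crash tests and suncat-xlong requires permission, so treat the result as a suggestion only.

```python
# Wallclock durations in hours, copied from the queue list above.
# Assumption: each duration is the maximum run time allowed in that queue.
QUEUE_LIMITS_HOURS = [
    ("suncat-test", 10 / 60),   # 10 minutes
    ("suncat-short", 2),
    ("suncat-medium", 20),
    ("suncat-long", 50),
    ("suncat-xlong", 10 * 24),  # 10 days
]


def pick_queue(hours_needed):
    """Return the first (shortest) queue whose duration covers hours_needed."""
    if hours_needed <= 0:
        raise ValueError("hours_needed must be positive")
    for name, limit in QUEUE_LIMITS_HOURS:
        if hours_needed <= limit:
            return name
    raise ValueError("no suncat queue allows %s hours" % hours_needed)
```

For example, pick_queue(30) selects suncat-long, since 30 hours exceeds the 20 hours of suncat-medium.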

Farm Information

| Farm Name | Cores (or GPUs) | Cores (or GPUs) Per Node | Memory Per Core (or GPU) | Interconnect |
| --- | --- | --- | --- | --- |
| suncat | 2272 Nehalem X5550 | 8 | 3GB | 1Gbit Ethernet |
| suncat2 | 768 Westmere X5650 | 12 | 4GB | 2Gbit Ethernet |
| suncat3 | 512 Sandy Bridge E5-2670 | 16 | 4GB | 40Gbit QDR Infiniband |
| gpu | 119 Nvidia M2090 | 7 | 6GB | 40Gbit QDR Infiniband |

Jobs should typically request a multiple of the number of cores per node.
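
The rule above can be sketched as a small helper that rounds a desired core count up to a whole number of nodes. The per-node values are taken from the farm information above; the function name round_up_cores is a hypothetical helper for illustration, not an existing suncat command.

```python
# Cores per node for each CPU farm, from the farm information above.
CORES_PER_NODE = {"suncat": 8, "suncat2": 12, "suncat3": 16}


def round_up_cores(farm, cores_wanted):
    """Round cores_wanted up to the next multiple of the farm's cores per node."""
    if cores_wanted < 1:
        raise ValueError("cores_wanted must be at least 1")
    per_node = CORES_PER_NODE[farm]
    nodes = -(-cores_wanted // per_node)  # ceiling division
    return nodes * per_node
```

For example, a job that wants 13 cores on suncat2 would request 24 (two full 12-core nodes), and the result would be passed to the -n flag.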

Submitting Jobs

It is important to have an "afs token" before submitting jobs. Check its status with the tokens command, and renew it every 24 hours with the /usr/local/bin/kinit command.

Log in to a suncat login server (suncatls1, suncatls2, or suncatls3, all @slac.stanford.edu) to execute commands like these (notice they are similar for gpaw/dacapo/jacapo):

gpaw-bsub -o mo2n.log -q suncat-long -n 8 mo2n.py
dacapo-bsub -o Al-fcc-single.log -q suncat-long -n 8 Al-fcc-single.py
jacapo-bsub -o co.log -q suncat-long -n 8 co.py

You can select a particular version of gpaw to run (the available versions are documented separately):

gpaw-ver-bsub 19 -o mo2n.log -q suncat-long -n 8 mo2n.py

You can also embed the job submission flags in your .py file with line(s) like:

#LSF -o mo2n.log -q suncat-long
#LSF -n 8

The job submission scripts use the union of the flags given on the command line and those embedded in the .py file (a "logical or").
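
To make the embedded-flag convention concrete, here is a minimal sketch of how #LSF lines could be collected from a script. It is only an illustration: the actual parsing done by gpaw-bsub and the other submission scripts is not shown on this page, and the function name embedded_lsf_flags is hypothetical.

```python
import shlex


def embedded_lsf_flags(script_text):
    """Collect the flags from every '#LSF ...' line of a job script."""
    flags = []
    for line in script_text.splitlines():
        parts = line.strip().split(None, 1)
        # Only lines whose first word is exactly '#LSF' carry flags.
        if parts and parts[0] == "#LSF" and len(parts) > 1:
            flags.extend(shlex.split(parts[1]))
    return flags


example = "#LSF -o mo2n.log -q suncat-long\n#LSF -n 8\nprint('hello')\n"
print(embedded_lsf_flags(example))
```

How the real scripts order or reconcile command-line flags with embedded ones is not specified here, so any merging step (for instance, simply concatenating the two lists) would be an assumption.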

Batch Job Output

Because of a file-locking bug in afs, all output from our MPI jobs (GPAW, dacapo, jacapo) should go to nfs. Our fileserver space is at /nfs/slac/g/suncatfs. Make a directory there with your username. You should always use the "/nfs" form of that name (the nfs automounter software often refers to it as "/a", but that syntax should not be in any of your scripts).
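
A script can guard against writing output to the wrong place with a check like the following sketch. The fileserver root is the one given above; the function name is_valid_output_path and the username in the usage note are hypothetical.

```python
import posixpath

# Fileserver root given above; always use the "/nfs" form, never "/a".
NFS_ROOT = "/nfs/slac/g/suncatfs"


def is_valid_output_path(path):
    """Return True if path lies under the suncat nfs fileserver space."""
    normalized = posixpath.normpath(path)
    return normalized == NFS_ROOT or normalized.startswith(NFS_ROOT + "/")
```

For example, is_valid_output_path("/nfs/slac/g/suncatfs/alice/mo2n.log") is True for a hypothetical user alice, while paths under /afs or paths using the automounter's "/a" form are rejected.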

Batch Job Environment

The above commands "take control" and set the entire environment, which prevents the user from changing any part of it (PATH, PYTHONPATH, etc.). If you want to customize the environment yourself (a fancier but more error-prone approach), look at the 2 lines in the gpaw-bsub/dacapo-bsub scripts in /afs/slac/g/suncat/bin, and modify the environment after executing the "setupenv" command and before executing the "bsub" command.

Useful Commands

Log in to a suncat login server (suncatls1, suncatls2, or suncatls3) to execute these. You can get more information about these commands from the man pages.

bjobs (shows your current list of batch jobs and jobIds)
bjobs -d (shows list of your recently completed batch jobs)
bqueues suncat-long (shows number of cores pending and running)
bjobs -u all | grep suncat (show jobs of all users in the suncat queues)
bpeek <jobId> (examine logfile output from job that may not have been flushed to disk)
bkill <jobId> (kill job)
btop <jobId> (moves job priority to the top)
bbot <jobId> (moves job priority to the bottom)
bsub -w "ended\(12345\)" (wait for job id 12345 to be EXITed or DONE before running)
bmod [options] <jobId> (modify job parameters after submission, e.g. priority (using -sp flag))
bswitch suncat-xlong 12345 (move running job id 12345 to the suncat-xlong queue)
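
When a job must wait for several earlier jobs, the expression given to bsub -w can be assembled programmatically. The sketch below builds an expression from the ended(...) condition shown above, joined with LSF's && operator; the function name ended_dependency is a hypothetical helper.

```python
def ended_dependency(job_ids):
    """Build a 'bsub -w' expression that waits for all the given jobs to end."""
    if not job_ids:
        raise ValueError("at least one job id is required")
    # int() rejects anything that is not a plain numeric job id.
    return " && ".join("ended({})".format(int(job_id)) for job_id in job_ids)


print(ended_dependency([12345, 12346]))
```

If the expression is passed to bsub as a single element of an argument list (for example through Python's subprocess module) rather than typed at a shell prompt, the parentheses do not need backslash escaping.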