
Overview

All SLAC users can run parallel jobs on the shared "bullet" cluster, which has 5024 cores. The hardware is configured as follows:

  • RHEL6 64bit OS x86 nodes
  • 2.2GHz Sandy Bridge CPUs
  • 16 cores per node
  • 64GB RAM per node
  • QDR (40Gb) Infiniband for MPI comms
  • 10Gb ethernet for SLAC networking

There are two public queues for MPI computing on the bullet cluster: bulletmpi and bulletmpi-large. They are available to anyone with a SLAC unix account. Please send email to unix-admin to request access to these queues.

  • bulletmpi for jobs between 8 and 512 cores
  • bulletmpi-large for jobs between 513 and 2048 cores

Queue            min. # cores   max. # cores   default runtime   max. runtime
bulletmpi        8              512            15 mins           7 days
bulletmpi-large  513            2048           15 mins           1 day

Single slot jobs are not allowed in these queues. There is also a limit on the total number of cores (slots) in use by the bulletmpi and bulletmpi-large queues. You can check the current slot usage and the slot limits by running the blimits command. The output below shows the combined slot total for bulletmpi and bulletmpi-large is limited to 3072 slots. All 3072 slots are in use:

renata@victoria $ blimits -w

INTERNAL RESOURCE LIMITS:

NAME                       USERS     QUEUES                     HOSTS        PROJECTS   SLOTS       MEM  TMP  SWP  JOBS
bulletmpi_total_limit      -         bulletmpi bulletmpi-large  bulletfarm/  -          3072/3072   -    -    -    -
bulletmpi_slot_limit       hezaveh   bulletmpi                  -            -          288/512     -    -    -    -
bulletmpi_slot_limit       lehmann   bulletmpi                  -            -          128/512     -    -    -    -
bulletmpi_slot_limit       sforeman  bulletmpi                  -            -          256/512     -    -    -    -
bulletmpi_slot_limit       frubio    bulletmpi                  -            -          32/512      -    -    -    -
bulletmpi_slot_limit       weast     bulletmpi                  -            -          300/512     -    -    -    -
bulletmpi_slot_limit       cuoco     bulletmpi                  -            -          20/512      -    -    -    -
bulletmpi_long_slot_limit  shoeche   bulletmpi-large            -            -          2048/2048   -    -    -    -


There are two flavors of MPI available at SLAC: the stock Red Hat RPM version, and a Red Hat build compiled with LSF hooks. Each uses a slightly different bsub command to submit jobs, as described below.

Stock MPI

This version of MPI is the default on most of the public login machines. You can tell that this is the version you will be running if which mpirun gives the following response:

renata@rhel6-64f $  which mpirun
/usr/lib64/openmpi/bin/mpirun
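The path returned by which mpirun is what distinguishes the two flavors. As a quick illustration, a small helper can classify a path; note that classify_mpi is a hypothetical function written for this page, not a SLAC-provided tool:

```shell
# Hypothetical helper (not a SLAC tool): classify an mpirun path by MPI flavor.
classify_mpi() {
  case "$1" in
    /usr/lib64/openmpi/*) echo "stock" ;;    # stock Red Hat RPM build
    /opt/lsf-openmpi/*)   echo "lsf" ;;      # build compiled with LSF hooks
    *)                    echo "unknown" ;;  # anything else
  esac
}

classify_mpi /usr/lib64/openmpi/bin/mpirun   # prints "stock"
```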

A bsub command which uses this MPI should look like:

bsub -q bulletmpi -a mympi -n <# cores>  <mpi job>
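For example, filling in the placeholders for a 64-core job (the core count and the executable name ./my_app are made up for illustration; the echo prints the command instead of submitting it):

```shell
# Dry run (echo) of a stock-MPI submission.
# 64 cores and ./my_app are placeholder values, not SLAC defaults.
NCORES=64
echo bsub -q bulletmpi -a mympi -n "$NCORES" ./my_app
```

Remove the echo to actually submit the job.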

MPI with LSF hooks

This version of MPI is available to bsub when you ssh to bullet, which logs you in to either bullet0001 or bullet0002. (As described earlier, send email to unix-admin to request access to the bulletmpi queues.) The response to which mpirun in this case should look like:

/opt/lsf-openmpi/1.5.4/bin//mpirun

You can also check that the version of mpi with LSF hooks is loaded by running:

renata@bullet0002 $ module list
Currently Loaded Modulefiles:
1) lsf-openmpi_1.5.4-x86_64

A bsub command which uses this mpi should look like:

bsub -q bulletmpi mpirun -n <# cores> <mpi job>
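As a filled-in example (again with a placeholder core count and hypothetical executable ./my_app, and an echo so the command is printed rather than submitted). Note that with the LSF-hooked build, -n is passed to mpirun rather than to bsub:

```shell
# Dry run (echo) of an LSF-hooked MPI submission.
# 64 cores and ./my_app are placeholder values.
NCORES=64
echo bsub -q bulletmpi mpirun -n "$NCORES" ./my_app
```

Remove the echo to actually submit the job.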

 
