
Overview

This is an example of how to run jobs, and in particular SLIC, the Simulator for the Linear Collider, on FermiGrid, which is part of the Open Science Grid. SLIC is a Geant4-based simulation package that uses an XML input format called LCDD to describe the geometry, sensitive detectors and readout geometry. In this example SLIC is tarred up and placed on web-accessible disk space; the grid job fetches the tar file with wget, unpacks it, and runs SLIC on the stdhep files that are provided with the tar package.
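
As a rough illustration of that approach, the wrapper script a grid job would run might look like the sketch below. The URL and file names are placeholders, not real locations; the worked SLIC example later on this page instead uses a SimDist installation that is already visible to the worker nodes.

#!/bin/sh
# Sketch of the "tarball on a web server" approach (placeholder names only)
cd ${_CONDOR_SCRATCH_DIR}                      # work in the Condor scratch area
wget http://your.web.server/slic-bundle.tar    # fetch the prepared SLIC tarball
tar -xf slic-bundle.tar                        # unpack SLIC plus the stdhep input
./slic-bundle/run_slic.sh                      # run SLIC on the bundled stdhep file(s)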

This is only one way to do it. Other options include:

  • sending the tar file with the job submission
  • installing SLIC on a file system that is visible to the grid worker nodes (this is what the SLIC example below uses)

Prerequisites for sending jobs to the GRID

  1. get a DOE grid certificate from http://security.fnal.gov/pki/Get-Personal-DOEGrids-Cert.html
    This page also explains how to export the certificate from the browser and how to deal with directory permissions and such; a rough sketch of the typical export commands is shown right after this list.
  2. register with the ILC VO (Virtual Organization) at http://cd-amr.fnal.gov/ilc/ilcsim/ilcvo-registration.shtml, which will guide you to:
    https://voms.fnal.gov:8443/vomrs/ilc/vomrs
  3. Everything is set up on ILCSIM, so to try things out it is recommended to get an account on ILCSIM using the following form:
    http://cd-amr.fnal.gov/ilc/ilcsim/ilcsim.shtml
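
The export from the browser mentioned in step 1 typically boils down to a couple of openssl commands. The sketch below assumes the certificate was saved from the browser as mycert.p12 (that file name is an assumption; the page linked above is the authoritative reference):

mkdir -p ~/.globus
openssl pkcs12 -in mycert.p12 -clcerts -nokeys -out ~/.globus/usercert.pem
openssl pkcs12 -in mycert.p12 -nocerts -out ~/.globus/userkey.pem
chmod 444 ~/.globus/usercert.pem
chmod 400 ~/.globus/userkey.pem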

Setting up your own gateway to the grid is beyond the scope of this write-up. It involves installing the Virtual Data Toolkit (VDT), obtaining a host certificate for the gateway machine, and so on. For an administration guide see the Fermi Grid web page.

Setting up the Environment

To set up the environment and to get the necessary grid proxy, log into ILCSIM and issue the following commands:

source /fnal/ups/grid/setup.sh
voms-proxy-init -voms ilc:/ilc/detector
# enter your grid certificate pass phrase when prompted

To check the status of the proxy:

voms-proxy-info -all
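
The proxy has a limited lifetime (typically of the order of half a day). If your jobs may wait in the queue longer than that, you can ask for a longer-lived proxy when you create it; the exact limits depend on the VOMS server configuration:

voms-proxy-init -voms ilc:/ilc/detector -valid 24:00   # request a 24-hour proxy
voms-proxy-info -timeleft                              # remaining lifetime in seconds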

Submitting the first example jobs

Now you should be all set up to submit a first trivial test job, just to make sure that everything is working. Cut and paste the following lines into your terminal window. This submits a grid job that starts 5 separate processes. The processes do nothing exciting: they execute sleep for 10 seconds and then terminate. Since no output is created, the files sleep_grid.out.$(Cluster).$(Process) and sleep_grid.err.$(Cluster).$(Process) should be empty.
(Note: $(Cluster) represents the Condor job (cluster) number and $(Process) the process number, here 0 through 4.)
The Condor log files are sleep_grid.log.$(Cluster).$(Process).

cat > sleep_grid << +EOF
universe = grid
type = gt2
globusscheduler = fngp-osg.fnal.gov/jobmanager-condor
executable = /bin/sleep
transfer_output = true
transfer_error = true
transfer_executable = true
log = sleep_grid.log.\$(Cluster).\$(Process)
notification = NEVER
output = sleep_grid.out.\$(Cluster).\$(Process)
error = sleep_grid.err.\$(Cluster).\$(Process)
stream_output = false
stream_error = false
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT
globusrsl = (jobtype=single)(maxwalltime=999)
Arguments = 10
queue 5
+EOF


condor_submit sleep_grid
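
Once the five processes have finished (see "Checking and killing your jobs" below), you can verify that nothing was written to the output and error files. The cluster number 12345 is just a placeholder; use the number that condor_submit reported:

ls -l sleep_grid.out.12345.* sleep_grid.err.12345.*   # all files should have size 0
cat sleep_grid.log.12345.0                            # Condor events for the first process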

The second example is an exploration job: the job reports the runtime environment it encounters and the file systems that are mounted. This is often useful to find out what is available on the worker nodes. Have a look at env_grid.out.$(Cluster).$(Process) after the job has finished.

Note: the grid job does not inherit the runtime environment from your interactive session!

rm -f env_grid.sh
cat > env_grid.sh << +EOF
#!/bin/sh -f
printenv
cd \${_CONDOR_SCRATCH_DIR}
pwd
#
# This sets up the environment for osg in case we want to
# use grid services like srmcp
#
. \$OSG_GRID/setup.sh
source \${VDT_LOCATION}/setup.sh
printenv
/bin/df
+EOF
chmod +x env_grid.sh

rm -f env_grid.run
cat > env_grid.run << +EOF
universe = grid
type = gt2
globusscheduler = fngp-osg.fnal.gov/jobmanager-condor
executable = ./env_grid.sh
transfer_output = true
transfer_error = true
transfer_executable = true
log = env_grid.log.\$(Cluster).\$(Process)
notification = NEVER
output = env_grid.out.\$(Cluster).\$(Process)
error = env_grid.err.\$(Cluster).\$(Process)
stream_output = false
stream_error = false
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT
globusrsl = (jobtype=single)(maxwalltime=999)
queue
+EOF

condor_submit env_grid.run
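
When the job has finished, the interesting information is in the output file. For example (again with a placeholder cluster number) you can pick out the OSG/VDT locations and the /grid mounts that the worker node sees:

grep -E 'OSG_|VDT_' env_grid.out.12346.0   # OSG and VDT locations on the worker node
grep /grid env_grid.out.12346.0            # file systems mounted under /grid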

Submitting a Job running SLIC

Now, finally, let's run SLIC using an installation and a data set that are available on the grid worker nodes. As in the previous examples, cut and paste the contents below:

rm -f slic_grid.csh
cat > slic_grid.csh << +EOF
#!/bin/csh
echo start
/bin/date
# ClusterProcess is passed in via the environment line of the submit file,
# so that the tar file name matches the transfer_output_files entry
setenv LABELRUN DDyn4NMvCav-\${ClusterProcess}
setenv TARFILE \${LABELRUN}-results.tar
cd \${_CONDOR_SCRATCH_DIR}
mkdir results
/grid/app/ilc/detector/SimDist/Oct-31-2007/SimDist/scripts/slic.sh -r 5 \
-g /grid/app/ilc/detector/SimDist/detectors/sid01/sid01.lcdd            \
-i /grid/data/ilc/detector/LDC/stdhep/ZZ_run10.stdhep -o ./results/ZZ_run10\${LABELRUN} >& \
./results/ZZ_run10\${LABELRUN}.lis
ls results
/bin/date
echo "build output tarball: "$TARFILE
tar -cf $TARFILE *.txt results
echo done
+EOF
chmod +x slic_grid.csh

rm -f slic_grid.run
cat > slic_grid.run << +EOF
universe = grid
type = gt2
globusscheduler = fngp-osg.fnal.gov/jobmanager-condor
executable = ./slic_grid.csh
transfer_output = true
transfer_error = true
transfer_executable = true
# pass the cluster and process numbers to the script so that the tar file
# name built there matches transfer_output_files
environment = ClusterProcess=\$(Cluster)-\$(Process)
transfer_output_files = DDyn4NMvCav-\$(Cluster)-\$(Process)-results.tar
log = slic_grid.log.\$(Cluster).\$(Process)
notification = NEVER
output = slic_grid.out.\$(Cluster).\$(Process)
error = slic_grid.err.\$(Cluster).\$(Process)
stream_output = false
stream_error = false
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT
globusrsl = (jobtype=single)(maxwalltime=999)
queue
+EOF

condor_submit slic_grid.run
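
After the job completes, the tar file named in transfer_output_files is transferred back to the directory you submitted from. With a placeholder cluster number of 12347 and process 0 it can be inspected like this:

tar -tf DDyn4NMvCav-12347-0-results.tar        # list the contents
tar -xf DDyn4NMvCav-12347-0-results.tar        # unpack; the SLIC output lands in results/
less results/ZZ_run10DDyn4NMvCav-12347-0.lis   # the captured SLIC log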

Running commands directly on the head node

To run some commands directly on the grid head nodes use a syntax like this:

globus-job-run fngp-osg.fnal.gov/jobmanager-condor /bin/ls /grid/app
globus-job-run fngp-osg.fnal.gov/jobmanager-condor /usr/bin/printenv
globus-job-run fngp-osg.fnal.gov/jobmanager-condor /bin/df

The examples above show how to check which grid applications are installed, what the runtime environment of a job looks like, and which file systems are mounted. To check for the available SLIC/SimDist distributions, type:

globus-job-run fngp-osg.fnal.gov/jobmanager-condor /bin/ls /grid/app/ilc/detector/SimDist/
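
The same technique works for the other paths used in the examples above, for instance to see which detector geometries and stdhep input files are available:

globus-job-run fngp-osg.fnal.gov/jobmanager-condor /bin/ls /grid/app/ilc/detector/SimDist/detectors/
globus-job-run fngp-osg.fnal.gov/jobmanager-condor /bin/ls /grid/data/ilc/detector/LDC/stdhep/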

Checking and killing your jobs

You can see the status of all jobs using one of the following commands:

condor_q
or
condor_q -globus

Or to check the jobs submitted by user <username>:

condor_q  -submitter <username>

You can view a per-submitter summary of all jobs with the following command:

condor_status  -submitters

To cancel a job type condor_rm followed by the job number:

condor_rm <job number>
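
condor_rm also accepts a user name, which removes every job owned by that user:

condor_rm <username>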