- Overview
- Prerequisites
- Setting up the Environment
- Submitting the first Example Jobs
- Submitting a Job running SLIC
- Running Commands directly on the Head Node
- Checking and Killing your Jobs
Overview
This is an example of how to run jobs, and in particular SLIC, the Simulator for the Linear Collider, on FermiGrid, which is part of the Open Science Grid. SLIC is a Geant4-based simulation package that uses an XML geometry input format called LCDD to describe geometry, sensitive detectors and readout geometry.
Note! The examples are written so that you can cut and paste them from your browser window directly into a terminal session on ILCSIM. Do not cut and paste them into an editor (that will not work unless you remove a lot of backslashes: \ ). Instead, copy and edit the files that are created when you paste into the terminal window, and modify them to fit your own needs.
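The backslashes mentioned in the note come from the way the examples create files with a here-document: text between `<< +EOF` and `+EOF` has its variables expanded at paste time unless the `$` is escaped. The following is a minimal sketch of that behaviour, assuming a POSIX-style shell such as bash; the file name `heredoc_demo.txt` and the variable `LABEL` are made up for illustration.

```shell
# LABEL and heredoc_demo.txt are hypothetical names used only for this demo.
LABEL=test

# Unescaped ${LABEL} is expanded while the file is written;
# \${LABEL} is written literally so it can be expanded later by the script.
cat > heredoc_demo.txt << +EOF
expanded now: ${LABEL}
kept for later: \${LABEL}
+EOF

cat heredoc_demo.txt
# expanded now: test
# kept for later: ${LABEL}
```

This is why a script pasted into an editor, rather than the terminal, would still contain the `\$` sequences and would have to be cleaned up by hand.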
Prerequisites
...
- Get a DOE grid certificate from http://security.fnal.gov/pki/Get-Personal-DOEGrids-Cert.html
  This page also explains how to export the certificate from the browser and how to deal with directory permissions and such.
- Register with the ILC VO (Virtual Organization) at http://cd-amr.fnal.gov/ilc/ilcsim/ilcvo-registration.shtml, which will guide you to:
  https://voms.fnal.gov:8443/vomrs/ilc/vomrs
- Everything is set up on ILCSIM, so to try things out it is recommended to get an account on ILCSIM using the following form:
  http://cd-amr.fnal.gov/ilc/ilcsim/ilcsim.shtml
...
```shell
voms-proxy-info -all
```
Submitting the first Example Jobs
Now you should be all set up to submit a first trivial test job to make sure that everything is working. Cut and paste the following lines into your terminal window. This will submit a grid job that starts 5 separate processes. The processes do nothing exciting: each executes sleep for 10 seconds and then terminates. Since no output is created, the files sleep_grid.out.$(Cluster).$(Process) and sleep_grid.err.$(Cluster).$(Process) should be empty.
(Note: $(Cluster) represents the job number and $(Process) represents the process number, one for each of the 5 processes.)
The Condor log files are named sleep_grid.log.$(Cluster).$(Process)
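Once the test job has finished, the claim that the output and error files are empty can be checked from the submission directory. The following is a small sketch, assuming a POSIX-style shell such as bash; the function name `check_empty` is a hypothetical helper and not part of Condor or the grid tools.

```shell
# check_empty is a hypothetical helper: it reports every given file that
# exists and has a size greater than zero, and fails if it finds any.
check_empty() {
    checked=0
    nonempty=0
    for f in "$@"; do
        [ -e "$f" ] || continue        # skip glob patterns that matched nothing
        checked=$((checked + 1))
        if [ -s "$f" ]; then
            echo "not empty: $f"
            nonempty=$((nonempty + 1))
        fi
    done
    echo "checked $checked files, $nonempty not empty"
    [ "$nonempty" -eq 0 ]
}

# File name patterns follow the output/error names described above.
check_empty sleep_grid.out.* sleep_grid.err.* || echo "inspect the files listed above"
```

A non-empty error file usually deserves a look before moving on to the SLIC job.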
...
```shell
rm -f slic_grid.csh
cat > slic_grid.csh << +EOF
#!/bin/csh
echo start
/bin/date
setenv LABELRUN slic_grid-\${ClusterProcess}
setenv TARFILE \${LABELRUN}-results.tar
echo \${TARFILE}
echo start
/bin/date
mkdir results
/grid/app/ilc/detector/SimDist/Oct-31-2007/SimDist/scripts/slic.sh -r 5 \
-g /grid/app/ilc/detector/SimDist/detectors/sid01/sid01.lcdd \
-i /grid/data/ilc/detector/LDC/stdhep/ZZ_run10.stdhep -o ./results/ZZ_run10\${LABELRUN} >& \
./results/ZZ_run10\${LABELRUN}.lis
ls -lh results
/bin/date
echo "build output tarball: " \${TARFILE}
tar -cf \${TARFILE} results
echo done
+EOF
chmod +x slic_grid.csh
rm -f slic_grid.run
cat > slic_grid.run << +EOF
universe = grid
type = gt2
globusscheduler = fngp-osg.fnal.gov/jobmanager-condor
executable = ./slic_grid.csh
transfer_output = true
transfer_error = true
transfer_executable = true
environment = "ClusterProcess=\$(Cluster)-\$(Process)"
transfer_output_files = slic_grid-\$(Cluster)-\$(Process)-results.tar
log = slic_grid.log.\$(Cluster).\$(Process)
notification = NEVER
output = slic_grid.out.\$(Cluster).\$(Process)
error = slic_grid.err.\$(Cluster).\$(Process)
stream_output = false
stream_error = false
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT
globusrsl = (jobtype=single)(maxwalltime=999)
queue
+EOF
condor_submit slic_grid.run
```
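The job above packs its `results` directory into a tarball named `slic_grid-$(Cluster)-$(Process)-results.tar`, which is transferred back to the submission directory. A possible way to unpack the returned tarballs, one directory per job, is sketched below. It assumes a POSIX-style shell such as bash and a `tar` that supports `-C`; the function name `unpack_results` is a hypothetical helper.

```shell
# unpack_results is a hypothetical helper: it extracts each given tarball
# into a directory named after the tarball (without "-results.tar").
unpack_results() {
    for tarball in "$@"; do
        [ -f "$tarball" ] || continue      # skip patterns that matched nothing
        dir="${tarball%-results.tar}"      # e.g. slic_grid-<Cluster>-<Process>
        mkdir -p "$dir"
        tar -xf "$tarball" -C "$dir"
        echo "unpacked $tarball into $dir/"
    done
}

unpack_results slic_grid-*-results.tar
```

Because every tarball contains a top-level `results` directory, extracting each one into its own directory keeps the output of different jobs from overwriting each other.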
Running Commands directly on the Head Node
To run some commands directly on the grid head nodes use a syntax like this:
...
```shell
globus-job-run fngp-osg.fnal.gov/jobmanager-condor /bin/ls /grid/app/ilc/detector/SimDist/
```
Checking and Killing your Jobs
You can see the status of all jobs using the following command:
...