...
To actually run some jobs it is necessary to call the createStream function of the pipeline. The easiest way to do this is to use the pipeline createStream command when logged in to SLAC unix (e.g. noric).
No Format |
---|
~exodata/pipeline/prod/pipeline createStream [-options] <taskname> [file1 [file2 [...]]...]
|
...
--stream <Stream ID=-1> | Integer stream identifier. Auto assigned if option not specified. | |
--nStreams <Number of Streams=1> | Number of streams to create, not valid if Stream ID specified | |
--define <name=value> | Define a variable. Syntax is "name=value[,name2=value2,...]" | |
For example, to create 10 streams of the EXOMCBackground task defined above, overriding the default value of the MAXEVENTS variable, we would use the following command:
No Format |
---|
~exodata/pipeline/prod/pipeline createStream --define MAXEVENTS=100000 --nStreams 10 EXOMCBackground |
...
The pipeline web interface can be accessed from the EXO data portal at:
http://exo-data.slac.stanford.edu/
The web interface allows monitoring of the status of tasks and streams, viewing the log files of running or completed jobs, and "rolling back" (rerunning) any failed jobs. It is also possible to view plots of how much CPU time jobs took, how many jobs were running at a given time, etc.
...
To create a pipeline task it is necessary to write an XML configuration file. The key elements of the XML configuration file for the task above (with some details initially left out) are shown here:
Code Block | ||||
---|---|---|---|---|
| ||||
<?xml version="1.0" encoding="UTF-8"?>
<pipeline xmlns="http://glast-ground.slac.stanford.edu/pipeline"
          xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
          xs:schemaLocation="http://glast-ground.slac.stanford.edu/pipeline http://srs.slac.stanford.edu/Pipeline-II/schemas/2.0/pipeline.xsd">
  <task name="EXOMCBackground" type="EXO" version="1.311">
    <notation>A generic task for running EXO MC backgrounds</notation>
    <variables>
      <var name="EXODIR">/nfs/slac/g/exo</var>
      <var name="EXOBASE">${EXODIR}/software/builds/trunk</var>
      <var name="BATCHOPTIONS">-R "select[-rhel30] rusage[scratch=1]"</var>
      <var name="CORE_LIMIT">1024</var>
      <var name="MAXEVENTS">10000</var>
      <var name="PRINTMODULO">${MAXEVENTS/100}</var>
      <var name="INITIALSEED">${pipeline.stream%100000}</var>
      <var name="MAXCPU">${MAXEVENTS/10}</var>
      <var name="MAXMEM">1000</var>
      <var name="OUTPUT_DIR">/nfs/slac/g/exo/exo_data/data/MC/backgrounds/test</var>
      <var name="OUTPUT_FORMAT">MC-background-%06d.root</var>
      <var name="OUTPUT_NAME">${format(pipeline.stream,OUTPUT_FORMAT)}</var>
      <var name="OUTPUT_FILE">${OUTPUT_DIR}/${OUTPUT_NAME}</var>
      <var name="DATACAT_DIR">EXO/Test</var>
      <var name="DATACAT_GROUP">MyGroup</var>
    </variables>
    <process name="runMonteCarlo">
      <job batchOptions="${BATCHOPTIONS}" maxCPU="${MAXCPU}" maxMemory="${MAXMEM}">
        ...
      </job>
    </process>
    <process name="register-ds">
      <notation>Register datasets created in this task</notation>
      <script>
        ...
      </script>
      <depends>
        <after process="runMonteCarlo"/>
      </depends>
    </process>
  </task>
</pipeline>
|
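The OUTPUT_NAME definition uses the pipeline's format function, which applies a printf-style pattern (the OUTPUT_FORMAT value) to the integer stream id. As a quick sketch of what this expands to, in python terms:

```python
# Sketch of the pipeline's ${format(pipeline.stream,OUTPUT_FORMAT)} expansion:
# a printf-style pattern applied to the integer stream id.
OUTPUT_FORMAT = "MC-background-%06d.root"

def output_name(stream):
    # %06d zero-pads the stream id to six digits
    return OUTPUT_FORMAT % stream

print(output_name(7))    # -> MC-background-000007.root
print(output_name(123))  # -> MC-background-000123.root
```

This is why each stream writes a distinct output file: the stream id is baked into the file name.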
...
Now let's look at the parts which were initially left out. First, the body of the batch job, which by default is written as a bash script:
Code Block | ||||
---|---|---|---|---|
| ||||
ulimit -c ${CORE_LIMIT} # Limit core dumps
set -e # exit on error

# Create a scratch area to write the output to
export SCRATCH_DIR=/scratch/exo/${PIPELINE_PROCESSINSTANCE}
mkdir -p ${SCRATCH_DIR}
gotEXIT() {
  rm -rf ${SCRATCH_DIR}
}
trap gotEXIT EXIT

source ${EXOBASE}/setup.sh

# Create background.exo
cat > background.exo <<EOF
use exosim rec toutput
/exosim/macro background.mac
/exosim/filter true
printmodulo ${PRINTMODULO}
/exosim/initial_seed ${INITIALSEED}
/exosim/run_number ${PIPELINE_STREAM}
maxevents ${MAXEVENTS}
/toutput/file ${SCRATCH_DIR}/output.root
begin
exit
EOF

# Create background.mac
cat > background.mac <<EOF
/digitizer/wireNoise 800.000000
/digitizer/APDNoise 2000.000000
/digitizer/LXeEnergyRes 0.015000
/event/LXeEventsOnly true
/event/digitizeWires true
/event/digitizeAPDs true
/gps/pos/type Volume
/gps/pos/shape Cylinder
/gps/pos/halfz 72.5 cm
/gps/pos/radius 75.0 cm
/gps/pos/centre 0.0 0.0 0.0 cm
/gps/pos/confine HFE
/gps/energy 0 keV
/gps/particle ion
/gps/ion 19 40 0 0
/grdm/analogueMC 1
EOF

EXOAnalysis background.exo
mkdir -p ${OUTPUT_DIR}
cp -pv ${SCRATCH_DIR}/output.root ${OUTPUT_FILE}
|
There are a few points worth noting:
- All of the variables defined earlier in the task are passed to the batch job as environment variables and can be referred to using the bash ${VARIABLE} syntax.
- The bash file creates a scratch folder at the top of the job, and registers a cleanup trap to delete the scratch area at the end of the job. The output data is written to the scratch area and copied to its final location if the job completes successfully. This is recommended practice since if many batch jobs write simultaneously to the same NFS file server it will likely become overloaded and fail horribly.
- The .mac file and .exo file required for running EXOAnalysis are generated on the fly, substituting in settings from the variables defined earlier. Note also the use of some special pipeline variables:
- ${PIPELINE_STREAM} -- an id assigned when each stream is created. The id will be unique within this task and normally starts at 0 and increments for each subsequent stream
- ${PIPELINE_PROCESSINSTANCE} -- similar to pipeline stream this is a unique id associated with a stream, but this id is unique across all tasks, and is normally a large ugly number
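The arithmetic expressions in the variable definitions (${MAXEVENTS/100}, ${MAXEVENTS/10}, ${pipeline.stream%100000}) are evaluated by the pipeline before the job starts. As a sketch, the derived values work out like this in python (integer division and modulo):

```python
# Sketch of how the derived task variables follow from MAXEVENTS and the
# stream id, mirroring the arithmetic in the XML variable definitions.
MAXEVENTS = 10000          # task default from the XML above

def derived_vars(stream):
    return {
        "PRINTMODULO": MAXEVENTS // 100,   # ${MAXEVENTS/100}
        "MAXCPU":      MAXEVENTS // 10,    # ${MAXEVENTS/10}
        "INITIALSEED": stream % 100000,    # ${pipeline.stream%100000}
    }

print(derived_vars(42))  # {'PRINTMODULO': 100, 'MAXCPU': 1000, 'INITIALSEED': 42}
```

Taking the seed modulo 100000 keeps it within the range the simulation accepts while still giving each stream a distinct, reproducible seed.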
So far the source ion and volume have been hard-coded in the job script. In the parameterised version of the task the script instead translates the ${SOURCE_ION} and ${SOURCE_VOLUME} variables into the corresponding generator settings using case statements:
Code Block |
---|
 |
case ${SOURCE_ION} in
k)
GPS_ION="19 40 0 0"
;;
th)
GPS_ION="90 232 0 0"
;;
u)
GPS_ION="92 238 0 0"
;;
*)
echo "Unknown ION ${SOURCE_ION}"
exit 1
esac
case ${SOURCE_VOLUME} in
HFE)
HALFZ=72.5
RADIUS=75.0
;;
InnerCryo)
HALFZ=74.5
RADIUS=78.0
;;
*)
echo "Unknown volume ${SOURCE_VOLUME}"
exit 1
esac
# Create background.mac
cat > background.mac <<EOF
/digitizer/wireNoise 800.000000
/digitizer/APDNoise 2000.000000
/digitizer/LXeEnergyRes 0.015000
/event/LXeEventsOnly true
/event/digitizeWires true
/event/digitizeAPDs true
/gps/pos/type Volume
/gps/pos/shape Cylinder
/gps/pos/halfz ${HALFZ} cm
/gps/pos/radius ${RADIUS} cm
/gps/pos/centre 0.0 0.0 0.0 cm
/gps/pos/confine ${SOURCE_VOLUME}
/gps/energy 0 keV
/gps/particle ion
/gps/ion ${GPS_ION}
/grdm/analogueMC 1
EOF
EXOAnalysis background.exo
mkdir -p ${OUTPUT_DIR}
cp -pv ${SCRATCH_DIR}/output.root ${OUTPUT_FILE}
|
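The two case statements amount to small lookup tables: the ion code selects the Geant4 /gps/ion arguments (Z A charge excitation), and the volume selects the generator cylinder dimensions. The same mapping written as python dicts (covering only the branches handled by the script above):

```python
# Mirror of the bash case statements: SOURCE_ION -> /gps/ion arguments
# (Z A charge excitation), SOURCE_VOLUME -> generator cylinder size in cm.
GPS_ION = {
    "k":  "19 40 0 0",    # Potassium-40  (Z=19, A=40)
    "th": "90 232 0 0",   # Thorium-232   (Z=90, A=232)
    "u":  "92 238 0 0",   # Uranium-238   (Z=92, A=238)
}
VOLUME_CYLINDER = {
    "HFE":       {"halfz": 72.5, "radius": 75.0},
    "InnerCryo": {"halfz": 74.5, "radius": 78.0},
}

ion, volume = "th", "InnerCryo"
if ion not in GPS_ION or volume not in VOLUME_CYLINDER:
    raise SystemExit("Unknown ion or volume")  # mirrors the exit 1 branches
print(GPS_ION[ion], VOLUME_CYLINDER[volume]["halfz"])  # -> 90 232 0 0 74.5
```

Adding a new source ion or volume to the task only requires extending the corresponding case statement (and, for documentation, the tables at the end of this page).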
Finally, let's look at the body of the scriptlet which is used to register the output dataset. The scriptlet is written in python:
Code Block | ||||
---|---|---|---|---|
| ||||
metaData = {'nGeneratedEvents':MAXEVENTS,'SourceVolume':SOURCE_VOLUME,'SourceIon':SOURCE_ION}
dsNew = datacatalog.newDataset(OUTPUT_NAME, "root", "EXOROOT", DATACAT_DIR, DATACAT_GROUP, "SLAC", OUTPUT_FILE)
datacatalog.registerDataset(dsNew, metaData) |
Again a few things worth noting:
...
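Note that inside a scriptlet the task variables (MAXEVENTS, OUTPUT_NAME, DATACAT_DIR, ...) are visible directly as python names, and the datacatalog object is supplied by the pipeline runtime. A standalone sketch with illustrative stand-in values (these assignments stand in for what the pipeline injects; they are not the pipeline API):

```python
# Standalone sketch: in the real scriptlet these names are injected by the
# pipeline and `datacatalog` is provided by the runtime. Values illustrative.
MAXEVENTS = 10000
SOURCE_VOLUME = "HFE"
SOURCE_ION = "k"

metaData = {"nGeneratedEvents": MAXEVENTS,
            "SourceVolume": SOURCE_VOLUME,
            "SourceIon": SOURCE_ION}
print(sorted(metaData))  # ['SourceIon', 'SourceVolume', 'nGeneratedEvents']
```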
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
<?xml version="1.0" encoding="UTF-8"?>
<pipeline xmlns="http://glast-ground.slac.stanford.edu/pipeline"
          xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
          xs:schemaLocation="http://glast-ground.slac.stanford.edu/pipeline http://srs.slac.stanford.edu/Pipeline-II/schemas/2.0/pipeline.xsd">
  <task name="EXOMCBackground" type="EXO" version="1.511">
    <notation>A generic task for running EXO MC backgrounds</notation>
    <variables>
      <var name="EXODIR">/nfs/slac/g/exo</var>
      <var name="EXOBASE">${EXODIR}/software/builds/trunk</var>
      <var name="BATCHOPTIONS">-R "select[-rhel30] rusage[scratch=1]"</var>
      <var name="CORE_LIMIT">1024</var>
      <var name="MAXEVENTS">10000</var>
      <var name="PRINTMODULO">${MAXEVENTS/100}</var>
      <var name="INITIALSEED">${pipeline.stream%100000}</var>
      <var name="MAXCPU">${MAXEVENTS/10}</var>
      <var name="MAXMEM">1000</var>
      <var name="SOURCE_VOLUME">HFE</var>
      <var name="SOURCE_ION">k</var>
      <var name="OUTPUT_DIR">/nfs/slac/g/exo/exo_data/data/MC/backgrounds/TestBkgdMC/${SOURCE_ION}/${SOURCE_VOLUME}</var>
      <var name="OUTPUT_FORMAT">MC-background-%06d.root</var>
      <var name="OUTPUT_NAME">${format(pipeline.stream,OUTPUT_FORMAT)}</var>
      <var name="OUTPUT_FILE">${OUTPUT_DIR}/${OUTPUT_NAME}</var>
      <var name="DATACAT_DIR">EXO/TestBkgdMC/${SOURCE_ION}</var>
      <var name="DATACAT_GROUP">${SOURCE_VOLUME}</var>
    </variables>
    <process name="runMonteCarlo">
      <job batchOptions="${BATCHOPTIONS}" maxCPU="${MAXCPU}" maxMemory="${MAXMEM}"><![CDATA[
ulimit -c ${CORE_LIMIT} # Limit core dumps
set -e # exit on error
# Create a scratch area to write the output to
export SCRATCH_DIR=/scratch/exo/${PIPELINE_PROCESSINSTANCE}
mkdir -p ${SCRATCH_DIR}
gotEXIT() {
  rm -rf ${SCRATCH_DIR}
}
trap gotEXIT EXIT
source ${EXOBASE}/setup.sh
# Create background.exo
cat > background.exo <<EOF
use exosim rec toutput
/exosim/macro background.mac
/exosim/filter true
printmodulo ${PRINTMODULO}
/exosim/initial_seed ${INITIALSEED}
/exosim/run_number ${PIPELINE_STREAM}
maxevents ${MAXEVENTS}
/toutput/file ${SCRATCH_DIR}/output.root
begin
exit
EOF
case ${SOURCE_ION} in
  k)
    GPS_ION="19 40 0 0"
    ;;
  th)
    GPS_ION="90 232 0 0"
    ;;
  u)
    GPS_ION="92 238 0 0"
    ;;
  *)
    echo "Unknown ION ${SOURCE_ION}"
    exit 1
esac
case ${SOURCE_VOLUME} in
  HFE)
    HALFZ=72.5
    RADIUS=75.0
    ;;
  InnerCryo)
    HALFZ=74.5
    RADIUS=78.0
    ;;
  *)
    echo "Unknown volume ${SOURCE_VOLUME}"
    exit 1
esac
# Create background.mac
cat > background.mac <<EOF
/digitizer/wireNoise 800.000000
/digitizer/APDNoise 2000.000000
/digitizer/LXeEnergyRes 0.015000
/event/LXeEventsOnly true
/event/digitizeWires true
/event/digitizeAPDs true
/gps/pos/type Volume
/gps/pos/shape Cylinder
/gps/pos/halfz ${HALFZ} cm
/gps/pos/radius ${RADIUS} cm
/gps/pos/centre 0.0 0.0 0.0 cm
/gps/pos/confine ${SOURCE_VOLUME}
/gps/energy 0 keV
/gps/particle ion
/gps/ion ${GPS_ION}
/grdm/analogueMC 1
EOF
EXOAnalysis background.exo
mkdir -p ${OUTPUT_DIR}
cp -pv ${SCRATCH_DIR}/output.root ${OUTPUT_FILE}
]]></job>
    </process>
    <process name="register-ds">
      <notation>Register datasets created in this task</notation>
      <script><![CDATA[
metaData = {'nGeneratedEvents':MAXEVENTS,'SourceVolume':SOURCE_VOLUME,'SourceIon':SOURCE_ION}
dsNew = datacatalog.newDataset(OUTPUT_NAME, "root", "EXOROOT", DATACAT_DIR, DATACAT_GROUP, "SLAC", OUTPUT_FILE)
datacatalog.registerDataset(dsNew, metaData)
]]></script>
      <depends>
        <after process="runMonteCarlo"/>
      </depends>
    </process>
  </task>
</pipeline>
|
Once a new task has been defined it can be uploaded using either the pipeline web interface (on the Admin page) or from the pipeline command on SLAC unix:
No Format |
---|
~exodata/pipeline/prod/pipeline load <xml-file>
|
...
Note that each file to be uploaded must have a unique task name and version number, so when uploading a new version of a task it is necessary to increment the version number in the <task> element. Once a task has been uploaded, an associated job will not start until you issue a "createStream" command (see Running Jobs section above).
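For example, re-uploading the task above after an edit would just need the version attribute bumped (1.512 here is only an illustrative next number):

```xml
<task name="EXOMCBackground" type="EXO" version="1.512">
```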
List of available variables for EXOMCBackground (v1.12 and higher)
SOURCE_ION | Description |
---|---|
k | Potassium-40 |
th | Thorium-232 chain |
u | Uranium-238 chain |
rn_220 | Radon-220 chain |
rn_222 | Radon-222 chain |
SOURCE_VOLUME | Description |
---|---|
ActiveLXe | xenon volume |
LXeVessel | xenon vessel |
HFE | HFE heat transfer fluid |
InnerCryo | inner cryostat |
OuterCryo | outer cryostat |
LeadShield | lead |
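Putting the two tables together, a createStream invocation for, say, the Thorium-232 chain in the inner cryostat could be composed as below. The ion and volume values are illustrative; any names from the tables above can be substituted, and the command itself can only be executed on SLAC unix where the pipeline client lives:

```shell
# Compose a createStream command for a given source ion and volume.
# Illustrative only: the pipeline client itself is installed on SLAC unix.
SOURCE_ION=th
SOURCE_VOLUME=InnerCryo
CMD="~exodata/pipeline/prod/pipeline createStream --define SOURCE_ION=${SOURCE_ION},SOURCE_VOLUME=${SOURCE_VOLUME} --nStreams 10 EXOMCBackground"
echo "${CMD}"
```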