...
- runMonteCarlo – a batch job that executes the simulation program and generates the output file.
- register-ds – a scriptlet that runs only if the batch job finishes successfully, and registers the output file in the data catalog.
Generating a large MC event sample typically requires running many MC jobs, each of which generates some number of events. In the pipeline this is achieved by creating many streams within a given task. In this example Monte Carlo task, each stream runs one batch job followed by one registration scriptlet.
Defining a task
To create a pipeline task it is necessary to write an XML configuration file. The key elements of the XML configuration file for the job above (with some details initially left out) are shown here:
Code Block | ||
---|---|---|
| ||
<?xml version="1.0" encoding="UTF-8"?>
<pipeline xmlns="http://glast-ground.slac.stanford.edu/pipeline"
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xs:schemaLocation="http://glast-ground.slac.stanford.edu/pipeline
http://srs.slac.stanford.edu/Pipeline-II/schemas/2.0/pipeline.xsd">
<task name="EXOMCBackground" type="EXO" version="1.3">
<notation>A generic task for running EXO MC backgrounds</notation>
<variables>
<var name="EXODIR">/nfs/slac/g/exo</var>
<var name="EXOBASE">${EXODIR}/software/builds/trunk</var>
<var name="BATCHOPTIONS">-R "select[-rhel30] rusage[scratch=1]"</var>
<var name="CORE_LIMIT">1024</var>
<var name="MAXEVENTS">10000</var>
<var name="PRINTMODULO">${MAXEVENTS/100}</var>
<var name="MAXCPU">${MAXEVENTS/10}</var>
<var name="MAXMEM">1000</var>
<var name="OUTPUT_DIR">/nfs/slac/g/exo/exo_data/data/MC/backgrounds/test</var>
<var name="OUTPUT_FORMAT">MC-background-%06d.root</var>
<var name="OUTPUT_NAME">${format(pipeline.stream,OUTPUT_FORMAT)}</var>
<var name="OUTPUT_FILE">${OUTPUT_DIR}/${OUTPUT_NAME}</var>
<var name="DATACAT_DIR">EXO/Test</var>
<var name="DATACAT_GROUP">MyGroup</var>
</variables>
<process name="runMonteCarlo">
<job batchOptions="${BATCHOPTIONS}" maxCPU="${MAXCPU}" maxMemory="${MAXMEM}">
...
</job>
</process>
<process name="register-ds">
<notation>Register datasets created in this task</notation>
<script>
...
</script>
<depends>
<after process="runMonteCarlo"/>
</depends>
</process>
</task>
</pipeline>
|
In this file the <task> element defines the name, version number, and type of the task. The <notation> element is a comment describing the task. The <variables> section defines a set of variables which are used elsewhere in the task. The values given to the variables are defaults which can be overridden for any specific stream.
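To make the variable handling concrete, here is a minimal Python sketch of how the output file name could be derived for one stream. It is not the pipeline's own expression evaluator: it assumes that `format(pipeline.stream, OUTPUT_FORMAT)` applies a printf-style format to the stream number, and the function name `resolve_output_file` and the override directory `/tmp/mc` are hypothetical, introduced only for illustration.

```python
# Illustrative only: mimics in plain Python how the task variables resolve.
# DEFAULTS copies three of the <var> defaults from the task definition.
DEFAULTS = {
    "MAXEVENTS": 10000,
    "OUTPUT_DIR": "/nfs/slac/g/exo/exo_data/data/MC/backgrounds/test",
    "OUTPUT_FORMAT": "MC-background-%06d.root",
}


def resolve_output_file(stream, overrides=None):
    """Return the output path for a stream, applying per-stream overrides."""
    values = dict(DEFAULTS)          # start from the task-level defaults
    values.update(overrides or {})   # a stream may override any of them
    # Assumption: the stream number is formatted printf-style (%06d).
    output_name = values["OUTPUT_FORMAT"] % stream
    return values["OUTPUT_DIR"] + "/" + output_name


if __name__ == "__main__":
    print(resolve_output_file(42))
    # Hypothetical per-stream override of OUTPUT_DIR.
    print(resolve_output_file(42, {"OUTPUT_DIR": "/tmp/mc"}))
```

Under these assumptions, stream 42 writes to a file named MC-background-000042.root, so each stream in the task produces a distinct output file without any per-stream configuration.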
The two steps of the task are each defined using a <process> element. The first <process> contains a <job> element, indicating that it is a batch job (the body of the job is omitted for the moment). The second <process> contains a <script> element, indicating that it is a scriptlet (again, the body of the scriptlet is omitted for the time being). The <depends> element indicates that the scriptlet should run only after the batch job completes successfully.
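The ordering imposed by <depends> can be sketched in a few lines of Python. This is only a model of the behaviour described above, not the pipeline server's scheduler: the function `run_stream` and the two callables passed to it are hypothetical stand-ins for the batch job and the scriptlet.

```python
def run_stream(stream, run_job, register):
    """Run one stream: the batch job first, then registration only on success.

    run_job(stream) must return True if the batch job succeeded.
    register(stream) stands in for the register-ds scriptlet.
    Returns the names of the processes that were executed, in order.
    """
    executed = ["runMonteCarlo"]
    job_ok = run_job(stream)
    if job_ok:
        # Mirrors <after process="runMonteCarlo"/>: only reached on success.
        register(stream)
        executed.append("register-ds")
    return executed


if __name__ == "__main__":
    # Stub callables: one stream succeeds, one fails.
    print(run_stream(1, lambda s: True, lambda s: None))
    print(run_stream(2, lambda s: False, lambda s: None))
```

In this model a failed batch job leaves the stream without a registered dataset, which matches the intent that only output files that were actually produced appear in the data catalog.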
Putting everything together, the full XML file for the task is:
Code Block | ||
---|---|---|
| ||
<?xml version="1.0" encoding="UTF-8"?>
<pipeline xmlns="http://glast-ground.slac.stanford.edu/pipeline"
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xs:schemaLocation="http://glast-ground.slac.stanford.edu/pipeline
http://srs.slac.stanford.edu/Pipeline-II/schemas/2.0/pipeline.xsd">
<task name="EXOMCBackground" type="EXO" version="1.3">
<notation>A generic task for running EXO MC backgrounds</notation>
<variables>
<var name="EXODIR">/nfs/slac/g/exo</var>
<var name="EXOBASE">${EXODIR}/software/builds/trunk</var>
<var name="BATCHOPTIONS">-R "select[-rhel30] rusage[scratch=1]"</var>
<var name="CORE_LIMIT">1024</var>
<var name="MAXEVENTS">10000</var>
<var name="PRINTMODULO">${MAXEVENTS/100}</var>
<var name="MAXCPU">${MAXEVENTS/10}</var>
<var name="MAXMEM">1000</var>
<var name="OUTPUT_DIR">/nfs/slac/g/exo/exo_data/data/MC/backgrounds/test</var>
<var name="OUTPUT_FORMAT">MC-background-%06d.root</var>
<var name="OUTPUT_NAME">${format(pipeline.stream,OUTPUT_FORMAT)}</var>
<var name="OUTPUT_FILE">${OUTPUT_DIR}/${OUTPUT_NAME}</var>
<var name="DATACAT_DIR">EXO/Test</var>
<var name="DATACAT_GROUP">MyGroup</var>
</variables>
<process name="runMonteCarlo">
<job batchOptions="${BATCHOPTIONS}" maxCPU="${MAXCPU}" maxMemory="${MAXMEM}"><![CDATA[
ulimit -c ${CORE_LIMIT} # Limit core dumps
set -e # exit on error
# Create a scratch area to write the output to
export SCRATCH_DIR=/scratch/exo/${PIPELINE_PROCESSINSTANCE}
mkdir -p ${SCRATCH_DIR}
gotEXIT()
{
rm -rf ${SCRATCH_DIR}
}
trap gotEXIT EXIT
source ${EXOBASE}/setup.sh
cat > background.exo <<EOF
use exosim rec toutput
/exosim/macro background.mac
/exosim/filter true
printmodulo ${PRINTMODULO}
/exosim/initial_seed ${PIPELINE_STREAM}
maxevents ${MAXEVENTS}
/toutput/file ${SCRATCH_DIR}/output.root
begin
exit
EOF
cat > background.mac <<EOF
/digitizer/wireNoise 800.000000
/digitizer/APDNoise 2000.000000
/digitizer/LXeEnergyRes 0.015000
/event/LXeEventsOnly true
/event/digitizeWires true
/event/digitizeAPDs true
/gps/pos/type Volume
/gps/pos/shape Cylinder
/gps/pos/halfz 72.5 cm
/gps/pos/radius 75.0 cm
/gps/pos/centre 0.0 0.0 0.0 cm
/gps/pos/confine HFE
/gps/energy 0 keV
/gps/particle ion
/gps/ion 19 40 0 0
/grdm/analogueMC 1
EOF
EXOAnalysis background.exo
mkdir -p ${OUTPUT_DIR}
cp -pv ${SCRATCH_DIR}/output.root ${OUTPUT_FILE}
]]>
</job>
</process>
<process name="register-ds">
<notation>
Register datasets created in this task
</notation>
<script><![CDATA[
from java.util import HashMap
from org.glast.datacat.client.sql import NewDataset
attributes = HashMap()
attributes.put('sCreator', 'tonyj')
dsNew = NewDataset(OUTPUT_NAME, "root", "EXOROOT", DATACAT_DIR, DATACAT_GROUP, "SLAC", OUTPUT_FILE)
datacatalog.registerDataset(dsNew, attributes)
]]>
</script>
<depends>
<after process="runMonteCarlo"></after>
</depends>
</process>
</task>
</pipeline>
|