...

--stream <Stream ID=-1>

Integer stream identifier. Auto-assigned if the option is not specified.

--nStreams <Number of Streams=1>

Number of streams to create. Not valid if a stream ID is specified.

--define <name=value>

Define a variable. Syntax is "name=value[,name2=value2,...]"
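
The --define syntax is a comma-separated list of name=value pairs. As a rough illustration of how such a string breaks down, here is a minimal Python sketch. It is not the pipeline client's own parser: the function name parse_define is hypothetical, values are kept as strings, and values containing commas are not handled.

```python
def parse_define(spec):
    """Split a 'name=value[,name2=value2,...]' string into a dict.

    Hypothetical helper for illustration only; it is not part of the
    pipeline client. Values stay strings and may not contain commas.
    """
    variables = {}
    for pair in spec.split(","):
        name, sep, value = pair.partition("=")
        if not sep or not name.strip():
            raise ValueError("expected name=value, got %r" % pair)
        variables[name.strip()] = value.strip()
    return variables


# Variable names taken from the task definition discussed on this page.
print(parse_define("MAXEVENTS=500,SOURCE_ION=th"))
```

Each pair overrides the default of the same name for the stream being created; anything not listed keeps the default from the task definition.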

...

Code Block
xml
xml
<?xml version="1.0" encoding="UTF-8"?>
<pipeline xmlns="http://glast-ground.slac.stanford.edu/pipeline" 
          xmlns:xs="http://www.w3.org/2001/XMLSchema-instance" 
          xs:schemaLocation="http://glast-ground.slac.stanford.edu/pipeline 
          http://srs.slac.stanford.edu/Pipeline-II/schemas/2.0/pipeline.xsd">
  <task name="EXOMCBackground" type="EXO" version="1.11">
    <notation>A generic task for running EXO MC backgrounds</notation> 
    <variables>
      <var name="EXODIR">/nfs/slac/g/exo</var>
      <var name="EXOBASE">${EXODIR}/software/builds/trunk</var>
      <var name="BATCHOPTIONS">-R &quot;select[-rhel30] rusage[scratch=1]&quot;</var>
      <var name="CORE_LIMIT">1024</var>
      <var name="MAXEVENTS">10000</var> 
      <var name="PRINTMODULO">${MAXEVENTS/100}</var>
      <var name="INITIALSEED">pipeline.stream%100000</var>
      <var name="MAXCPU">${MAXEVENTS/10}</var>
      <var name="MAXMEM">1000</var>
      <var name="SOURCE_VOLUME">HFE</var>
      <var name="SOURCE_ION">k</var>
      <var name="OUTPUT_DIR">/nfs/slac/g/exo/exo_data/data/MC/backgrounds/TestBkgdMC/${SOURCE_ION}/${SOURCE_VOLUME}</var>
      <var name="OUTPUT_FORMAT">MC-background-%06d.root</var>
      <var name="OUTPUT_NAME">${format(pipeline.stream,OUTPUT_FORMAT)}</var>
      <var name="OUTPUT_FILE">${OUTPUT_DIR}/${OUTPUT_NAME}</var>
      <var name="DATACAT_DIR">EXO/TestBkgdMC/${SOURCE_ION}</var>
      <var name="DATACAT_GROUP">${SOURCE_VOLUME}</var>
    </variables>

      <process name="runMonteCarlo">
         <job batchOptions="${BATCHOPTIONS}" maxCPU="${MAXCPU}" maxMemory="${MAXMEM}">
         ...
         </job>
      </process>

      <process name="register-ds">
         <notation>Register datasets created in this task</notation>
         <script>
         ...
         </script>

         <depends>
              <after process="runMonteCarlo"/>
         </depends>

     </process>
  </task>
</pipeline>

In this file the <task> element defines the name, version number, and type of the task. The <notation> element is a comment describing the task. The <variables> section defines a set of variables that are used elsewhere in the task. The values given here are defaults, which can be overridden for any specific stream when the stream is created. Note that variables can be defined in terms of other variables by using the ${expression} syntax.
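
As a rough illustration of how the derived variables resolve for one stream, the sketch below mimics a few of the defaults in plain Python. It is not the pipeline's expression evaluator: the function name resolve_defaults is hypothetical, Python's % operator stands in for format(pipeline.stream,OUTPUT_FORMAT), and integer division is assumed for expressions such as ${MAXEVENTS/100}.

```python
def resolve_defaults(stream, overrides=None):
    """Mimic how a few task variables resolve for one stream.

    Illustration only: the real pipeline evaluates ${...} expressions
    itself. Integer division and %-formatting are assumptions here.
    """
    v = {
        "MAXEVENTS": 10000,
        "SOURCE_VOLUME": "HFE",
        "SOURCE_ION": "k",
        "OUTPUT_FORMAT": "MC-background-%06d.root",
    }
    v.update(overrides or {})  # per-stream overrides replace the defaults
    v["PRINTMODULO"] = v["MAXEVENTS"] // 100
    v["MAXCPU"] = v["MAXEVENTS"] // 10
    v["OUTPUT_DIR"] = (
        "/nfs/slac/g/exo/exo_data/data/MC/backgrounds/TestBkgdMC/%s/%s"
        % (v["SOURCE_ION"], v["SOURCE_VOLUME"])
    )
    v["OUTPUT_NAME"] = v["OUTPUT_FORMAT"] % stream
    v["OUTPUT_FILE"] = v["OUTPUT_DIR"] + "/" + v["OUTPUT_NAME"]
    return v


resolved = resolve_defaults(42, {"SOURCE_ION": "th"})
print(resolved["OUTPUT_FILE"])
```

Because the derived values are computed after the overrides are applied, changing SOURCE_ION for a stream changes OUTPUT_DIR and OUTPUT_FILE as well.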

...

Code Block
none
none
ulimit -c ${CORE_LIMIT} # Limit core dumps
set -e # exit on error

# Create a scratch area to write the output to
export SCRATCH_DIR=/scratch/exo/${PIPELINE_PROCESSINSTANCE}
mkdir -p ${SCRATCH_DIR}
gotEXIT()
{
   rm -rf ${SCRATCH_DIR}   
}
trap gotEXIT EXIT

source ${EXOBASE}/setup.sh

# Create background.exo
cat > background.exo <<EOF
use exosim rec toutput
/exosim/macro background.mac
/exosim/filter true
printmodulo ${PRINTMODULO}
/exosim/initial_seed ${INITIALSEED}
/exosim/run_number ${PIPELINE_STREAM}
maxevents ${MAXEVENTS}
/toutput/file ${SCRATCH_DIR}/output.root
begin
exit
EOF

case ${SOURCE_ION} in
k)
GPS_ION="19 40 0 0"
;;
th)
GPS_ION="90 232 0 0"
;;
u)
GPS_ION="92 238 0 0"
;;
*)
echo "Unknown ION ${SOURCE_ION}"
exit 1
esac

case ${SOURCE_VOLUME} in
HFE)
HALFZ=72.5
RADIUS=75.0
;;
InnnerCryo)
HALFZ=74.5
RADIUS=78.0
;;
*)
echo "Unknown volume ${SOURCE_VOLUME}"
exit 1
esac

# Create background.mac
cat > background.mac <<EOF
/digitizer/wireNoise 800.000000
/digitizer/APDNoise 2000.000000 
/digitizer/LXeEnergyRes 0.015000
/event/LXeEventsOnly true
/event/digitizeWires true
/event/digitizeAPDs true
/gps/pos/type Volume
/gps/pos/shape Cylinder
/gps/pos/halfz ${HALFZ} cm
/gps/pos/radius ${RADIUS} cm
/gps/pos/centre 0.0 0.0 0.0 cm
/gps/pos/confine ${SOURCE_VOLUME}
/gps/energy 0 keV
/gps/particle ion
/gps/ion ${GPS_ION}
/grdm/analogueMC 1
EOF

EXOAnalysis background.exo

mkdir -p ${OUTPUT_DIR}
cp -pv ${SCRATCH_DIR}/output.root ${OUTPUT_FILE}

There are a few points worth noting:

  • All of the variables defined earlier in the task are passed to the batch job as environment variables and can be referred to using the bash ${VARIABLE} syntax.
  • The bash script creates a scratch folder at the start of the job and registers a cleanup trap to delete it when the job ends. The output data is written to the scratch area and copied to its final location only if the job completes successfully. This is recommended practice: if many batch jobs write simultaneously to the same NFS file server, the server is likely to become overloaded and fail.
  • The .mac and .exo files required for running EXOAnalysis are generated on the fly, substituting in settings from the variables defined earlier. Note also the use of some special pipeline variables:
    • ${PIPELINE_STREAM} -- an ID assigned when each stream is created. The ID is unique within this task; it normally starts at 0 and increments for each subsequent stream.
    • ${PIPELINE_PROCESSINSTANCE} -- like ${PIPELINE_STREAM}, a unique ID associated with a stream, but this ID is unique across all tasks. It is normally a large number.
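
The scratch-then-copy pattern described above is not specific to bash. A minimal Python sketch of the same idea follows. The names run_with_scratch and fake_job are hypothetical, and a temporary directory stands in for the /scratch/exo/${PIPELINE_PROCESSINSTANCE} area used by the real job.

```python
import os
import shutil
import tempfile


def run_with_scratch(produce, final_path):
    """Write output to a scratch directory, then copy it to final_path.

    produce(scratch_dir) must create a file and return its path. The
    scratch directory is removed whether or not produce succeeds, and
    nothing is copied if produce raises.
    """
    scratch_dir = tempfile.mkdtemp(prefix="scratch-")
    try:
        output = produce(scratch_dir)
        os.makedirs(os.path.dirname(os.path.abspath(final_path)), exist_ok=True)
        shutil.copy2(output, final_path)
    finally:
        # Plays the role of the bash EXIT trap: always clean up.
        shutil.rmtree(scratch_dir, ignore_errors=True)
    return final_path


seen_scratch_dirs = []


def fake_job(scratch_dir):
    """Placeholder for the real simulation: writes a small dummy file."""
    seen_scratch_dirs.append(scratch_dir)
    path = os.path.join(scratch_dir, "output.root")
    with open(path, "w") as handle:
        handle.write("placeholder")
    return path


destination = os.path.join(tempfile.mkdtemp(prefix="final-"), "MC-background-000000.root")
run_with_scratch(fake_job, destination)
print(os.path.exists(destination))
```

The shared file system sees a single copy per job, rather than many small writes spread over the lifetime of the job.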

Finally, let's look at the body of the scriptlet that registers the output dataset. The scriptlet is written in Python:

Code Block
none
none

metaData = {'nGeneratedEvents':MAXEVENTS,'SourceVolume':SOURCE_VOLUME,'SourceIon':SOURCE_ION}
dsNew = datacatalog.newDataset(OUTPUT_NAME, "root", "EXOROOT", DATACAT_DIR, DATACAT_GROUP, "SLAC", OUTPUT_FILE)
datacatalog.registerDataset(dsNew, metaData)

Again a few things worth noting:

  • The datacatalog allows arbitrary meta-data to be associated with datasets. In this case the meta-data is defined as a Python dictionary.
  • The newDataset method takes many arguments, which need some explanation.
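
To try the scriptlet outside the pipeline, a stand-in catalog object can record what would be registered. The sketch below is only an illustration: FakeDataCatalog is hypothetical, and its parameter names are guesses inferred from the values the scriptlet passes, not taken from the pipeline documentation. The variables the scriptlet uses as bare names are given example values by hand here.

```python
class FakeDataCatalog(object):
    """Stand-in for the pipeline's datacatalog object (illustration only).

    Parameter names are guesses based on the values passed by the
    scriptlet; the real method may name or interpret them differently.
    """

    def __init__(self):
        self.registered = []

    def newDataset(self, name, file_format, data_type, folder, group, site, location):
        return {
            "name": name,
            "fileFormat": file_format,
            "dataType": data_type,
            "folder": folder,
            "group": group,
            "site": site,
            "location": location,
        }

    def registerDataset(self, dataset, meta_data):
        record = dict(dataset, metaData=dict(meta_data))
        self.registered.append(record)
        return record


# Example values set by hand; in the scriptlet these appear as bare names.
MAXEVENTS = 10000
SOURCE_VOLUME = "HFE"
SOURCE_ION = "k"
OUTPUT_NAME = "MC-background-000000.root"
DATACAT_DIR = "EXO/TestBkgdMC/" + SOURCE_ION
DATACAT_GROUP = SOURCE_VOLUME
OUTPUT_FILE = "/nfs/slac/g/exo/exo_data/data/MC/backgrounds/TestBkgdMC/k/HFE/" + OUTPUT_NAME

datacatalog = FakeDataCatalog()

# The three lines below are the scriptlet body, unchanged.
metaData = {'nGeneratedEvents':MAXEVENTS,'SourceVolume':SOURCE_VOLUME,'SourceIon':SOURCE_ION}
dsNew = datacatalog.newDataset(OUTPUT_NAME, "root", "EXOROOT", DATACAT_DIR, DATACAT_GROUP, "SLAC", OUTPUT_FILE)
datacatalog.registerDataset(dsNew, metaData)

print(datacatalog.registered[0]["metaData"])
```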

Putting everything together, the full XML file for the task is:

Code Block
xml
xml
title: EXOMCBackground.xml

<?xml version="1.0" encoding="UTF-8"?>
<pipeline xmlns="http://glast-ground.slac.stanford.edu/pipeline" 
          xmlns:xs="http://www.w3.org/2001/XMLSchema-instance" 
          xs:schemaLocation="http://glast-ground.slac.stanford.edu/pipeline 
          http://srs.slac.stanford.edu/Pipeline-II/schemas/2.0/pipeline.xsd">
  <task name="EXOMCBackground" type="EXO" version="1.11">
    <notation>A generic task for running EXO MC backgrounds</notation> 
    <variables>
      <var name="EXODIR">/nfs/slac/g/exo</var>
      <var name="EXOBASE">${EXODIR}/software/builds/trunk</var>
      <var name="BATCHOPTIONS">-R &quot;select[-rhel30] rusage[scratch=1]&quot;</var>
      <var name="CORE_LIMIT">1024</var>
      <var name="MAXEVENTS">10000</var> 
      <var name="PRINTMODULO">${MAXEVENTS/100}</var>
      <var name="INITIALSEED">pipeline.stream%100000</var>
      <var name="MAXCPU">${MAXEVENTS/10}</var>
      <var name="MAXMEM">1000</var>
      <var name="SOURCE_VOLUME">HFE</var>
      <var name="SOURCE_ION">k</var>
      <var name="OUTPUT_DIR">/nfs/slac/g/exo/exo_data/data/MC/backgrounds/TestBkgdMC/${SOURCE_ION}/${SOURCE_VOLUME}</var>
      <var name="OUTPUT_FORMAT">MC-background-%06d.root</var>
      <var name="OUTPUT_NAME">${format(pipeline.stream,OUTPUT_FORMAT)}</var>
      <var name="OUTPUT_FILE">${OUTPUT_DIR}/${OUTPUT_NAME}</var>
      <var name="DATACAT_DIR">EXO/TestBkgdMC/${SOURCE_ION}</var>
      <var name="DATACAT_GROUP">${SOURCE_VOLUME}</var>
    </variables>

      <process name="runMonteCarlo">
         <job batchOptions="${BATCHOPTIONS}" maxCPU="${MAXCPU}" maxMemory="${MAXMEM}"><![CDATA[
              ulimit -c ${CORE_LIMIT} # Limit core dumps
              set -e # exit on error

              # Create a scratch area to write the output to
              export SCRATCH_DIR=/scratch/exo/${PIPELINE_PROCESSINSTANCE}
              mkdir -p ${SCRATCH_DIR}
              gotEXIT()
              {
                 rm -rf ${SCRATCH_DIR}   
              }
              trap gotEXIT EXIT

              source ${EXOBASE}/setup.sh

              # Create background.exo
              cat > background.exo <<EOF
              use exosim rec toutput
              /exosim/macro background.mac
              /exosim/filter true
              printmodulo ${PRINTMODULO}
              /exosim/initial_seed ${INITIALSEED}
              /exosim/run_number ${PIPELINE_STREAM}
              maxevents ${MAXEVENTS}
              /toutput/file ${SCRATCH_DIR}/output.root
              begin
              exit
              EOF

              case ${SOURCE_ION} in
                  k)
                      GPS_ION="19 40 0 0"
                      ;;
                  th)
                      GPS_ION="90 232 0 0"
                      ;;
                  u)
                      GPS_ION="92 238 0 0"
                      ;;
                  *)
                      echo "Unknown ION ${SOURCE_ION}"
                      exit 1
              esac

              case ${SOURCE_VOLUME} in
                  HFE)
                      HALFZ=72.5
                      RADIUS=75.0
                      ;;
                  InnnerCryo)
                      HALFZ=74.5
                      RADIUS=78.0
                      ;;
                  *)
                      echo "Unknown volume ${SOURCE_VOLUME}"
                      exit 1
              esac

              # Create background.mac
              cat > background.mac <<EOF
              /digitizer/wireNoise 800.000000
              /digitizer/APDNoise 2000.000000 
              /digitizer/LXeEnergyRes 0.015000
              /event/LXeEventsOnly true
              /event/digitizeWires true
              /event/digitizeAPDs true
              /gps/pos/type Volume
              /gps/pos/shape Cylinder
              /gps/pos/halfz ${HALFZ} cm
              /gps/pos/radius ${RADIUS} cm
              /gps/pos/centre 0.0 0.0 0.0 cm
              /gps/pos/confine ${SOURCE_VOLUME}
              /gps/energy 0 keV
              /gps/particle ion
              /gps/ion ${GPS_ION}
              /grdm/analogueMC 1
              EOF

              EXOAnalysis background.exo
              
              mkdir -p ${OUTPUT_DIR}
              cp -pv ${SCRATCH_DIR}/output.root ${OUTPUT_FILE}
              ]]>
         </job>
      </process>


      <process name="register-ds">
         <notation>Register datasets created in this task</notation>
         <script><![CDATA[ 
           metaData = {'nGeneratedEvents':MAXEVENTS,'SourceVolume':SOURCE_VOLUME,'SourceIon':SOURCE_ION}
           dsNew = datacatalog.newDataset(OUTPUT_NAME, "root", "EXOROOT", DATACAT_DIR, DATACAT_GROUP, "SLAC", OUTPUT_FILE)
           datacatalog.registerDataset(dsNew, metaData);
           ]]>
         </script>

         <depends>
              <after process="runMonteCarlo"/>
         </depends>

     </process>
  </task>
</pipeline>

...