Why use the pipeline to run Monte-Carlo jobs?
The pipeline offers a number of advantages when running large numbers of Monte-Carlo jobs:
- Provides an easy way to submit a large number of jobs
- Maintains a record of all jobs run, including links to log files and other files produced during the job
- Makes it easy to rerun any jobs that fail due to quirks in the SLAC batch system
- Makes it easy to register output datasets in the data catalog, which in turn makes it easy to keep track of what MC data is available.
- Provides a web interface to allow the status of jobs to be monitored from anywhere.
Example Monte-Carlo task
Normally, to run a set of Monte-Carlo jobs it is necessary to define a pipeline "Task". A task consists of an arbitrary graph of batch jobs and "scriptlets" to be run; a typical Monte-Carlo task, however, consists of just two steps.
In this example the two steps are:
- runMonteCarlo – a batch job that is run to execute the simulation program and generate the output file
- register-ds – a scriptlet that runs only if the batch job finishes successfully, and that registers the output file in the data catalog.
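
The control flow of this two-step task can be sketched in a few lines of Python. This is only an illustration of the dependency between the steps, not the pipeline's own task definition format: the `Step` class, `run_task`, `build_task`, the output file name, and the in-memory "catalog" list are all hypothetical stand-ins invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One node in the task graph (hypothetical model, not the pipeline API)."""
    name: str
    action: Callable[[dict], bool]          # returns True on success
    depends_on: List[str] = field(default_factory=list)


def run_task(steps: List[Step], context: dict) -> Dict[str, str]:
    """Run steps in the order given; a step runs only if every
    step it depends on has already finished successfully."""
    status: Dict[str, str] = {}
    for step in steps:
        if any(status.get(dep) != "SUCCESS" for dep in step.depends_on):
            status[step.name] = "SKIPPED"
            continue
        status[step.name] = "SUCCESS" if step.action(context) else "FAILED"
    return status


def run_monte_carlo(context: dict) -> bool:
    # Stand-in for the batch job: pretend the simulation wrote an output file.
    # The file name pattern is an assumption made for this example.
    context["output_file"] = f"mc_run_{context['run']}.dat"
    return True


def register_ds(context: dict) -> bool:
    # Stand-in for the scriptlet: record the output file in a fake catalog.
    context.setdefault("catalog", []).append(context["output_file"])
    return True


def build_task(batch_action: Callable[[dict], bool] = run_monte_carlo) -> List[Step]:
    """Build the two-step task: the batch job, then the registration scriptlet."""
    return [
        Step("runMonteCarlo", batch_action),
        Step("register-ds", register_ds, depends_on=["runMonteCarlo"]),
    ]
```

Calling `run_task(build_task(), {"run": 1})` runs both steps in order. If the batch step is replaced by one that returns `False`, register-ds is marked as skipped and nothing is added to the catalog, which mirrors the "only on success" behaviour described above.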