The GLAST "pipeline" is a software mechanism for organizing and executing massively parallel computing projects. Internally, the pipeline consists of a server application, web applications, Unix commands, and Oracle tables. Externally, the pipeline offers a general framework within which to organize and execute the desired data processing.
See also the Workbook version of the Pipeline II User's Guide.
Main organizational concepts of the pipeline include:
Task services offered by the pipeline include:
Operator services offered by the pipeline include:
Basic steps for using the pipeline:
When editing an XML file for the pipeline, you are encouraged to use an editor that can validate XML files against an XML schema, since this will save you a lot of time. Emacs users may be interested in this guide to using XML with Emacs.
Batch jobs will always have the following environment variables set:
| Variable | Usage |
|---|---|
| PIPELINE_PROCESSINSTANCE | The internal database ID of the process instance |
| PIPELINE_STREAM | The stream number |
| PIPELINE_STREAMPATH | The stream path. For a top-level task this is the same as the stream number; for sub-tasks it has the form i.j.k |
| PIPELINE_TASK | The task name |
| PIPELINE_PROCESS | The process name |
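Because these are ordinary environment variables, a batch job can read them with standard library calls. The Python sketch below is illustrative only: `PipelineContext` and `read_pipeline_env` are hypothetical names, the process-instance ID is kept as a string because its format is not documented here, and the stream path is assumed to consist of dot-separated integers as described in the table.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class PipelineContext:
    """Hypothetical container for the variables set on every batch job."""
    process_instance: str   # internal database ID, kept as text
    stream: int
    stream_path: List[int]  # "2" -> [2], "2.5.1" -> [2, 5, 1]
    task: str
    process: str

    @property
    def is_subtask(self) -> bool:
        # A top-level stream path is just the stream number;
        # sub-task stream paths contain dots (i.j.k).
        return len(self.stream_path) > 1


def read_pipeline_env(env: Mapping[str, str] = os.environ) -> PipelineContext:
    """Build a PipelineContext from the pipeline-provided environment."""
    return PipelineContext(
        process_instance=env["PIPELINE_PROCESSINSTANCE"],
        stream=int(env["PIPELINE_STREAM"]),
        stream_path=[int(part) for part in env["PIPELINE_STREAMPATH"].split(".")],
        task=env["PIPELINE_TASK"],
        process=env["PIPELINE_PROCESS"],
    )
```

Passing an explicit mapping instead of `os.environ` makes the helper easy to exercise outside a batch job.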
To get details on using the Pipeline II client, run:
~glast/pipeline-II/pipeline help
This currently prints:

Syntax: pipeline <command> <args>
where command is one of:
  createStream <task> <stream> <env>
    where <task> is the name of the task to create (including optional version number)
          <stream> is the stream number to create
          <env> are environment variables to pass in, of the form var=value{,var=value...}

For example, the following command creates stream 2 of the task CHS-level1 and passes it four environment variables:

~glast/pipeline-II/pipeline createStream CHS-level1 2 "downlinkID=060630001,numChunks=10,productVer=0,fastCopy=0"
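If a script assembles the `<env>` argument programmatically, the comma-separated var=value form can be built and checked as in the following Python sketch. The helper names are hypothetical, and because the help text does not say whether commas or equals signs can be escaped, the sketch simply rejects them wherever they would be ambiguous.

```python
from typing import Dict, Mapping


def format_stream_env(variables: Mapping[str, str]) -> str:
    """Join variables into the var=value,var=value form taken by createStream."""
    for name, value in variables.items():
        # ',' separates pairs and the first '=' separates name from value, so
        # these characters would make the string ambiguous. Whether the real
        # client supports escaping them is not documented here.
        if not name or "," in name or "=" in name or "," in value:
            raise ValueError(f"cannot encode {name!r}={value!r}")
    return ",".join(f"{name}={value}" for name, value in variables.items())


def parse_stream_env(text: str) -> Dict[str, str]:
    """Split a var=value,var=value string back into a dictionary."""
    result: Dict[str, str] = {}
    for pair in filter(None, text.split(",")):
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"malformed pair {pair!r}")
        result[name] = value
    return result
```

The string returned by `format_stream_env` would then be passed, quoted, as the `<env>` argument of `createStream`.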
The pipeline object provides an entry point for communicating with the pipeline server in script processes. Below is a summary of the functionality currently available.
Please see the JavaDoc page for the pipeline Java interface.
The datacatalog object provides an entry point for communicating with the Data Catalog service in script processes. Below is a summary of the functionality currently available.
registerDataset(String dataType, String logicalPath, String filePath[, String attributes])
Registers a new Dataset entry with the Data Catalog.
dataType is a character string specifying the type of data contained in the file. Examples include MC, DIGI, RECON, and MERIT. This is an enumerated field, so each value must be pre-registered in the database; a Pipeline II developer can add additional values upon request.
Note: Maximum length is 20 characters.
logicalPath is a character string representing the location of the dataset in the virtual directory structure of the Data Catalog. This parameter contains three fields: the "folder", the (optional) "group", and the dataset "name". The encoding is "/path/to/folder/group:name"; if the optional group specification is omitted, the encoding is "/path/to/folder/name".
Example: /ServiceChallenge/Background/1Week/MC:000001 represents a dataset named "000001" stored in a group named "MC" within the folder "/ServiceChallenge/Background/1Week/".
Example: /ServiceChallenge/Background/1Week/000001 represents a dataset named "000001" stored directly within the folder "/ServiceChallenge/Background/1Week/".
Note: The maximum length is 50 characters for each subdirectory name, for the group name, and for the dataset name.
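The logicalPath encoding can be decomposed mechanically. This Python sketch splits a logical path into folder, optional group, and dataset name and enforces the 50-character limits noted above. `LogicalPath` and `parse_logical_path` are hypothetical helpers, not part of the datacatalog interface, and the Data Catalog itself may apply further rules.

```python
from typing import NamedTuple, Optional

MAX_NAME_LEN = 50  # stated limit for each subdirectory, group, and dataset name


class LogicalPath(NamedTuple):
    folder: str           # ends with "/", e.g. "/ServiceChallenge/Background/1Week/"
    group: Optional[str]  # None when the optional group is omitted
    name: str             # dataset name, e.g. "000001"


def _check(label: str, value: str) -> None:
    """Reject empty components and components over the length limit."""
    if not value:
        raise ValueError(f"empty {label}")
    if len(value) > MAX_NAME_LEN:
        raise ValueError(f"{label} {value!r} exceeds {MAX_NAME_LEN} characters")


def parse_logical_path(path: str) -> LogicalPath:
    """Split "/path/to/folder/group:name" or "/path/to/folder/name"."""
    if not path.startswith("/"):
        raise ValueError("logical path must start with '/'")
    head, _, last = path.rpartition("/")
    for part in head.split("/")[1:]:  # skip the empty string before the leading "/"
        _check("subdirectory", part)
    group: Optional[str] = None
    name = last
    if ":" in last:
        # The colon separates the optional group from the dataset name.
        group, name = last.split(":", 1)
        _check("group", group)
    _check("dataset name", name)
    return LogicalPath(head + "/", group, name)
```

Applied to the two examples above, the sketch yields folder "/ServiceChallenge/Background/1Week/" and name "000001" in both cases, with group "MC" in the first and no group in the second.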
filePath is a character string representing the physical location of the file. This parameter contains two fields: the "file path on disk" and the (optional) "site" of the disk cluster. The encoding is "/path/to/file@SITE". The default site is "SLAC".
Example: /nfs/farm/g/glast/u34/ServiceChallenge/Background/1Week/Simulation/1Week/AllGamma/000001.MC@SLAC
Note: Maximum file-path length is 256 characters, maximum site length is 20 characters.
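Similarly, a hypothetical `parse_file_path` helper can split the filePath encoding into its path and site, applying the SLAC default and the length limits above. It assumes that the last `@` introduces the site, so a path that itself contains `@` is only handled correctly when the site is given explicitly.

```python
from typing import Tuple

DEFAULT_SITE = "SLAC"  # default site stated above
MAX_PATH_LEN = 256
MAX_SITE_LEN = 20


def parse_file_path(file_path: str) -> Tuple[str, str]:
    """Split "/path/to/file@SITE" into (path, site), defaulting the site to SLAC."""
    # Assumption: the last '@' separates the site; a path containing '@'
    # must therefore always carry an explicit site suffix.
    path, sep, site = file_path.rpartition("@")
    if not sep:
        path, site = file_path, DEFAULT_SITE
    if not path or len(path) > MAX_PATH_LEN:
        raise ValueError(f"file path must be 1-{MAX_PATH_LEN} characters: {path!r}")
    if not site or len(site) > MAX_SITE_LEN:
        raise ValueError(f"site must be 1-{MAX_SITE_LEN} characters: {site!r}")
    return path, site
```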
attributes [optional] is a colon-delimited character string specifying additional attributes with which to tag the file. The encoding is "a=1:b=apple:c=23.6". All attribute values are stored in the database as ASCII text; no expression evaluation is performed.
Example: mcTreeVer=v7r3p2:meanEnergy=850MeV
Note: Maximum length is 20 characters for attribute name and 256 characters for attribute value.
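The attribute string can be produced from, and read back into, a dictionary as sketched below in Python. The helper names are hypothetical. The sketch converts every value with `str()` (mirroring the rule that values are kept as text and never evaluated), rejects non-ASCII text, and assumes that names and values may not themselves contain `:` (nor, for names, `=`), which the documentation above does not state.

```python
from typing import Dict, List, Mapping

MAX_ATTR_NAME_LEN = 20    # stated limit for attribute names
MAX_ATTR_VALUE_LEN = 256  # stated limit for attribute values


def format_attributes(attrs: Mapping[str, object]) -> str:
    """Encode a mapping as "name=value:name=value" (values become plain text)."""
    pairs: List[str] = []
    for name, value in attrs.items():
        text = str(value)  # stored as text; the catalog performs no evaluation
        if not name or len(name) > MAX_ATTR_NAME_LEN or ":" in name or "=" in name:
            raise ValueError(f"invalid attribute name {name!r}")
        # Assumption: ':' inside a value would be read as a delimiter.
        if len(text) > MAX_ATTR_VALUE_LEN or ":" in text or not text.isascii():
            raise ValueError(f"invalid value for attribute {name!r}: {text!r}")
        pairs.append(f"{name}={text}")
    return ":".join(pairs)


def parse_attributes(encoded: str) -> Dict[str, str]:
    """Decode "a=1:b=apple:c=23.6" into {"a": "1", "b": "apple", "c": "23.6"}."""
    result: Dict[str, str] = {}
    for pair in filter(None, encoded.split(":")):
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"malformed attribute {pair!r}")
        result[name] = value  # values stay strings, e.g. "23.6", not 23.6
    return result
```

For example, `format_attributes({"mcTreeVer": "v7r3p2", "meanEnergy": "850MeV"})` reproduces the example string above.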