See also the Workbook version of the Pipeline II user's guide.
Task: A top-level definition of work to be performed by the pipeline. A task may consist of one or more processes, and zero or more nested sub-tasks. Tasks are defined by an XML file.

Stream: A single request to run the processes within a task. Each stream has an associated stream number, which must be unique within its task. The stream number is set when the stream is created, either explicitly by the user or implicitly by the pipeline.

Sub-task: A task contained within a parent task.

Sub-stream: A stream corresponding to a sub-task.

Process: A single step within a task or sub-task. Each process must be either a script or a job.

Job: A process which results in a batch job being run.

Script: A process which results in a script being run inside the pipeline server itself. These small scripts are typically used to perform simple calculations, set variables, create sub-tasks, or make entries in the data catalog. Scripts can call functions provided by the pipeline itself, as well as additional functions for adding entries to the data catalog.
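As a rough sketch of what such a script process might do, the snippet below registers an output file in the data catalog. The function name, the "/Example/Chunks" logical path, and the idea of passing the datacatalog object in explicitly are our own illustration; only the registerDataset call is documented later in this guide.

```python
def catalog_chunk(datacatalog, stream_number, chunk_file):
    """Register one output file in the Data Catalog. Inside the
    pipeline server the datacatalog object is provided implicitly;
    it is passed in explicitly here so the logic can be followed
    (and tested) in isolation. The "/Example/Chunks" folder is a
    made-up logical path, not a real catalog location."""
    logical_path = "/Example/Chunks/MC:%06d" % stream_number
    datacatalog.registerDataset("MC", logical_path, chunk_file)
    return logical_path
```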
Pipeline variables can be defined in a pipeline XML file, either at the task level or at the level of individual processes. They can also be defined when a stream is created. Processes inherit variables from the task that contains them. Variables from other processes or tasks can also be accessed using the pipeline object.
When editing an XML file for the pipeline, you are encouraged to use an editor that can validate XML files against an XML schema, since this will save you a lot of time. Emacs users may be interested in this guide to using XML with Emacs.
Batch jobs will always have the following environment variables set:
| Variable | Usage |
|---|---|
| PIPELINE_PROCESSINSTANCE | The internal database id of the process instance |
| PIPELINE_STREAM | The stream number |
| PIPELINE_STREAMPATH | The stream path. For a top-level task this is the same as the stream number; for sub-tasks it has the form i.j.k |
| PIPELINE_TASK | The task name |
| PIPELINE_PROCESS | The process name |
To get details on using the Pipeline II client, try:
~glast/pipeline-II/pipeline help
which currently gives:

Syntax: pipeline <command> <args>

where <command> is one of:

  createStream <task> <stream> <env>
      <task>   is the name of the task to create (including optional version number)
      <stream> is the stream number to create
      <env>    are environment variables to pass in, of the form var=value{,var=value...}
For example:

~glast/pipeline-II/pipeline createStream CHS-level1 2 "downlinkID=060630001,numChunks=10,productVer=0,fastCopy=0"
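When driving createStream from a script, the comma-separated var=value list in the <env> argument can be built from a dict. This helper is our own illustration, not part of the pipeline client:

```python
def format_env(variables):
    """Render a dict as the comma-separated var=value list expected
    by the createStream <env> argument, e.g.
    {"downlinkID": "060630001", "numChunks": 10}
    -> "downlinkID=060630001,numChunks=10"."""
    return ",".join("%s=%s" % (name, value) for name, value in variables.items())
```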
The pipeline object provides an entrypoint for communicating with the pipeline server in script processes. Below is a summary of the functionality currently available.
registerDataset(String dataType, String logicalPath, String filePath[, String attributes])
Registers a new Dataset entry with the Data Catalog.
The parameters are identical to those of the datacatalog object's registerDataset method, described below.
The datacatalog object provides an entrypoint for communicating with the datacatalog service in script processes. Below is a summary of the functionality currently available.
registerDataset(String dataType, String logicalPath, String filePath[, String attributes])
Registers a new Dataset entry with the Data Catalog.
dataType is a character string specifying the type of data contained in the file. Examples include MC, DIGI, RECON, MERIT. This is an enumerated field, and must be pre-registered in the database. A Pipeline-II developer can add additional values upon request.
Note: Maximum length is 20 characters.
logicalPath is a character string representing the location of the dataset in the virtual directory structure of the Data Catalog. This parameter contains three fields: the "folder", the (optional) "group", and the dataset "name". The encoding is "/path/to/folder/group:name"; if the optional group specification is omitted, the encoding is "/path/to/folder/name".
Example: /ServiceChallenge/Background/1Week/MC:000001 represents a dataset named "000001" stored in a group named "MC" within the folder "/ServiceChallenge/Background/1Week/".
Example: /ServiceChallenge/Background/1Week/000001 represents a dataset named "000001" stored directly within the folder "/ServiceChallenge/Background/1Week/".
Note: Maximum length is 50 characters each for subdirectory names, the group name, and the dataset name.
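The folder/group/name encoding above can be sketched as a small parser. This is our own illustration, not a pipeline utility; note it returns the folder without a trailing slash:

```python
def split_logical_path(logical_path):
    """Split a Data Catalog logical path into (folder, group, name).
    "/a/b/MC:000001" -> ("/a/b", "MC", "000001")
    "/a/b/000001"    -> ("/a/b", None, "000001")
    The group is None when the optional group field is omitted."""
    folder, _, last = logical_path.rpartition("/")
    if ":" in last:
        group, _, name = last.partition(":")
    else:
        group, name = None, last
    return folder, group, name
```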
filePath is a character string representing the physical location of the file. This parameter contains two fields: the "file path on disk" and the (optional) "site" of the disk cluster. The encoding is "/path/to/file@SITE". The default site is "SLAC".
Example: /nfs/farm/g/glast/u34/ServiceChallenge/Background/1Week/Simulation/1Week/AllGamma/000001.MC@SLAC
Note: Maximum file-path length is 256 characters; maximum site length is 20 characters.
attributes [optional] is a colon-delimited character string specifying additional attributes with which to tag the file. The encoding is "a=1:b=apple:c=23.6". All attribute values are stored in the database as ASCII text. No expression evaluation is performed.
Example: mcTreeVer=v7r3p2:meanEnergy=850MeV
Note: Maximum length is 20 characters for attribute name and 256 characters for attribute value.
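The filePath and attributes encodings described above can likewise be sketched in code. These are illustrative helpers, not part of the pipeline API; the length checks mirror the documented limits:

```python
def split_file_path(file_path, default_site="SLAC"):
    """Split a physical path of the form "/path/to/file@SITE" into
    (path, site). The site defaults to "SLAC" when the "@SITE"
    suffix is absent, matching the documented default."""
    path, at, site = file_path.partition("@")
    return path, (site if at else default_site)

def parse_attributes(attributes):
    """Parse the colon-delimited "a=1:b=apple" attribute string into
    a dict, enforcing the documented limits (20-character names,
    256-character values). Values are kept as plain text, matching
    the no-expression-evaluation rule above."""
    result = {}
    for pair in attributes.split(":"):
        name, _, value = pair.partition("=")
        if len(name) > 20 or len(value) > 256:
            raise ValueError("attribute field too long: %r" % pair)
        result[name] = value
    return result
```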