Processing Data in Batch Mode using LCSim XML

Basics

If you have not arrived here by following the LCSim Tutorials, you may want to go back and review them as necessary.

This tutorial explains how to run org.lcsim in a batch computing environment, for example from a Unix command line or from a shell script that runs on the Grid or on your local batch system. The user provides what is typically called a "steering" file in HEP, which specifies all the parameters of the batch job. Steering files may have the extension .xml, but .lcsim is recommended to avoid ambiguity with other XML-based formats.

Setup

Follow the instructions for building lcsim software using maven2.

You can now run lcsim from the command-line using the java command.

java -server -jar ./target/lcsim-[VERSION]-bin.jar [XML]

The VERSION is replaced by your lcsim build version. The XML argument points to a local file in the lcsim recon XML format.

For example...

java -server -jar ./target/lcsim-1.11-SNAPSHOT-bin.jar ./myJob.lcsim

Simple Example

The JobManager class processes your job, which is described in an XML format.

Here is a simple example which will print the event number.

<lcsim>
    <inputFiles>
        <file>./myEvents.slcio</file>
    </inputFiles>
    <control>
        <numberOfEvents>100</numberOfEvents>
    </control>
    <execute>
        <driver name="EventMarkerDriver"/>
    </execute>
    <drivers>
        <driver name="EventMarkerDriver"
                type="org.lcsim.job.EventMarkerDriver">
            <eventInterval>1</eventInterval>
        </driver>
    </drivers>
</lcsim>

The inputFiles section is a list of one or more LCIO input file paths that will be processed.

The control section sets the job's run parameters. Here we set the maximum numberOfEvents.

The execute section is a list of drivers to be executed in order. The name attribute of each driver element must match the name of a driver defined in the drivers section.

Finally, the drivers section describes the drivers that will be run on the input file. Certain types of Driver parameters can be set in this section. Here the interval for event printing is set as eventInterval, which is an integer.

The driver method that receives this value has the following signature.

public void setEventInterval(int eventInterval);

The JobManager is able to convert XML text into calls to simple setters like this one using JavaBeans.
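As a rough illustration of that conversion, the sketch below applies the text of an XML element to a matching setter. It uses plain java.lang.reflect rather than the JavaBeans machinery mentioned above, it handles only int, double, boolean, and String, and the names SetterSketch and MarkerLike are hypothetical stand-ins, not lcsim classes.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: not the JobManager implementation.
final class SetterSketch {

    // Stand-in for a driver with a single integer parameter.
    public static class MarkerLike {
        private int eventInterval;
        public void setEventInterval(int eventInterval) { this.eventInterval = eventInterval; }
        public int getEventInterval() { return eventInterval; }
    }

    // Find set<Name>(oneArg) on the target and call it with the converted text.
    static void apply(Object target, String element, String text) {
        String setter = "set" + Character.toUpperCase(element.charAt(0)) + element.substring(1);
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                try {
                    m.invoke(target, convert(m.getParameterTypes()[0], text.trim()));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("could not call " + setter, e);
                }
                return;
            }
        }
        throw new IllegalArgumentException("no setter named " + setter);
    }

    // Convert element text to the setter's single argument type.
    static Object convert(Class<?> type, String text) {
        if (type == int.class) return Integer.parseInt(text);
        if (type == double.class) return Double.parseDouble(text);
        if (type == boolean.class) return Boolean.parseBoolean(text);
        if (type == String.class) return text;
        throw new IllegalArgumentException("unsupported type: " + type);
    }
}
```

With this sketch, SetterSketch.apply(driver, "eventInterval", "1") calls setEventInterval(1), which mirrors what the eventInterval element in the example above asks for.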

LCSim XML Format

This shows all possible XML elements in the LCSim format.

<lcsim>
    <inputFiles>
        <fileUrl />
        <file />
        <fileSet>
            <file />
        </fileSet>
        <fileList />
        <fileUrlList />
    </inputFiles>
    <control>
        <dryRun />
        <logFile />
        <cacheDirectory />
        <skipEvents />
        <numberOfEvents />
        <verbose />
        <printDriverStatistics />
        <printSystemProperties />
        <printUserClassPath />
        <printDriversDetailed />
    </control>
    <classpath>
        <jarUrl />
        <jar />
    </classpath>
    <define>
        <anExampleVariable/>
    </define>
    <execute>
        <driver name="ExampleDriver" />
    </execute>
    <drivers>
        <driver name="ExampleDriver" type="org.lcsim.example.ExampleDriver">
            <exampleParam />
        </driver>
    </drivers>
</lcsim>

Each of these XML sections is explained in greater detail below.

Input Files

The <inputFiles> section contains a list of local or remote files to be processed.

These can be <file> elements which contain a relative or absolute path to a file on the local file system.

<inputFiles>
    <file>/path/to/local/datafile.slcio</file>
</inputFiles>

Remote files that are accessible via a public URL can be read using a <fileUrl> element.

<inputFiles>
    <fileUrl>ftp://example.org/datafile.slcio</fileUrl>
</inputFiles>

These remote files will be downloaded to the cache directory, which is ~/.cache by default. A different local cache directory can be specified using the <cacheDirectory> tag (covered below).

The <inputFiles> section can contain a mixture of <file> and <fileUrl> elements.

Some batch systems may not support remote file access via URL. Check with your administrator.

Job Control

The <control> section contains parameters that control the batch job, including the number of events to run and whether various debugging output should be printed.

Setting <dryRun> to true means that the job manager will create the drivers but will not run the job. This can be used to check that your driver setup and arguments are correct.

The <logFile> element is used to specify a log file location. If no log file is specified, the job output goes to the terminal screen.

The <cacheDirectory> specifies the root directory to be used for caching remote data files.

The <numberOfEvents> element sets the maximum number of events that will be processed. All events will be processed if it is left blank.

The <skipEvents> argument tells the job manager to skip a number of events up-front before processing the rest.

The <verbose> tag should be set to true for verbose debugging output.

These tags can also be set to true to print out additional information about the job: <printDriverStatistics>, <printSystemProperties>, <printUserClassPath>, and <printDriversDetailed>.
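The interplay of <skipEvents> and <numberOfEvents> can be pictured with a small sketch. It assumes that skipping happens first and that numberOfEvents then caps the events processed after the skip; that ordering is an assumption made for illustration, and EventWindowSketch is a hypothetical name, not part of lcsim.

```java
// Hypothetical sketch of how skipEvents and numberOfEvents might combine.
final class EventWindowSketch {

    // Returns {firstIndex, count} of the events processed from an input of totalEvents.
    // A negative numberOfEvents stands for "left blank" (process everything that remains).
    static long[] window(long totalEvents, long skipEvents, long numberOfEvents) {
        long first = Math.min(Math.max(skipEvents, 0), totalEvents);
        long remaining = totalEvents - first;
        long count = numberOfEvents < 0 ? remaining : Math.min(numberOfEvents, remaining);
        return new long[] { first, count };
    }
}
```

Under these assumptions, a file with 1000 events, skipEvents of 10, and numberOfEvents of 100 would process events 10 through 109, counting from zero.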

Variable Definitions

The job manager has very limited support for "free" variable definitions, using the <define> block.

At the moment, this is limited to single double values, which can be written as expressions to be evaluated.

Here is an example of a simple double parameter.

<define>
    <aDoubleParam>1.1</aDoubleParam>
</define>

Variables defined here can be included in expressions by using their name.

<define>
    <aDoubleParam1>1.1</aDoubleParam1>
    <aDoubleParam2>2.2</aDoubleParam2>
    <aDoubleParam3>aDoubleParam1 + aDoubleParam2</aDoubleParam3>
</define>

Driver Execution

The <execute> section specifies the order in which the drivers will be called for each event. Each <driver> tag must have a unique name attribute value that matches the name of a driver defined in the <drivers> section (see next section).
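The matching rule between the two sections can be sketched as a simple lookup. This is a hypothetical illustration and not the job manager's code: it takes the driver definitions as a name-to-type map, walks the <execute> names in order, and rejects names that are undefined or repeated.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: resolve <execute> names against <drivers> definitions.
final class ExecuteOrderSketch {

    // definitions maps driver name -> type (class name); executeNames is the <execute> order.
    // Returns the driver types in execution order.
    static List<String> resolve(Map<String, String> definitions, List<String> executeNames) {
        List<String> types = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String name : executeNames) {
            if (!definitions.containsKey(name)) {
                throw new IllegalArgumentException("undefined driver: " + name);
            }
            if (!seen.add(name)) {
                throw new IllegalArgumentException("duplicate driver name: " + name);
            }
            types.add(definitions.get(name));
        }
        return types;
    }
}
```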

Driver Definition

The <drivers> section contains definitions for all drivers that will be called in the job. The driver classes must be available in the LCSim package jar or in one of the jars listed in the <classpath>.

Driver Arguments

Using JavaBeans, the job manager is able to convert simple XML text into parameter arguments for drivers. Only simple method signatures with single arguments are supported.

Here is a table of supported parameter types.

type         array1d   array2d   expression
int          yes       yes       yes
String       yes       no        no
double       yes       yes       yes
float        yes       no        yes
boolean      yes       no        no
Hep3Vector   no        no        no
File         no        no        no
URL          no        no        no

Types with a yes in the array1d or array2d columns support arrays of those dimensions. Arrays beyond two dimensions are not supported and would need to be read in manually by user code, perhaps using a method with a File or URL argument. Types that support expression evaluation have a yes in that column.

Driver Example

The easiest way to understand how the driver parameter conversion works is to study a simple example.

Here is an example Driver class.

package org.lcsim.example;

import java.io.File;
import java.net.URL;

import hep.physics.vec.Hep3Vector;
import org.lcsim.util.Driver;

public class MyDriver extends Driver
{
    public void setX(int x) { }
    public void setX1(int[] x) { }
    public void setX2(int[][] x) { }

    public void setFile(File f) { }
    public void setUrl(URL url) { }
    public void setVector(Hep3Vector vec) { }
}

In a real Driver, each set method would store its argument in a private field; the method bodies are left empty here for brevity.

This is the corresponding XML code in <drivers> that would set values for these parameters.

<driver name="MyDriver" type="org.lcsim.example.MyDriver">
    <x>1</x>
    <x1>1 2 3</x1>
    <x2>1 2 3; 4 5 6</x2>

    <file>/path/to/a/file.txt</file>
    <url>http://example.org/file.txt</url>
    <vector>1.0 2.0 3.0</vector>
</driver>

There are several important things to notice in this example.

By JavaBeans convention, each set method maps to a parameter name formed by removing the "set" prefix from the method name and making the first letter lower case. Driver methods that do not begin with "set" are not accessible from LCSim XML.

Array elements are space delimited, so individual String values cannot contain spaces. The rows of a 2D array are separated by semicolons.

The example above uses integers for the 1D and 2D arrays, but other types also support arrays. See the types table above for specifics.
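The array text format described above can be sketched in a few lines. This is an illustrative parser for the int case only, written for this tutorial and not taken from the job manager; ArrayTextSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch of the array text format: spaces between elements, semicolons between rows.
final class ArrayTextSketch {

    // "1 2 3" -> {1, 2, 3}
    static int[] parse1d(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new int[0];
        return Arrays.stream(trimmed.split("\\s+")).mapToInt(Integer::parseInt).toArray();
    }

    // "1 2 3; 4 5 6" -> {{1, 2, 3}, {4, 5, 6}}
    static int[][] parse2d(String text) {
        String[] rows = text.split(";");
        int[][] result = new int[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            result[i] = parse1d(rows[i]);
        }
        return result;
    }
}
```

Because whitespace is the element separator, a String value containing a space would be split into two elements, which is why String arguments should not contain spaces.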

Expression Evaluation

Simple expression evaluation is supported for a limited set of the supported parameter types: int, double, and float, plus arrays of these types where the types table above lists array support. Supported symbols include *, /, +, (, ), and -, which have their usual mathematical meaning, plus trig functions like sin and cos. Variables created in <define> can also be accessed by name. Expressions may also include units (see next section).

The GNU JEL library provides this capability. Refer to its documentation for further information.
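To give a feel for what such an evaluation involves, here is a minimal recursive-descent sketch. It is a stand-in written for this tutorial and does not use GNU JEL: it supports only +, -, *, /, parentheses, plain decimal numbers, and named variables supplied in a map (playing the role of a <define> block), with no trig functions or units. The class name ExpressionSketch is hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of expression evaluation; lcsim delegates the real work to GNU JEL.
final class ExpressionSketch {
    private final String src;
    private final Map<String, Double> vars;
    private int pos = 0;

    private ExpressionSketch(String src, Map<String, Double> vars) {
        this.src = src.replaceAll("\\s+", "");   // whitespace carries no meaning here
        this.vars = vars;
    }

    static double eval(String text, Map<String, Double> vars) {
        ExpressionSketch p = new ExpressionSketch(text, vars);
        double value = p.sum();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return value;
    }

    // sum := product (('+' | '-') product)*
    private double sum() {
        double v = product();
        while (pos < src.length() && (peek() == '+' || peek() == '-')) {
            v = (src.charAt(pos++) == '+') ? v + product() : v - product();
        }
        return v;
    }

    // product := atom (('*' | '/') atom)*
    private double product() {
        double v = atom();
        while (pos < src.length() && (peek() == '*' || peek() == '/')) {
            v = (src.charAt(pos++) == '*') ? v * atom() : v / atom();
        }
        return v;
    }

    // atom := '-' atom | '(' sum ')' | number | variable
    private double atom() {
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of expression");
        char c = peek();
        if (c == '-') { pos++; return -atom(); }
        if (c == '(') {
            pos++;
            double v = sum();
            if (pos >= src.length() || src.charAt(pos++) != ')') throw new IllegalArgumentException("missing )");
            return v;
        }
        int start = pos;
        while (pos < src.length() && (Character.isLetterOrDigit(peek()) || peek() == '.' || peek() == '_')) pos++;
        String token = src.substring(start, pos);
        if (token.isEmpty()) throw new IllegalArgumentException("unexpected '" + c + "'");
        if (Character.isDigit(token.charAt(0)) || token.charAt(0) == '.') return Double.parseDouble(token);
        Double v = vars.get(token);
        if (v == null) throw new IllegalArgumentException("unknown variable: " + token);
        return v;
    }

    private char peek() { return src.charAt(pos); }
}
```

With aDoubleParam1 and aDoubleParam2 from the earlier <define> example placed in the map, evaluating "aDoubleParam1 + aDoubleParam2" yields their sum.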

Units

LCSim supports the named units defined by CLHEP's SystemOfUnits.

The names of the units are the same, but the actual values may not be the same. For instance, in LCSim, the basic energy unit is GeV, whereas it is MeV in CLHEP.

Refer to the LCSim SystemOfUnits documentation to see which units are defined.
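The difference in base units matters whenever a number crosses between the two systems. The sketch below defines only the two energy constants mentioned above, in a GeV-based and a MeV-based system, to show how the same named unit yields different raw values. The classes are hypothetical and are not the actual SystemOfUnits definitions.

```java
// Hypothetical sketch: the same unit names in two systems with different base energy units.
final class UnitsSketch {

    // GeV-based system, as described above for LCSim.
    static final class GeVBased {
        static final double GeV = 1.0;
        static final double MeV = 1.0e-3 * GeV;
    }

    // MeV-based system, as described above for CLHEP.
    static final class MeVBased {
        static final double MeV = 1.0;
        static final double GeV = 1.0e3 * MeV;
    }

    // Convert a raw energy value from the GeV-based system to the MeV-based system.
    static double geVBasedToMeVBased(double value) {
        return value / GeVBased.MeV * MeVBased.MeV;
    }
}
```

In this sketch, the expression 500 * MeV has a raw value of about 0.5 in the GeV-based system and 500 in the MeV-based one, so raw numbers should not be passed between the two without conversion.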

Guidelines for Creating Compatible Drivers

Drivers created for use in an LCSim XML file should follow these guidelines.

  • The class must be public.
  • It must have a public constructor that takes no arguments.
  • The driver should not do initialization in the constructor but should use the detectorChanged() or startOfData() methods, instead.
  • The driver's set methods should always be of the form
    public void set[ParameterName]([type] [varName])
    Set methods not of this form will not be accessible as XML parameters.
  • The use of sub-drivers is discouraged because they are inaccessible to the job manager, though it is still possible to use them.
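Some of these guidelines can be checked mechanically. The sketch below is a hypothetical helper, not part of lcsim: it tests whether a class is public with a public no-argument constructor, and lists the XML parameter names implied by its public void single-argument set methods, using the standard java.beans.Introspector.decapitalize rule for the name. GoodDriver is an example class for the sketch and does not extend the real lcsim Driver.

```java
import java.beans.Introspector;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: check a class against the guidelines above.
final class DriverGuidelineSketch {

    // Example class that follows the guidelines (not an lcsim Driver).
    public static class GoodDriver {
        private double threshold;
        public GoodDriver() { }
        public void setThreshold(double threshold) { this.threshold = threshold; }
        public void setUrl(String url) { }
        public int setBad(int x) { return x; }   // non-void, so not a valid XML parameter
    }

    // True if the class is public and has a public no-argument constructor.
    static boolean isInstantiable(Class<?> type) {
        if (!Modifier.isPublic(type.getModifiers())) return false;
        try {
            type.getConstructor();   // only finds public constructors
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // XML parameter names implied by public void set<Name>(oneArg) methods.
    static Set<String> parameterNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getMethods()) {
            String n = m.getName();
            if (n.length() > 3 && n.startsWith("set")
                    && m.getReturnType() == void.class
                    && m.getParameterCount() == 1) {
                names.add(Introspector.decapitalize(n.substring(3)));
            }
        }
        return names;
    }
}
```

For GoodDriver, the parameter names come out as threshold and url; setBad is skipped because it does not return void.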

Running a Specific LCSim Release

How to Run a Release

Running your job with the lcsim jar is straightforward. Download the bin jar from the repository and use the java command to execute your steering file.

wget http://www.lcsim.org/maven2/org/lcsim/lcsim/1.14-SNAPSHOT/lcsim-1.14-SNAPSHOT-bin.jar
java -jar ./lcsim-1.14-SNAPSHOT-bin.jar mySteeringFile.xml

The steering file must be provided by the user.
