Processing Data in Batch Mode using LCSim XML

Overview

If you have not gotten here by following the LCSim Tutorials, you might want to back up and review them as necessary.

...

Running from the Command Line

Follow the instructions for building lcsim software using maven2. This should result in a working lcsim setup on your local machine.

You can now run lcsim from the command-line using the java command from the lcsim directory.

No Format

cd trunk/distribution # where is your lcsim?
java -server -jar ./target/lcsim-distribution-[VERSION]-bin.jar myjob.lcsim

...

The myjob.lcsim argument is an example name of a file in the lcsim reconstruction XML format.

Subsequently, in this documentation, the runnable jar will be referred to as lcsim-distribution-bin.jar, but the actual jar will have the version number in its name.

LCSim Command Line Options

Running the jar without any arguments will print the usage instructions.

No Format
java -jar lcsim-distribution-bin.jar [options] steeringFile.xml
usage:
 -D    Define a variable with form [name]=[value]
 -n    Set the max number of events to process.
 -p    Load a properties file containing variable definitions
 -q    Turn on quiet mode.
 -s    Set the number of events to skip.
 -v    Turn on verbose mode
 -w    Rewrite the XML file with variables resolved
 -x    Perform a dry run which does not process events

These options should be mostly self-explanatory.

Variable Definitions

The LCSim XML format allows variables to be defined using the -D switch or within properties files specified by the -p option.

For instance, an LCIO input file could be defined using a variable.

No Format
<file>${inputFile}</file>

Then this file could be specified at the command line.

No Format
java -jar lcsim-distribution-bin.jar -DinputFile=myInputFile.slcio steeringFile.xml

This variable could also be set in a properties file.

No Format
java -jar lcsim-distribution-bin.jar -pmySettings.prop steeringFile.xml

The file mySettings.prop could contain the following.

No Format
inputFile=myInputFile.slcio

An unlimited number of definitions and properties files can be used.
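As an illustration of how such definitions might be resolved, here is a minimal Java sketch. It is not LCSim's implementation: the class name VariableResolver and the method resolve are hypothetical, and leaving an undefined variable untouched is an assumption made only for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of LCSim: replaces ${name} placeholders
// with values taken from a map of variable definitions.
final class VariableResolver {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String text, Map<String, String> definitions) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            // Assumption: an unknown variable is left as-is instead of raising an error.
            String value = definitions.getOrDefault(name, m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Plays the role of -DinputFile=myInputFile.slcio or a line in a properties file.
        Map<String, String> definitions = new HashMap<>();
        definitions.put("inputFile", "myInputFile.slcio");
        System.out.println(resolve("<file>${inputFile}</file>", definitions));
    }
}
```

With the definition above, the placeholder in the file element is replaced by myInputFile.slcio.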

Simple Job Example

Here is a simple example which will print the event number.

No Format
<lcsim xmlns:xs="http://www.w3.org/2001/XMLSchema-instance" 
       xs:noNamespaceSchemaLocation="http://www.lcsim.org/schemas/lcsim/1.0/lcsim.xsd">
    <inputFiles>
        <file>./myEvents.slcio</file>
    </inputFiles>
    <control>
        <numberOfEvents>100</numberOfEvents>
    </control>
    <execute>
        <driver name="EventMarkerDriver"/>
    </execute>
    <drivers>
        <driver name="EventMarkerDriver"
                type="org.lcsim.job.EventMarkerDriver">
            <eventInterval>1</eventInterval>
        </driver>
    </drivers>
</lcsim>

The inputFiles section is a list of one or more LCIO input file paths that will be processed. There are actually multiple ways to specify input files (covered below).

The control section sets the job's run parameters. Here we set the maximum numberOfEvents to 100.

The execute section is a list of drivers to be executed in order. The name attribute of each driver element must correspond to a driver defined in the drivers section.

Finally, the drivers section describes the drivers that will be run on the input file. Certain types of Driver parameters can be set in this section. Here the interval for event printing is set as eventInterval, which is an integer.

The signature for this Driver method looks like this.

No Format
public void setEventInterval(int eventInterval);

LCSim is able to convert from XML parameters to method calls on Drivers.
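The following Java sketch shows one way such a conversion could work using reflection. It is an illustration under stated assumptions, not LCSim's actual code: SketchMarkerDriver and SetterInvoker are hypothetical names, and only int parameters are handled.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a Driver; not an LCSim class.
class SketchMarkerDriver {
    private int eventInterval;

    public void setEventInterval(int eventInterval) {
        this.eventInterval = eventInterval;
    }

    public int getEventInterval() {
        return eventInterval;
    }
}

final class SetterInvoker {
    // Turns an element such as <eventInterval>1</eventInterval> into a call to
    // setEventInterval(1). Only int parameters are handled in this sketch.
    static void apply(Object driver, String elementName, String text) {
        String setter = "set" + Character.toUpperCase(elementName.charAt(0))
                + elementName.substring(1);
        for (Method m : driver.getClass().getMethods()) {
            if (m.getName().equals(setter)
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0] == int.class) {
                try {
                    m.invoke(driver, Integer.parseInt(text.trim()));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Could not call " + setter, e);
                }
                return;
            }
        }
        throw new IllegalArgumentException("No int setter named " + setter);
    }

    public static void main(String[] args) {
        SketchMarkerDriver driver = new SketchMarkerDriver();
        apply(driver, "eventInterval", "1");
        System.out.println(driver.getEventInterval());
    }
}
```

The element name is capitalized and prefixed with "set" to find the method, which is why the setter naming convention matters for parameters to be reachable from XML.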

LCSim XML Format

The pseudo-XML below shows all possible elements in the LCSim format.

No Format
<lcsim xmlns:xs="http://www.w3.org/2001/XMLSchema-instance" 
       xs:noNamespaceSchemaLocation="http://www.lcsim.org/schemas/lcsim/1.0/lcsim.xsd">
    <inputFiles>
        <fileUrl />
        <file />
        <fileSet>
            <file />
        </fileSet>
        <fileList />
        <fileUrlList />
        <fileRegExp />
    </inputFiles>
    <control>
        <dryRun>true</dryRun>
        <logFile>/path/to/mylog.txt</logFile>
        <cacheDirectory>/path/to/mycache/</cacheDirectory>
        <skipEvents>1</skipEvents>
        <numberOfEvents>1000</numberOfEvents>
        <verbose>true</verbose>
        <printDriverStatistics>true</printDriverStatistics>
        <printSystemProperties>true</printSystemProperties>
        <printUserClassPath>true</printUserClassPath>
        <printDriversDetailed>true</printDriversDetailed>
    </control>
    <classpath>
        <jarUrl />
        <jar />
        <directory />
    </classpath>
    <define>
        <anExampleVariable>1234</anExampleVariable>
    </define>
    <execute>
        <driver name="ExampleDriver" />
    </execute>
    <drivers>
        <driver name="ExampleDriver" type="org.lcsim.example.ExampleDriver">
            <exampleParam>1234</exampleParam>
            <exampleArrayParam>1 2 3 4</exampleArrayParam>
            <exampleArray2DParam>1 2 3 4; 5 6 7 8</exampleArray2DParam>
        </driver>
    </drivers>
</lcsim>

...

The <inputFiles> section contains a list of local or remote files to be processed. It may contain a mixture of any of the elements described below, but it may not be empty, and it must resolve to at least one input file or the job will fail.

file

The <file> element is a relative or absolute path to a file on the local file system.

No Format

<inputFiles>
    <file>/path/to/local/datafile.slcio</file>
</inputFiles>

fileUrl

Remote files that are accessible via a public URL can be accessed using a <fileUrl> element.

No Format

<inputFiles>
    <fileUrl>ftp://example.org/datafile.slcio</fileUrl>
</inputFiles>

Some batch systems may not support remote file access via a URL. Check with your administrator.

These remote files will be downloaded to the cache directory, which is ~/.cache, by default. A different local cache directory can be specified using the <cacheDirectory> tag in the <control> section.

fileSet

Sets of files on the local filesystem with the same base directory can be specified by using the <fileSet> element.

No Format

<fileSet baseDir="/my/data/dir">
    <file>events1.slcio</file>
    <file>events2.slcio</file>
</fileSet>

When processing these files, the base directory "/my/data/dir" will be prepended to each file name to make a complete file path.

fileList

The <fileList> element should point to a text file containing a list of files, one per line.

For instance, say that you had a local text file at /example/mylciofiles.txt containing paths to local LCIO files.

No Format

/my/data/dir/events1.slcio
/my/data/dir/events2.slcio

This can be fed into LCSim using this XML code.

No Format

<fileList>/example/mylciofiles.txt</fileList>

fileUrlList

The <fileUrlList> element is similar to <fileList>, except that it contains URLs to online data instead of paths on the local file system. For instance, the fileUrlList could point to files available via the http or ftp protocols.

fileRegExp

The <fileRegExp> element will include files that match a regular expression.

Here is an example that would match files similar to input1.slcio, input2.slcio, etc. in the current directory.

No Format
<fileRegExp baseDir=".">input*[0-9].slcio</fileRegExp>

See http://docs.oracle.com/javase/tutorial/essential/regex/ for more information about regular expressions in Java.
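Because the <fileRegExp> text is a regular expression rather than a shell glob, a pattern such as input*[0-9].slcio may behave differently from what its glob-like appearance suggests: t* means "zero or more t" and an unescaped . matches any character. The sketch below uses plain java.util.regex to show this. It assumes the pattern is matched against the whole file name, which may not be how LCSim applies it; FileRegExpDemo and the stricter pattern are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; shows plain java.util.regex behaviour, not LCSim internals.
final class FileRegExpDemo {
    // Assumption: the pattern must match the entire file name.
    static boolean matchesWholeName(String regex, String fileName) {
        return Pattern.matches(regex, fileName);
    }

    public static void main(String[] args) {
        String fromExample = "input*[0-9].slcio";   // pattern used in the fileRegExp example
        String stricter = "input[0-9]+\\.slcio";    // hypothetical alternative: one or more digits, literal dot
        String[] names = {"input1.slcio", "input12.slcio", "inpu1.slcio", "input2Xslcio"};
        for (String name : names) {
            System.out.println(name
                    + " example=" + matchesWholeName(fromExample, name)
                    + " stricter=" + matchesWholeName(stricter, name));
        }
    }
}
```

Under the whole-name assumption, the example pattern accepts input1.slcio but rejects input12.slcio, while the stricter pattern accepts both and rejects names without a literal dot.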

Job Control

The <control> section contains parameters that control the batch job, including the number of events to run and whether various debugging output should be printed.

...

The following will turn on all verbose output but turn off the printing of the system properties.

No Format

<control>
    <verbose>true</verbose>
    <printSystemProperties>false</printSystemProperties>
</control>

...

The <verbose> tag should be set to true to enable verbose debugging output when the XML input file is processed. This turns on all of the "print" elements described above, which can still be turned off individually by setting them to false after verbose has been turned on.

Variable Definitions

The job manager has very limited support for "free" variable definitions, using the <define> block.

...

Here is an example of a simple double parameter.

No Format

<define>
    <aDoubleParam>1.1</aDoubleParam>
</define>

Variables defined here can be included in expressions by using their name.

No Format

<define>
    <aDoubleParam1>1.1</aDoubleParam1>
    <aDoubleParam2>2.2</aDoubleParam2>
    <aDoubleParam3>aDoubleParam1 + aDoubleParam2</aDoubleParam3>
</define>

...

Here is an example pointing to a (nonexistent) jar at a URL.

No Format

<classpath>
    <jarUrl>http://www.example.org/something/myjar.jar</jarUrl>
</classpath>

The same thing can be done with local jar files and directories.

No Format

<classpath>
    <jar>/path/to/myjar.jar</jar>
    <directory>/path/to/myclassfiles</directory>
</classpath>

...

Here is an example Driver class with a number of setter methods.

No Format

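package org.lcsim.example;

import java.io.File;
import java.net.URL;

import hep.physics.vec.Hep3Vector;
import org.lcsim.util.Driver;

// Setter bodies are left empty here; a real Driver would store each value in a field.
public class MyDriver extends Driver
{
    public void setX(int x) {}
    public void setX1(int[] x1) {}
    public void setX2(int[][] x2) {}

    public void setFile(File f) {}
    public void setUrl(URL url) {}
    public void setVector(Hep3Vector vec) {}
}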

...

This is the corresponding XML code in <drivers> that would pass values to each of these methods.

No Format

<driver name="MyDriver" type="org.lcsim.example.MyDriver">
    <x>1</x>
    <x1>1 2 3</x1>
    <x2>1 2 3; 4 5 6</x2>
    <file>/path/to/a/file.txt</file>
    <url>http://example.org/file.txt</url>
    <vector>1.0 2.0 3.0</vector>
</driver>
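To make the array syntax concrete, the Java sketch below parses text such as "1 2 3" and "1 2 3; 4 5 6" into int arrays. The separators (whitespace between values, a semicolon between rows) are inferred from the example above, and ArrayParamParser is a hypothetical name; LCSim's own conversion code may differ.

```java
import java.util.Arrays;

// Hypothetical parser, not LCSim code: converts XML text into int arrays.
final class ArrayParamParser {
    // "1 2 3" -> {1, 2, 3}; values are assumed to be separated by whitespace.
    static int[] parse1D(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // "1 2 3; 4 5 6" -> {{1, 2, 3}, {4, 5, 6}}; rows are assumed to be separated by ';'.
    static int[][] parse2D(String text) {
        return Arrays.stream(text.split(";"))
                .map(ArrayParamParser::parse1D)
                .toArray(int[][]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse1D("1 2 3")));
        System.out.println(Arrays.deepToString(parse2D("1 2 3; 4 5 6")));
    }
}
```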

...

  • The Driver class must be public.
  • The Driver class must have a public constructor that takes no arguments.
  • The Driver's constructor should not do any initialization. It should instead use the detectorChanged() or startOfData() methods, which are called after all input parameters are processed.
  • The set methods to be accessed in the XML should always be of the form

    No Format
    public void set[ParameterName]([type] [varName])

    Set methods not of this form will not be accessible as XML parameters.

  • The use of sub-drivers is discouraged due to these being inaccessible by the XML format, though it is still possible to use them. Any dependence of a child Driver on its parent's XML input parameters can be handled by using the startOfData() method to add a new child Driver instance.
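Some of the rules above can be checked mechanically. The Java sketch below uses reflection to test whether a class is public with a public no-argument constructor, and to list the methods that follow the set[ParameterName] form. DriverRuleCheck and its nested GoodDriver are hypothetical names for illustration only; this is not an LCSim tool.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical checker, not part of LCSim.
final class DriverRuleCheck {
    // Hypothetical class that follows the rules; a real Driver would extend the LCSim base class.
    public static class GoodDriver {
        public GoodDriver() {}
        public void setThreshold(double threshold) {}
    }

    // True if the class is public and has a public constructor taking no arguments.
    static boolean hasPublicNoArgConstructor(Class<?> type) {
        if (!Modifier.isPublic(type.getModifiers())) {
            return false;
        }
        for (Constructor<?> c : type.getConstructors()) { // public constructors only
            if (c.getParameterCount() == 0) {
                return true;
            }
        }
        return false;
    }

    // Names of public methods of the form: public void setXxx(oneArgument), sorted.
    static List<String> xmlSetters(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getMethods()) {
            if (m.getName().startsWith("set") && m.getName().length() > 3
                    && m.getParameterCount() == 1
                    && m.getReturnType() == void.class) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(hasPublicNoArgConstructor(GoodDriver.class));
        System.out.println(xmlSetters(GoodDriver.class));
    }
}
```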

How to Run a Specific Release

Running your job with a specific LCSim release is straightforward. Download the bin jar from the lcsim repository, and then use the java command to execute your steering file.

No Format

wget http://www.lcsim.org/maven2/org/lcsim/lcsim/1.14-SNAPSHOT/lcsim-1.14-SNAPSHOT-bin.jar
java -jar ./lcsim-1.14-SNAPSHOT-bin.jar mySteeringFile.xml
You do not need to build lcsim yourself in order to run a specific release. The SLAC Nexus Repository can be searched for all lcsim-distribution releases; the results are displayed in a table that includes download links. The bin.jar links are the runnable jars, which can be downloaded to your machine and run as per the above instructions. Running a release this way can cause errors, e.g. if your steering file was written for a different version in which Driver methods have since been changed, removed, or renamed.