...

Translation from the LCLS XTC data format to the more general scientific format HDF5 is carried out by the XTC to HDF5 Translator. We refer to the software that carries out the translation as psana-translate, to distinguish it from the earlier version of the software, o2o-translate, which has been deprecated. Translation for experiments should generally be carried out using the automatic HDF5 translation feature of the Web Portal of Experiments; each experiment has an "HDF5 Translation" tab that provides this interface. Below we discuss features that allow the translation to be customized - filtering unwanted events or detectors as well as adding processed data. The web portal provides a mechanism to customize the translation using these features, and experiment POCs should be able to help with using them. Documentation on o2o-translate, which discusses some of the history behind selecting HDF5 as a scientific data format for general use, can be found

...

The rest of this document covers psana-translate. psana-translate runs as a psana module, which has allowed us to develop several new features that are discussed below. However, the main technical reason for phasing out o2o-translate is to use a Data Description Language (DDL) to generate the code that handles the many data types that different detectors produce. This use of DDL is part of psana-translate.

Running psana-translate

Generally, users should use the HDF5 Translation tab for their experiment on the web portal of experiments. This allows runs to be translated automatically as they become available and while they are in progress. In addition, translation can produce large amounts of data that need to be managed by LCLS. The HDF5 Translation page includes two options for translation - standard and monitoring. Most users will use standard; experiments with multiple calib cycles can use the monitoring translation, discussed in a section below. In this section we discuss the standard translation. The automatic system launches jobs through the batch system using the appropriate queues, and the queue used can affect I/O performance. In addition, a faster parallel compression library is used.

...

would invoke the translator. It will translate all the xtc files in run 71 of the xpptut13 dataset, using default values for all the translator options; these are the recommended option values for translation. The defaults include gzip compression at level 1 and no filtering on events or data. The Translator does not overwrite an existing h5 output file by default (set the option overwrite=true to overwrite the output). There are many translator options you can set, which are discussed below. The easiest way to try different translator options is to write a psana.cfg file: copy the file default_psana.cfg, which is included later in this document or can be found in the Translator package directory, and modify the option values you wish to change. The version included later in this document is a long psana config file with extensive documentation on all the translator options; it is recommended that users copy and modify it to try different options.
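
As a minimal sketch of such a config file (the Translator.H5Output module name and the output_file and overwrite options appear in the commands and option discussion elsewhere in this document; the path is a placeholder), a psana.cfg might contain:

Code Block
[psana]
modules = Translator.H5Output

[Translator.H5Output]
# placeholder path - point this at your experiment's ftc or scratch area
output_file=mydir/output.h5
# by default an existing h5 file is not overwritten; set to true to overwrite
overwrite=false

Running psana -c psana.cfg exp=xpptut13:run=71 would then translate run 71 with these options.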

When running the Translator directly, you can make use of the same parallel compression library that the automatic translation system uses by setting the following environment variables (below is for bash):

...

to translate in the same manner as the automatic translation system. Better I/O is achieved by writing output to the scratch folder of the experiment rather than a user's home directory (on the LCLS system) and by running translation through the batch system using the appropriate queue.
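
For example, a standard (non split scan) translation could be submitted to the batch system with something like the following sketch, assuming a psana.cfg like the one above whose output_file points at the experiment's scratch folder:

bsub -q psanaq -o translate_%J.out psana -c psana.cfg exp=xpptut13:run=71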

Split Scan Mode / Monitoring Translator

The Translator supports split scan mode. In this mode, calib cycles are written to separate hdf5 files, and a master file has external links to these separate files. Users need only work with the master file, which uses the same schema as one finds without split scan mode, so little modification to user code is required; what is required is following the external links (see below for tips on this). Not all experiments use more than one calib cycle, and for experiments that use one calib cycle per run, split scan mode provides no benefit. Two reasons to use split scan mode are, first, that the resulting hdf5 file from normal translation is too large, and second, to make translation faster by translating the data in parallel.

The main reason users opt for the split scan translator is to achieve the fastest possible translation for the purposes of online monitoring; for this reason, the split scan translator is also referred to as the monitoring translator. Online monitoring often involves custom configuration to reduce the amount of DAQ data translated, as well as adding results from users' own code to the translation. The split scan translator is implemented using MPI and translates different calib cycles in parallel. It has its own driver program, h5-mpi-translate, which is launched using MPI.

The split scan translator has a master/worker architecture. The single master process reads through the data and finds where the calib cycles start. It then assigns calib cycles to the worker pool.

Running Split Scan

The simplest way to launch the split scan translator is through the web portal of experiments - the monitoring choice on the HDF5 Translation tab. Before covering the options available through the web page, we'll look at launching jobs by hand. Here is an example command for launching the mpi based splitscan Translator for data that has already finished being written to the offline file system:

bsub -q psanaq -a mympi -n 9 -R "span[ptile=1]" -o translate_%J.out h5-mpi-translate -m cspad_mod.CsPadCalib,Translator.H5Output -o Translator.H5Output.printenv=True -o Translator.H5Output.output_file=mydir/split.h5 exp=xpptut13:run=71

The first few arguments are for bsub and set up the batch job; everything from h5-mpi-translate onward is the mpi split scan translator and its arguments.

batch arguments

...

For the purposes of online monitoring, we often want to translate while taking data, and we will want to use the appropriate high priority queue for the instrument our experiment uses. These queues are psnehprior and psfehprior - for instruments in the near hall and far hall respectively. In addition, there are certain arguments for launching the batch job that seem to increase translation speed - at the expense of using a large number of resources in the queue. An example launch command for an XPP experiment in the near hall might look like

bsub -q psnehprior -a mympi -n 9 -x -R "span[ptile=2]" -o translate_%J.out h5-mpi-translate -m cspad_mod.CsPadCalib,Translator.H5Output -o Translator.H5Output.output_file=mydir/split.h5 exp=xpp7815:run=189:live:dir=/reg/d/ffb/xpp/xppf7815/xtc

Note - it is important to make sure that the priority queues are available for experiments during their scheduled time. In particular, if one is doing online monitoring for an experiment running during the day shift using a high priority queue, one must stop using the queue if it will be used by an experiment running during the night shift (use the "stop all" button if translating through the web portal of experiments). At that point one would switch to doing translation using the offline psanaq, and the offline file system rather than the ffb. Optimal I/O performance requires coordinating which queue you use with where the data is read from and written to. In particular, the ffb data location should only be used with the high priority queues. See Submitting Batch Jobs for more details on the different queues. In terms of where to write data, write to an experiment folder like ftc or scratch - don't write to your home directory.
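
For example, after the priority-queue window ends, the equivalent offline launch might look like the following sketch (the batch arguments mirror the earlier psanaq example; dropping :live and the ffb dir= option falls back to reading from the offline file system, and the output path is a placeholder):

bsub -q psanaq -a mympi -n 9 -R "span[ptile=1]" -o translate_%J.out h5-mpi-translate -m cspad_mod.CsPadCalib,Translator.H5Output -o Translator.H5Output.output_file=mydir/split.h5 exp=xpp7815:run=189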

batch arguments

The bsub batch arguments used on the priority queue are

  • -x no other batch jobs will run on the nodes used for this job
  • -R "span[ptile=2]" run only two processes on each node.
  • -n 9 use 9 processes for the job (1 will be the master and 8 will be workers; up to 8 calib cycles will be translated simultaneously). The master process is always the last process. By using 9 processes with ptile=2 and -x, the master process runs by itself on one node, with no workers doing I/O on that node. This gives the master the most possible resources for its job of finding calib cycles. In general, for best performance, if n is the number of processes and k the number of processes per node (the ptile value), choose them so that n mod k == 1 (see the example after this list).
  • The job output file (translate_39283.out, where 39283 will be whatever job number the batch system assigns) will record timing information for the master and workers.
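
For example, -n 9 with ptile=2 satisfies the rule (9 mod 2 == 1), as do -n 13 with ptile=2 and -n 7 with ptile=3; in each case the leftover process is the master, which ends up alone on its node.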

These arguments dedicate significant resources of the priority queue to the translation of each run to achieve the fastest possible translation. There are two costs to this approach. The first is fewer resources for other jobs in the queue, such as translating other runs. The second is an increased likelihood of failure from using a problematic node: if one of the nodes in the queue is having trouble, these options increase our odds of running a job on it. In general we do not recommend these arguments for the offline queue, psanaq.

The translator, when compressing, spends about 70% of its time as a multi-threaded application (using 9 of the 12 cores on the nodes) - thus the ptile=2. Even when not compressing, using lower ptile values (1, 2, 3) seems to increase overall performance. I believe this is due to the I/O intensive nature of the Translator. On paper, we expect our filesystem and network links to perform just as well with ptile=12 (the default) as with ptile=1, 2, or 3, however in practice this is not what I've seen.

How many processes to use?

Presently, the split scan translator does not dynamically request new processes as it needs them; it uses whatever pool it was launched with (the -n batch argument above). Consequently the user needs to pick the optimal number of processes to use. Ideally, we want a worker to be free when each new calib cycle is discovered. Assuming the fast_index option is used (see below) and optimal batch arguments like -x and ptile=2 with an odd number of processes, the master should keep up with live data at 120hz. It is then a question of how slow the calib cycle translation is. For 1 full cspad with compression, I would allow for 11hz to translate calib cycles. 120/11 means I want 11 workers, so n=12, and I'll make it n=13 to get the master running by itself.
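
Putting that arithmetic together, a launch command for such a case might look like the following sketch (identical to the earlier priority queue example except that -n is raised to 13; experiment, paths, and queue are placeholders for your own):

bsub -q psnehprior -a mympi -n 13 -x -R "span[ptile=2]" -o translate_%J.out h5-mpi-translate -m cspad_mod.CsPadCalib,Translator.H5Output -o Translator.H5Output.output_file=mydir/split.h5 exp=xpp7815:run=189:live:dir=/reg/d/ffb/xpp/xppf7815/xtc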

The Translator output, captured in log files by the bsub -o option, provides useful information on the translation rate for the workers, the rate at which the master gets through the data, and how many calib cycles each worker translated (if you see workers that didn't translate any calib cycles, you have assigned more processes to the translation job than the system can use).

Translator arguments

Looking again at the bsub command line:

bsub -q psanaq -a mympi -n 9 -o translate_%J.out h5-mpi-translate -m cspad_mod.CsPadCalib,Translator.H5Output -o Translator.H5Output.output_file=mydir/split.h5 exp=xpptut13:run=71

You

...


As you read through the above command line, you will see that h5-mpi-translate, rather than psana, is the driver program for mpi based split scan translation. h5-mpi-translate takes the same arguments as psana. It understands one additional option, printenv=True, which is only picked up by h5-mpi-translate and can be useful for debugging, but is not necessary (it prints all the environment variables the Translator sees). In particular, h5-mpi-translate understands the -c file.cfg option, so that options may be specified in a configuration file.
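
For example, the long command line above could be shortened by moving the module list and output file into a config file. The sketch below simply mirrors the -m and -o arguments shown earlier; the file name translate.cfg is a placeholder:

Code Block
# translate.cfg
[psana]
modules = cspad_mod.CsPadCalib Translator.H5Output

[Translator.H5Output]
output_file=mydir/split.h5

The launch line then becomes

bsub -q psanaq -a mympi -n 9 -o translate_%J.out h5-mpi-translate -c translate.cfg exp=xpptut13:run=71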

...

For online analysis with live data, one of the impediments to keeping up with the data is the time it takes h5-mpi-translate to read through the data to find the calib cycles. Typically the data is recorded in 6 or more separate files, and each must be read through to identify the start of the calib cycles. Unfortunately this read speed can be 30-40hz for a typical experiment - far short of the 120hz we'd like to obtain. A recent feature added to h5-mpi-translate takes advantage of the unique signature of each new calib cycle, combined with the regular structure of the separate data files, to limit the reading to just one of the files. In this way, the h5-mpi-translate master rank need only get through the data it reads/searches at 20hz to keep up with the data. We have had good success with this feature recently, but it is not guaranteed to work - in particular, a high level of damaged data degrades the regular structure of the DAQ files, which in turn increases the fastindex search time to the point where it is no longer useful, or fails. fastindex is a temporary solution until a more robust way to do fast/live indexing is put in place. In the meantime, starting with analysis release ana-0.13.17, the translator supports the following options to turn on fast indexing and control how much time is spent searching the other files:

Code Block
fast_index=1                 # do the fast indexing, by default it is off
fi_mb_half_block=12          # when fast indexing is on, use 12MB on each side, or 24MB for each block that is searched
fi_num_blocks=40             # this is half the number of 'other' blocks to try. The translator will try 1 + 2*40 = 81 blocks if this is 40 (about 1GB total search)

More information can be found in the Psana Configuration File and All the Options section below.

...