Data Summary Tool

A lightweight, flexible analysis micro-framework suitable for both standardized production data summary analysis and rapid prototyping of specialized research analyses.  In data summary mode, the program performs a "standard" analysis on every piece of data it encounters in the data files and returns a summarizing result.  For specialized research, the analysis and its results can be more arbitrary.

A further goal is to run online against shared memory, but that mode is not yet supported.

Usage

The data summary tool is in SVN, and there is a version checked out in the directory below, which can be used by executing the following commands.

% cd /reg/neh/home/justing/my_ana_rel/DataSummary
% sit_setup

The output is placed in the user's home directory at ~/data-summary/.

The code can be checked out and modified with the commands at the end of this page.

The data summary tool can be used in the following three ways:

  1. locally on a psana node in single-core mode:
       % python data-summary-tool.py CXI/cxic0114 111
  2. locally on a psana node in multi-core mode with mpirun:
       % mpirun -n 6 python data-summary-tool.py CXI/cxic0114 111
  3. in batch multi-core mode using the bsub command:
       % bsub -a mympi -n 24 -o mpi.log -q psanaq python data-summary-tool.py CXI/cxic0114 111
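When runs are launched from a script rather than typed by hand, the three launch modes above can be assembled programmatically. The following is a minimal sketch, not part of the tool: the helper name build_command and its mode labels are hypothetical, and the bsub options are simply copied from the example command above.

```python
def build_command(mode, exp, run, cores=None):
    """Return the argument list for one of the three launch modes.

    'mode' labels ("single", "mpirun", "bsub") are hypothetical names
    chosen for this sketch; the commands mirror the examples above.
    """
    base = ["python", "data-summary-tool.py", exp, str(run)]
    if mode == "single":
        return base
    if cores is None:
        raise ValueError("a core count is required for mode %r" % (mode,))
    if mode == "mpirun":
        return ["mpirun", "-n", str(cores)] + base
    if mode == "bsub":
        # Queue name and log file are taken from the example, not verified.
        return ["bsub", "-a", "mympi", "-n", str(cores),
                "-o", "mpi.log", "-q", "psanaq"] + base
    raise ValueError("unknown mode: %r" % (mode,))


if __name__ == "__main__":
    print(" ".join(build_command("mpirun", "CXI/cxic0114", 111, cores=6)))
```

The returned list can be passed directly to subprocess.call, which avoids shell quoting issues.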

There are also other options that can be passed to the data-summary-tool.py script:

usage: data-summary-tool.py [-h] [--max-events-per-node MAX_EVENTS] [--plot-vs X_AXES]
                   [--verbose] [--xkcd] [--base-output-dir BASEOUTPUTDIR]
                   exp run

positional arguments:
  exp                   the experiment, e.g. CXI/cxic0114
  run                   run to process, e.g. 111

optional arguments:
  -h, --help            show this help message and exit
  --max-events-per-node MAX_EVENTS, -M MAX_EVENTS
                        maximum events to process per node
  --plot-vs X_AXES, -X X_AXES
                        pass in channels to plot against, can be passed
                        multiple times
  --verbose, -v         verbosity level of logging, default is 4 (INFO),
                        choices are 1-5 (CRITICAL, ERROR, WARNING, INFO,
                        DEBUG), can pass -v multiple times
  --xkcd, -x            use XKCD plot style
  --base-output-dir BASEOUTPUTDIR, -O BASEOUTPUTDIR
                        set output folder for reports

By default, the output is placed in the running user's $HOME/data-summary/ directory.  If that directory doesn't exist, it is created.  The location can be changed with the '-O' optional argument.
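For readers who want to see how such an interface fits together, the following sketch rebuilds the options listed above with argparse and creates the output directory on demand. It is an illustration reconstructed from the help text, not the tool's actual source: the function names, the dest names, and the choice to let each -v add one to the default level of 4 are all assumptions.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring the documented options (a sketch only)."""
    parser = argparse.ArgumentParser(prog="data-summary-tool.py")
    parser.add_argument("exp", help="the experiment, e.g. CXI/cxic0114")
    parser.add_argument("run", help="run to process, e.g. 111")
    parser.add_argument("--max-events-per-node", "-M", dest="max_events",
                        type=int, default=None,
                        help="maximum events to process per node")
    parser.add_argument("--plot-vs", "-X", dest="x_axes", action="append",
                        default=[],
                        help="channels to plot against, may be repeated")
    # Assumption: every -v adds one to the default level of 4 (INFO).
    parser.add_argument("--verbose", "-v", action="count", default=4,
                        help="verbosity level of logging")
    parser.add_argument("--xkcd", "-x", action="store_true",
                        help="use XKCD plot style")
    parser.add_argument("--base-output-dir", "-O", dest="baseoutputdir",
                        default=os.path.join(os.path.expanduser("~"),
                                             "data-summary"),
                        help="set output folder for reports")
    return parser


def ensure_dir(path):
    """Create the output directory if it does not exist yet."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("reports go to", ensure_dir(args.baseoutputdir))
```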

Get the code

The code has been developed on github.com, and the repository can be viewed, forked, and commented on at this URL: https://github.com/jgarofoli/LCLS-data-summary.

The code is also checked in to SVN.  It can be retrieved by following these instructions:

$> ssh psdev
$> newrel ana-current datasummarytest
$> cd datasummarytest
$> addpkg DataSummary
$> scons
$> ssh psana
$> cd datasummarytest/arch/x86_64-rhel5-gcc41-opt/python/DataSummary
$> python data-summary-tool.py CXI/cxic0114 111 -M 400
$> gnome-open ~/data-summary/cxic0114_run111.latest/report.html

Output

Output is rendered as an HTML file containing images and text, along with a Python dictionary containing data and image locations suitable for programmatic consumption.
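The checkout instructions open the report at ~/data-summary/cxic0114_run111.latest/report.html. Assuming that naming pattern holds for other experiments and runs (it is inferred from that single example, not from the tool's source), the report location can be computed as in this sketch. The helper name report_path is hypothetical.

```python
import os


def report_path(exp, run, base_dir=None):
    """Guess the location of report.html for an experiment and run.

    Assumes the '<experiment>_run<run>.latest/report.html' layout seen in
    the one example path; other layouts are not handled.
    """
    if base_dir is None:
        # Default output location described in the usage section.
        base_dir = os.path.join(os.path.expanduser("~"), "data-summary")
    exp_name = exp.split("/")[-1]  # "CXI/cxic0114" -> "cxic0114"
    run_dir = "%s_run%s.latest" % (exp_name, run)
    return os.path.join(base_dir, run_dir, "report.html")


if __name__ == "__main__":
    print(report_path("CXI/cxic0114", 111))
```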
