Introduction

One of the main software issues we have observed at LCLS is the fragmentation of the tools that users adopt for their data analysis. These tools range from Matlab and IDL, to our own C++/Python psana, to frameworks like CASS and Cheetah.

We believe that the advantage of high fragmentation, namely the freedom of users to adopt whatever tool they want, is outweighed by its disadvantages. Fragmentation spreads our effort too thin just to provide the basic data analysis services and, consequently, limits our ability to support more advanced services such as peak finding algorithms and autocorrelation calculations.

The main reason why LCLS cannot focus its efforts on one framework is that we haven't found a single tool with all the required features. For example, Matlab is very powerful, but it is too slow and it is proprietary; psana is fast and open source, but it doesn't have all the display abilities and functions that Matlab has.

We have considered using ROOT to power psana, but ROOT has a few issues: it tends to take over your application (e.g. by handling signals); any one library tends to bring in many other libraries that are not strictly required; it has a few global names which can collide with users' symbols; its development is heavily driven by LHC needs; and it uses C++ as its scripting language, which may not be the best solution for many FEL users.

We've been considering adopting something other than ROOT for a while, but we haven't found any serious alternative. There are many tools for plotting (see, for example, matplotlib), for saving and retrieving data (HDF5), for fitting (Minuit2), for GUI development (Qt), and for basic algorithms (GSL), but no integrated solution.

We'd like to start the effort of creating an integrated ecosystem for data analysis at FEL facilities. Note that this effort is not meant to replace whatever framework a facility uses for its analysis (in our case psana). The final result would be a set of libraries that psana can invoke to implement the key operations indicated above.

This ecosystem would be based on existing libraries as much as possible, and we would write new code only when needed. We could, for example, write a thin layer that saves histograms or other objects to HDF5.
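As an illustration of how thin such a layer could be, here is a minimal sketch. It assumes the h5py and NumPy packages; the function names, the group layout (one group per histogram holding `counts` and `edges` datasets) and the `kind` attribute are hypothetical choices made for this example, not an existing psana interface.

```python
import os
import tempfile

import h5py
import numpy as np


def save_histogram(path, name, counts, edges):
    """Write one 1D histogram (bin counts plus bin edges) to an HDF5 group.

    Hypothetical layout: /<name>/counts and /<name>/edges.
    """
    counts = np.asarray(counts)
    edges = np.asarray(edges)
    if counts.ndim != 1 or edges.shape != (counts.size + 1,):
        raise ValueError("need 1D counts and exactly len(counts) + 1 edges")
    with h5py.File(path, "a") as f:  # "a": create the file or append to it
        if name in f:
            del f[name]  # replace an earlier histogram stored under this name
        group = f.create_group(name)
        group.attrs["kind"] = "hist1d"  # hypothetical type tag for readers
        group.create_dataset("counts", data=counts)
        group.create_dataset("edges", data=edges)


def load_histogram(path, name):
    """Read back the (counts, edges) arrays written by save_histogram."""
    with h5py.File(path, "r") as f:
        group = f[name]
        return group["counts"][()], group["edges"][()]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "results.h5")
    counts, edges = np.histogram([0.1, 0.2, 0.7], bins=4, range=(0.0, 1.0))
    save_histogram(path, "pulse_energy", counts, edges)
    print(load_histogram(path, "pulse_energy"))
```

Storing the edges explicitly, rather than only the range and the number of bins, is one possible design choice: it would let the same layout hold variable-width binning later without changing the file format.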

Requirements

We claim that the key services our users would like are:

  • Ability to write both C++ and Python modules.
  • Ability to persist basic objects (histograms, arrays, ntuples) and to store and retrieve intermediate analysis results.
  • Ability to fit complex data.
  • Ability to invoke high-performance algorithms; these can range from basic algorithms, like FFTs, to more specialized operations, like peak finding.
  • Ability to plot histograms (1D, 2D, profiles), xy plots (graphs), images, and functions.
  • Ability to interact with the plots (resize, select a region of interest, etc.).

In addition:

  • Any function or class that can be used on the C++ side can also be used on the Python side, and vice versa (similar to Qt/PyQt).

Core Subset of ROOT Features and Possible Replacements

This table identifies possible replacements for the key ROOT features that we'd like to provide.

| Feature        | ROOT Approach                                             | Replacement                                                                         |
|----------------|-----------------------------------------------------------|-------------------------------------------------------------------------------------|
| Interpreter    | CINT, cling, PyRoot                                       | Python                                                                              |
| Input/Output   | Easy to save/restore histograms, trees                    | Dedicated development effort? Based on HDF5?                                        |
| Math libraries | Numerical algorithms, linear algebra, statistics, fitting | NumPy, SciPy, GSL, minuit. Different classes and functions for python and C/C++?    |
| GUI            | Widgets, signals/slots mechanism                          | Qt, PyQt                                                                            |
| Histograms     | 1D, 2D, Profile                                           | Dedicated development effort?                                                       |
| Trees          | To store large quantities of same-class objects           | Dedicated development effort?                                                       |
| 2D graphics    | Display 1D and 2D histograms, functions, graphs           | Matplotlib? How to display histograms and trees?                                    |
| 3D graphics    | Event display                                             | Possibly not needed for now                                                         |
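
To give a rough sense of scale for the "Histograms" entry, the following sketch shows what the core of a dedicated 1D histogram class might look like using only the Python standard library. The class name `Hist1D`, its attributes, and the half-open binning convention (lower edge included, upper edge excluded) are assumptions made for this example; a real implementation would also need 2D and profile variants and a C++ counterpart.

```python
from bisect import bisect_right


class Hist1D:
    """Fixed-width 1D histogram with underflow and overflow counters."""

    def __init__(self, nbins, lo, hi):
        if nbins < 1 or not lo < hi:
            raise ValueError("need nbins >= 1 and lo < hi")
        self.lo, self.hi = lo, hi
        step = (hi - lo) / nbins
        # nbins + 1 edges; the last one is pinned to hi to avoid rounding drift
        self.edges = [lo + i * step for i in range(nbins)] + [hi]
        self.counts = [0.0] * nbins
        self.underflow = 0.0
        self.overflow = 0.0

    def fill(self, x, weight=1.0):
        """Add one entry; bins are half-open intervals [edge_i, edge_i+1)."""
        if x < self.lo:
            self.underflow += weight
        elif x >= self.hi:
            self.overflow += weight
        else:
            # bisect_right returns the index of the first edge greater than x,
            # so the bin containing x is one position to the left of it
            self.counts[bisect_right(self.edges, x) - 1] += weight


if __name__ == "__main__":
    h = Hist1D(4, 0.0, 1.0)
    for value in (0.1, 0.2, 0.25, 0.7, -1.0, 1.0):
        h.fill(value)
    print(h.counts, h.underflow, h.overflow)
```

Keeping the state as plain `edges` and `counts` sequences is deliberate in this sketch: objects of this shape map directly onto array datasets, which is what an HDF5-based persistence layer would need.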
