Introduction
One of the main software issues we have observed at LCLS is the fragmentation of the tools that LCLS users adopt for their data analysis. These tools range from Matlab and IDL, to our own C++/python psana, to frameworks like CASS and Cheetah.
We believe that the advantage of having high fragmentation, namely the freedom of the users to adopt whatever they want, is outweighed by its disadvantages: this fragmentation ends up spreading our effort too thin just to provide the basic data analysis services and, consequently, it affects our ability to provide support for more advanced services like peak finding algorithms, autocorrelation calculations, etc.
The main reason why LCLS cannot focus its efforts on one framework is that we haven't found a single tool with all the required features. For example, Matlab is very powerful, but it is too slow and it is proprietary; psana is fast and open-source, but it does not have all the display abilities and functions that Matlab has.
We have considered using ROOT to power psana, but ROOT has a few issues: it tends to take over your application (e.g. by handling signals); any one library tends to bring in many other libraries that are not strictly required; it has a few global names which can collide with users' symbols; its development is heavily LHC-oriented; and it uses C++ as its scripting language, which may not be the best solution for many FEL users.
We've been considering adopting something other than ROOT for a while, but we haven't found any serious alternative. We claim that the key services our users would like are:
- Ability to use both C++ and python (when performance allows).
- Ability to plot histograms, xy plots, images, functions, etc.
- Ability to fit complex data.
- Ability to persist basic objects (histograms, arrays, ntuples), i.e. the ability to store and retrieve intermediate analysis results.
- Ability to invoke high performance algorithms; these can range from basic algorithms, like FFTs, to more specialized operations, like peak finding.
There are many tools for plotting (see, for example, Mantid or matplotlib), for saving and retrieving data (HDF5), for fitting, for GUI development (Qt), and for basic algorithms (GSL), but no integrated solution.
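To illustrate what gluing separate tools together looks like today, here is a minimal sketch that bins data with NumPy and fits the resulting histogram with SciPy. The function names `gaussian` and `fit_gaussian_to_histogram` are hypothetical, chosen for this example only; they are not part of psana or of any existing library.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, mean, sigma):
    """Gaussian model evaluated at x."""
    return amplitude * np.exp(-0.5 * ((x - mean) / sigma) ** 2)


def fit_gaussian_to_histogram(samples, bins=50, value_range=None):
    """Histogram `samples` with NumPy, then fit a Gaussian with SciPy.

    Hypothetical helper: returns the fitted (amplitude, mean, sigma)
    and the covariance matrix reported by curve_fit.
    """
    counts, edges = np.histogram(samples, bins=bins, range=value_range)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Initial guess taken from the data: peak height, peak position, sample spread.
    p0 = [counts.max(), centers[np.argmax(counts)], np.std(samples)]
    popt, pcov = curve_fit(gaussian, centers, counts, p0=p0)
    return popt, pcov
```

Even in this small case the user has to compute the bin centers and the starting values by hand; an integrated layer could hide these steps behind a histogram object that knows how to fit itself.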
We'd like to start the effort of creating an integrated ecosystem for data analysis for FEL facilities. Note that this effort is not meant to replace whatever framework the facilities use to do analysis, e.g. psana. The final result would be a set of libraries that psana can invoke to implement the key operations indicated above.
This ecosystem would be based on existing libraries as much as possible and we would write new code only when needed. We could, for example, write a thin layer that allows saving histograms or other objects to HDF5.
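As a rough idea of how thin such a layer could be, the sketch below stores a 1D histogram as an HDF5 group using h5py. The group layout (datasets `counts` and `edges`, plus a `kind` attribute) and the function names are assumptions made for this example, not an agreed format.

```python
import numpy as np
import h5py


def save_histogram(path, name, counts, edges):
    """Store one 1D histogram as an HDF5 group holding two datasets."""
    with h5py.File(path, "a") as f:
        if name in f:
            del f[name]  # replace an existing histogram with the same name
        group = f.create_group(name)
        group.attrs["kind"] = "hist1d"  # hypothetical type tag, not a standard
        group.create_dataset("counts", data=np.asarray(counts))
        group.create_dataset("edges", data=np.asarray(edges))


def load_histogram(path, name):
    """Read back the (counts, edges) arrays written by save_histogram."""
    with h5py.File(path, "r") as f:
        group = f[name]
        return group["counts"][()], group["edges"][()]
```

A call such as `save_histogram("results.h5", "pulse_energy", counts, edges)` followed by `load_histogram("results.h5", "pulse_energy")` would round-trip the output of `numpy.histogram`. Because the result is an ordinary HDF5 file, it would remain readable from C++, Matlab, or IDL as well.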
Core subset of ROOT Features
Feature | Comment |
---|---|
Interpreter | CINT, cling, PyRoot |
Input/Output | Easy to save/restore histograms, trees; more complex to save arbitrary structures |
Math libraries | Numerical algorithms, linear algebra, vectors, statistics, fitting |
GUI | Widgets, signals/slots mechanism |
Histograms | 1D, 2D, Profile |
Trees | To store large quantities of same-class objects |
2D graphics | Display 1D and 2D histograms, functions, graphs |
3D graphics | Event display |
Possible replacements
Feature | Comment |
---|---|
Interpreter | Python |
Input/Output | Could need dedicated development effort, could be built on top of HDF5 |
Math libraries | NumPy, SciPy, GSL. Issue: different environments between python and C/C++? |
GUI | QT, PyQt |
Histograms | Could need dedicated development effort |
Trees | Could need dedicated development effort |
2D graphics | Matplotlib? How to display histograms and trees? |
3D graphics | Possibly not needed for now |
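
To give a sense of the scale of the dedicated effort the histogram entry might need, here is a minimal sketch of a fillable 1D histogram built on NumPy. The class name `Hist1D` and its methods are hypothetical; this does not reproduce ROOT's histogram interface and covers only fixed-width bins.

```python
import numpy as np


class Hist1D:
    """Fixed-binning 1D histogram that can be filled incrementally."""

    def __init__(self, nbins, lo, hi):
        self.edges = np.linspace(lo, hi, nbins + 1)
        self.counts = np.zeros(nbins, dtype=np.int64)

    def fill(self, values):
        """Add one value or an array of values; out-of-range entries are dropped."""
        new_counts, _ = np.histogram(np.atleast_1d(values), bins=self.edges)
        self.counts += new_counts

    @property
    def centers(self):
        """Bin centers, useful for plotting or fitting."""
        return 0.5 * (self.edges[:-1] + self.edges[1:])

    def mean(self):
        """Mean of the binned distribution, computed from bin centers."""
        total = self.counts.sum()
        if total == 0:
            return float("nan")
        return float((self.centers * self.counts).sum() / total)
```

An analysis loop could call `fill` once per event or once per batch of events. Overflow and underflow bins, 2D histograms, and profile histograms are left out of this sketch.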