Introduction

One of the main software issues we have observed at LCLS is the fragmentation of the tools adopted by LCLS users for their data analysis. These tools range from Matlab and IDL, to our own C++/Python psana, to frameworks like CASS and Cheetah.

We believe that the advantage of this fragmentation, namely the freedom of users to adopt whatever tools they want, is outweighed by its disadvantages: the fragmentation spreads our effort too thin just providing the basic data analysis services and, consequently, limits our ability to support more advanced services such as peak-finding algorithms, autocorrelation calculations, etc.

The main reason LCLS cannot focus its efforts on one framework is that we haven't found a single tool with all the required features. For example, Matlab is very powerful, but it is too slow and it is proprietary; psana is fast and open source, but it lacks the display capabilities and functions that Matlab offers.

We have considered using ROOT to power psana, but there are a few issues with it: it tends to take over your application (e.g. by handling signals); any one of its libraries tends to pull in many other libraries that are not strictly required; it defines a few global names that can collide with user symbols; its development is heavily LHC-oriented; and it uses C++ as a scripting language, which may not be the best solution for many FEL users.

We have been considering alternatives to ROOT for a while, but we haven't found any serious candidate. We claim that the key services our users would like are:

  • Ability to use both C++ and Python (when performance allows).
  • Ability to plot histograms, x-y plots, images, functions, etc.
  • Ability to fit complex data.
  • Ability to persist basic objects (histograms, arrays, ntuples), i.e. the ability to store and retrieve intermediate analysis results.
  • Ability to invoke high-performance algorithms; these can range from basic algorithms, like FFTs, to more specialized operations, like peak finding.

There are many tools for plotting (see, for example, Mantid or matplotlib), for saving and retrieving data (HDF5), for fitting, for GUI development (Qt), and for basic algorithms (GSL), but there is no integrated solution.
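As a purely illustrative example of how far the individual tools already go, the short Python sketch below fits and plots a noisy peak using only NumPy, SciPy and matplotlib; the data and the model are made up for the example, and nothing here ties the pieces, or their persistence, together.

    # Sketch only: plotting and fitting with off-the-shelf tools, each
    # used in isolation (illustrative data, not an existing LCLS package).
    import numpy as np
    from scipy.optimize import curve_fit
    import matplotlib.pyplot as plt

    def gaussian(x, amplitude, center, sigma):
        """Simple Gaussian model used as the fit function."""
        return amplitude * np.exp(-0.5 * ((x - center) / sigma) ** 2)

    # Fake 1D "spectrum": a Gaussian peak plus noise.
    x = np.linspace(-5.0, 5.0, 200)
    y = gaussian(x, 10.0, 0.5, 1.2) + np.random.normal(0.0, 0.3, x.size)

    # Fitting: SciPy's curve_fit handles the least-squares minimization.
    popt, pcov = curve_fit(gaussian, x, y, p0=[5.0, 0.0, 1.0])

    # Plotting: matplotlib displays the data and the fitted curve.
    plt.plot(x, y, ".", label="data")
    plt.plot(x, gaussian(x, *popt), "-", label="fit")
    plt.legend()
    plt.savefig("fit_example.png")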

We would like to start an effort to create an integrated data analysis ecosystem for FEL facilities. Note that this effort does not aim to replace whatever framework the facilities use to do analysis, e.g. psana. The final result would be a set of libraries that psana can invoke to implement the key operations listed above.

This ecosystem would be based on existing libraries as much as possible, and we would write new code only when needed. We could, for example, write a thin layer that allows histograms or other objects to be saved to HDF5.
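A minimal sketch of such a thin layer is shown below, assuming h5py as the HDF5 binding; the function names (save_hist1d, load_hist1d) and the file layout are hypothetical, not an existing API.

    # Sketch of a thin histogram-persistence layer on top of HDF5,
    # assuming h5py; names and layout are hypothetical.
    import numpy as np
    import h5py

    def save_hist1d(filename, name, counts, bin_edges):
        """Store a 1D histogram (counts + bin edges) in an HDF5 group."""
        with h5py.File(filename, "a") as f:
            grp = f.require_group(name)
            grp.attrs["kind"] = "histogram1d"
            grp.create_dataset("counts", data=np.asarray(counts))
            grp.create_dataset("bin_edges", data=np.asarray(bin_edges))

    def load_hist1d(filename, name):
        """Retrieve the histogram back as (counts, bin_edges) arrays."""
        with h5py.File(filename, "r") as f:
            grp = f[name]
            return grp["counts"][...], grp["bin_edges"][...]

    # Usage: histogram some data with NumPy, persist it, read it back.
    counts, edges = np.histogram(np.random.normal(size=10000), bins=50)
    save_hist1d("analysis.h5", "run42/pulse_energy", counts, edges)
    counts2, edges2 = load_hist1d("analysis.h5", "run42/pulse_energy")

Once the histogram lives in NumPy arrays, displaying it is a one-liner in matplotlib (e.g. plt.stairs(counts, edges) in recent versions), which is relevant to the 2D graphics row in the tables below.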

Core subset of ROOT Features

Feature        | Comment
---------------|--------
Interpreter    | CINT, cling, PyROOT
Input/Output   | Easy to save/restore histograms and trees; more complex to save arbitrary structures
Math libraries | Numerical algorithms, linear algebra, vectors, statistics, fitting
GUI            | Widgets, signals/slots mechanism
Histograms     | 1D, 2D, profile
Trees          | To store large quantities of same-class objects
2D graphics    | Display 1D and 2D histograms, functions, graphs
3D graphics    | Event display

Possible replacements

Feature        | Comment
---------------|--------
Interpreter    | Python
Input/Output   | Could need dedicated development effort; could be built on top of HDF5
Math libraries | NumPy, SciPy, GSL. Issue: different environments between Python and C/C++?
GUI            | Qt, PyQt
Histograms     | Could need dedicated development effort
Trees          | Could need dedicated development effort
2D graphics    | matplotlib? How to display histograms and trees?
3D graphics    | Possibly not needed for now
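To make the Trees and Input/Output rows more concrete, the sketch below shows one possible way, assuming h5py, to store large quantities of same-class records in a growable HDF5 dataset; the schema and the append_events helper are assumptions for illustration, not an existing implementation.

    # Hypothetical sketch of tree-like storage built on top of HDF5:
    # fixed-schema records kept as a structured array in a resizable
    # dataset. One possible design, not an existing library.
    import numpy as np
    import h5py

    event_dtype = np.dtype([("pulse_energy", "f8"), ("n_peaks", "i4")])

    def append_events(filename, records):
        """Append a batch of fixed-schema records to a growable dataset."""
        with h5py.File(filename, "a") as f:
            if "events" not in f:
                dset = f.create_dataset("events", shape=(0,), maxshape=(None,),
                                        dtype=event_dtype, chunks=True)
            else:
                dset = f["events"]
            n = dset.shape[0]
            dset.resize((n + len(records),))
            dset[n:] = records

    # Usage: append two batches of events, then read them back as one array.
    batch = np.array([(1.3, 5), (0.9, 2)], dtype=event_dtype)
    append_events("analysis.h5", batch)
    append_events("analysis.h5", batch)
    with h5py.File("analysis.h5", "r") as f:
        all_events = f["events"][...]
        print(all_events["pulse_energy"].mean())

Column-wise access on the resulting structured array (e.g. all_events["pulse_energy"]) already covers a good part of what ROOT trees are used for in simple analyses.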
