This page summarizes how to read XTC files interactively.
- How to read XTC files (LCLS's primary data format)?
The short answer
Not yet possible!
The long answer
Several tools exist to read XTC files sequentially. Currently, to work interactively with data from XTC files, you should read them with one of these tools, store the data in memory or in a file, and then work with it interactively in your tool of choice (e.g. IPython).
Existing tools:
- psana framework (C++)
- pyana framework (python)
- xtcreader (C++) / pyxtcreader (python)
- xtcscanner (python)
- xtcexplorer (python)
The rest of this page elaborates on the long answer, with a bias towards using Python.
Coming soon!
We are currently working on better infrastructure for interactive analysis of XTC files. We welcome your input if you think you may be one of its users.
- XTC files can be translated to HDF5 format on request.
These allow interactive analysis with outside tools (e.g. MATLAB) or with Python (see How to access HDF5 data from Python). Be aware, though, that the framework (psana or pyana) synchronizes event data for you; the lack of synchronization between arrays in the HDF5 files is the biggest drawback of working with data files outside of our framework.
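One way to restore that synchronization yourself is to match events by timestamp. This is a minimal sketch, assuming you have already read, for each detector, a 1D array of per-event timestamps (hypothetical here; for instance seconds and nanoseconds combined into one integer) alongside its data. It does not describe the actual layout of the translated HDF5 files, and align_by_timestamp is an illustrative name.

```python
import numpy as np


def align_by_timestamp(ts_a, data_a, ts_b, data_b):
    """Keep only events present in both data sets, in matching order.

    ts_a, ts_b: 1D arrays of per-event timestamps (assumed unique within
    each array). data_a, data_b: arrays whose first axis runs over events.
    """
    # Sorted common timestamps plus their positions in each input array.
    common, idx_a, idx_b = np.intersect1d(ts_a, ts_b, return_indices=True)
    return common, data_a[idx_a], data_b[idx_b]


# Synthetic example: detector B missed the event at t=10 and t=30,
# and recorded one (t=50) that detector A does not have.
ts_a = np.array([10, 20, 30, 40])
data_a = np.array([1.0, 2.0, 3.0, 4.0])
ts_b = np.array([20, 40, 50])
data_b = np.array([200.0, 400.0, 500.0])
common, a, b = align_by_timestamp(ts_a, data_a, ts_b, data_b)
```

After alignment, row i of a and row i of b belong to the same event (common[i]), so they can be correlated directly.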
The existing tools:
Python framework: pyana
The most pain-free way to access LCLS XTC data files from Python is through LCLS's Python framework, pyana. It is a non-interactive framework, but to some extent you can work interactively with the data it produces.
All about pyana.
C++ framework: psana
The idea is the same as for pyana: psana is a non-interactive framework, with no interactive support as of yet.
All about psana - Original Documentation.
If you like GUIs:
The XTC Explorer - Old gives you an "interactive" way to configure your analysis.
If you like python (or IPython):
Python/IPython can be used to analyze data after you've saved them, or they can be embedded into a pyana module to give you interactive access to the data at regular intervals throughout your analysis.
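A minimal sketch of the second approach, a module that pauses every N events to give you an interactive prompt. Only the beginjob/event/endjob method names follow the pyana example further down this page; the class name, the interval parameter, the hook and extract() are hypothetical. The default hook uses the standard library's code.interact; inside IPython, "from IPython import embed; embed()" is an alternative.

```python
import code


class InteractiveEveryN(object):
    """Hypothetical pyana-style user module: collects one value per event
    and opens an interactive console every `interval` events."""

    def __init__(self, interval=100, hook=None):
        self.interval = int(interval)  # int() in case config values arrive as strings
        # Default hook: a stdlib console with the collected data in scope.
        self.hook = hook if hook is not None else (lambda ns: code.interact(local=ns))
        self.values = []
        self.count = 0

    def beginjob(self, evt, env):
        self.values = []
        self.count = 0

    def event(self, evt, env):
        self.count += 1
        self.values.append(self.extract(evt))
        if self.count % self.interval == 0:
            # Hand the data collected so far to the interactive hook.
            self.hook({"values": self.values, "count": self.count})

    def extract(self, evt):
        # Placeholder: replace with the pyana data access for your detector.
        return evt

    def endjob(self, env):
        # One final look at the complete data set.
        self.hook({"values": self.values, "count": self.count})


# Dry run without pyana: record when the hook fires instead of opening a shell.
calls = []
module = InteractiveEveryN(interval=3, hook=lambda ns: calls.append(ns["count"]))
module.beginjob(None, None)
for fake_evt in range(7):
    module.event(fake_evt, None)
module.endjob(None)
```

Injecting the hook keeps the module testable outside the framework, as in the dry run at the end.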
'IPython' (http://ipython.org/) is an enhanced python shell for interactive use. Many of the examples here would work equally well with a 'regular' python shell.
Plotting is done with 'matplotlib' (http://matplotlib.sourceforge.net/)
If you're looking for an IDE to work with, consider 'Spyder' (http://code.google.com/p/spyderlib/).
Interactively exploring the XTC file.
A quick way to figure out what's in your XTC file is to run 'xtcscanner' or 'xtcexplorer'. The output can help you write a pyana module for further analysis, and the explorer also lets you make some quick plots.
xtcscanner
This tool also belongs to the XtcExplorer package, and is used by the GUI. But the tool can also be run directly from the command line:
usage: xtcscanner [options] xtc-files ...

options:
  -h, --help            show this help message and exit
  -n NDATAGRAMS, --ndatagrams=NDATAGRAMS
  -v, --verbose
  -l L1_OFFSET, --l1-offset=L1_OFFSET
Example:
Scanning....
Start parsing files:
['/reg/d/psdm/AMO/amo01509/xtc/e8-r0094-s00-c00.xtc', '/reg/d/psdm/AMO/amo01509/xtc/e8-r0094-s01-c00.xtc']
201 datagrams read in 0.070000 s . . . . . . .
-------------------------------------------------------------
XtcScanner information:
 - 1 calibration cycles.
 - Events per calib cycle:
   [197]

Information from 0 control channels found:

Information from 9 devices found
BldInfo:EBeam:                     EBeamBld (197)
BldInfo:FEEGasDetEnergy:           FEEGasDetEnergy (197)
DetInfo:AmoETof-0|Acqiris-0:       (5 ch)  AcqConfig_V1 (1)  AcqWaveform_V1 (197)
DetInfo:AmoGasdet-0|Acqiris-0:     (2 ch)  AcqConfig_V1 (1)  AcqWaveform_V1 (197)
DetInfo:AmoITof-0|Acqiris-0:       (1 ch)  AcqConfig_V1 (1)  AcqWaveform_V1 (197)
DetInfo:AmoMbes-0|Acqiris-0:       (1 ch)  AcqConfig_V1 (1)  AcqWaveform_V1 (197)
DetInfo:EpicsArch-0|NoDevice-0:    Epics_V1 (688)
DetInfo:NoDetector-0|Evr-0:        EvrConfig_V2 (1)
ProcInfo::                         RunControlConfig_V1 (11)
XtcScanner is done!
-------------------------------------------------------------
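If you want the scanner output inside a Python script (for example, to keep it next to your analysis results), you can run xtcscanner as a subprocess. A minimal sketch, assuming xtcscanner is on your PATH (e.g. inside an LCLS analysis release); the function names are hypothetical, and only options from the usage text are used. The usage text does not say exactly what -n/--ndatagrams does; presumably it limits the number of datagrams scanned.

```python
import subprocess


def xtcscanner_command(xtc_files, ndatagrams=None, verbose=False):
    """Build an xtcscanner command line from the documented options."""
    cmd = ["xtcscanner"]
    if ndatagrams is not None:
        cmd.append("--ndatagrams=%d" % ndatagrams)
    if verbose:
        cmd.append("--verbose")
    cmd.extend(xtc_files)
    return cmd


def run_xtcscanner(xtc_files, **options):
    """Run xtcscanner and return its standard output as a string (Python 3.7+)."""
    result = subprocess.run(xtcscanner_command(xtc_files, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_xtcscanner(['/reg/d/psdm/AMO/amo01509/xtc/e8-r0094-s00-c00.xtc']) would return text like the example output above.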
The XtcExplorer GUI.
With interactive python embedded, see: https://confluence.slac.stanford.edu/display/PCDS/XTC+Explorer#XTCExplorer-InteractiveplottingwithIPython
IPython used "like" MATLAB
Of course MATLAB is much more than this, but here's what we've started with: some examples with IPython based on MATLAB functions provided by XPP. Thanks to H. Lemke for MATLAB examples and advice. A Python module, pymatlab.py, defines a number of functions used in this analysis example.
Starting an interactive session
[ofte@psana0XXX myrelease]$ ipython -pylab
Python 2.4.3 (#1, Nov  3 2010, 12:52:40)
Type "copyright", "credits" or "license" for more information.

IPython 0.9.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.
In [1]: from pymatlab import *
Generally, it is recommended to load library modules with 'import pymatlab' and access its functions and classes as pymatlab.function. In an interactive session it may be easier to put the contents of pymatlab directly into your workspace with 'from pymatlab import *'.
In [2]: who
H5getobjnames  ScanInput  ScanOutput  filtvec  findmovingmotor  getSTDMEANfrac_from_startpoint  get_filter  get_limits  get_limits_automatic  get_limits_channelhist  get_limits_correlation  get_limits_corrfrac  h5py  np  plt  rdXPPdata  runexpNO2fina  scan  scaninput

In [3]: whos
Variable                         Type        Data/Info
--------------------------------------------------------
H5getobjnames                    function    <function H5getobjnames at 0x2b57de8>
ScanInput                        type        <class 'pymatlab.ScanInput'>
ScanOutput                       type        <class 'pymatlab.ScanOutput'>
filtvec                          function    <function filtvec at 0x2b57f50>
findmovingmotor                  function    <function findmovingmotor at 0x2b57d70>
getSTDMEANfrac_from_startpoint   function    <function getSTDMEANfrac_<...>_startpoint at 0x2b581b8>
get_filter                       function    <function get_filter at 0x2b57ed8>
get_limits                       function    <function get_limits at 0x2b58050>
get_limits_automatic             function    <function get_limits_automatic at 0x2b58230>
get_limits_channelhist           function    <function get_limits_channelhist at 0x2b582a8>
get_limits_correlation           function    <function get_limits_correlation at 0x2b580c8>
get_limits_corrfrac              function    <function get_limits_corrfrac at 0x2b58140>
h5py                             module      <module 'h5py' from '/reg<...>ython/h5py/__init__.pyc'>
np                               module      <module 'numpy' from '/re<...>thon/numpy/__init__.pyc'>
plt                              module      <module 'matplotlib.pyplo<...>n/matplotlib/pyplot.pyc'>
rdXPPdata                        function    <function rdXPPdata at 0x2b57c80>
runexpNO2fina                    function    <function runexpNO2fina at 0x2b57e60>
scan                             ScanOutput  <pymatlab.ScanOutput object at 0x2b60536bee90>
scaninput                        ScanInput   <pymatlab.ScanInput object at 0x2b60536b4e90>
Like in MATLAB, 'who' gives you a short list of workspace contents, and 'whos' gives you a longer list.
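The who/whos commands are IPython magics (%who, %whos). In a plain Python shell, a rough stand-in takes only a few lines; this is a hypothetical helper, not part of IPython or pymatlab.

```python
def whos(namespace):
    """Rough plain-Python stand-in for IPython's %whos.

    Returns sorted (name, type name) pairs, skipping names that start
    with an underscore (interpreter internals such as __builtins__).
    """
    rows = []
    for name, value in sorted(namespace.items()):
        if name.startswith("_"):
            continue
        rows.append((name, type(value).__name__))
    return rows


# In a shell: for name, kind in whos(globals()): print(name, kind)
import types
rows = whos({"a": 1, "_hidden": 2, "mod": types})
```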
Plot filtered IPIMB data with limits from graphical input:
Here's a log from a session that produces a loglog plot (blue dots) of two IPIMB channels, selects limits from graphical input (mouse clicks), and draws the selected events with red dots.
In [3]: scaninput = ScanInput()

In [4]: scaninput.fina = "/reg/d/psdm/XPP/xpp23410/hdf5/xpp23410-r0107.h5"

In [5]: scan = rdXPPdata(scaninput)
Reading XPP data from /reg/d/psdm/XPP/xpp23410/hdf5/xpp23410-r0107.h5
Found pv control object fs2:ramp_angsft_target
Found scan vector [ 2800120.  2800240.  2800360.  2800480.  2800600.  2800720.
  2800840.  2800960.  2801080.  2801200.  2801320.  2801440.  2801560.
  2801680.  2801800.  2801920.  2802040.  2802160.  2802280.  2802400.
  2802520.  2802640.  2802760.  2802880.  2803000.  2803120.  2803240.
  2803360.  2803480.  2803600.  2803720.  2803840.  2803960.  2804080.
  2804200.  2804320.  2804440.  2804560.  2804680.  2804800.  2804920.
  2805040.  2805160.  2805280.]
Fetching data to correlate with motor
['IPM1', 'IPM2']
(44, 120, 4)

In [6]: channels = np.concatenate(scan.scandata,axis=0)

In [7]: channels.shape
Out[7]: (5280, 4)

In [8]: get_limits(channels,1,"correlation")
4 channels a 5280 events
indexes that pass filter:  (array([   1,    5,    8, ..., 5266, 5272, 5273]),)
Out[8]:
array([[ 0.00086654,  0.01604564],
       [ 0.67172102,  0.71968567],
       [ 0.00194716,  0.01447819],
       [ 0.80365403,  0.73463468]])

In [9]: plt.draw()
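In the log, scan.scandata has shape (44, 120, 4): 44 scan steps, 120 shots per step, 4 IPIMB channels. np.concatenate along axis 0 stacks the per-step blocks into one (5280, 4) array, which loses the information about which step each row came from. A minimal NumPy sketch with synthetic data (the random values stand in for real IPIMB readings; the motor positions reproduce the scan vector printed above) that keeps a per-row step index and motor position:

```python
import numpy as np

# Synthetic stand-in for scan.scandata: 44 scan steps x 120 shots x 4 channels.
nsteps, nshots, nch = 44, 120, 4
scandata = np.random.rand(nsteps, nshots, nch)
# Motor positions as printed by rdXPPdata: 2800120 to 2805280 in steps of 120.
scanvec = np.arange(2800120.0, 2805281.0, 120.0)

# Stack the per-step (120, 4) blocks on top of each other -> (5280, 4).
channels = np.concatenate(scandata, axis=0)
# For a regular 3D array this is the same as a reshape.
same = scandata.reshape(-1, nch)

# Scan step of each row, and the corresponding motor position.
step_index = np.repeat(np.arange(nsteps), nshots)
motor = scanvec[step_index]
```

If scandata is instead a list of per-step arrays with different shot counts, np.concatenate still works but the reshape does not; the step index then becomes np.repeat(np.arange(len(scandata)), [len(block) for block in scandata]).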
Table of comparison (MATLAB vs MatPlotLib)
See also http://www.scipy.org/NumPy_for_Matlab_Users
Loglog plot of one array vs. another

MATLAB:
a1 = subplot(121);
loglog(channels(:,1),channels(:,2),'o')
xlabel('CH0')
ylabel('CH1')
a2 = subplot(122);
loglog(channels(:,3),channels(:,4),'o')
xlabel('CH2')
ylabel('CH3')

matplotlib:
import matplotlib.pyplot as plt
import numpy as np

a1 = plt.subplot(121)
plt.loglog(channels[:,0], channels[:,1], 'o')
plt.xlabel('CH0')
plt.ylabel('CH1')
a2 = plt.subplot(122)
plt.loglog(channels[:,2], channels[:,3], 'o')
plt.xlabel('CH2')
plt.ylabel('CH3')

Comments: channels is an Nx4 array of floats, where N is the number of events. Each column corresponds to one of the four IPIMB channels. MATLAB indexes from 1 and Python from 0, so MATLAB column 1 is Python column 0.

Array of limits from graphical input

MATLAB:
axes(a1)
hold on
lims(1:2,:) = ginput(2);
axes(a2)
hold on
lims(3:4,:) = ginput(2);

matplotlib:
lims = np.zeros((4,2), dtype=float)
plt.sca(a1)                    # make the left subplot current
lims[0:2,:] = plt.ginput(2)    # two mouse clicks, each an (x, y) pair
plt.sca(a2)                    # make the right subplot current
lims[2:4,:] = plt.ginput(2)

Comments: lims is a 4x2 array: rows 1-2 hold the two (x, y) clicks on the first subplot, rows 3-4 the two clicks on the second. In MATLAB, 'hold on' keeps the existing plot when new data are drawn; matplotlib 3.0 and later always behave this way and no longer have plt.hold.

Filter

MATLAB:
fbool1 = (channels(:,1)>min(lims(1:2,1)))&(channels(:,1)<max(lims(1:2,1)));
fbool2 = (channels(:,2)>min(lims(1:2,2)))&(channels(:,2)<max(lims(1:2,2)));
fbool = fbool1&fbool2;
axes(a1)
loglog(channels(fbool,1),channels(fbool,2),'or')
fbool3 = (channels(:,3)>min(lims(3:4,1)))&(channels(:,3)<max(lims(3:4,1)));
fbool4 = (channels(:,4)>min(lims(3:4,2)))&(channels(:,4)<max(lims(3:4,2)));
fbool = fbool3&fbool4;
axes(a2)
loglog(channels(fbool,3),channels(fbool,4),'or')

matplotlib:
fbools0 = (channels[:,0] > lims[0:2,0].min()) & (channels[:,0] < lims[0:2,0].max())
fbools1 = (channels[:,1] > lims[0:2,1].min()) & (channels[:,1] < lims[0:2,1].max())
fbools = fbools0 & fbools1
plt.sca(a1)
plt.loglog(channels[fbools,0], channels[fbools,1], 'or')
fbools2 = (channels[:,2] > lims[2:4,0].min()) & (channels[:,2] < lims[2:4,0].max())
fbools3 = (channels[:,3] > lims[2:4,1].min()) & (channels[:,3] < lims[2:4,1].max())
fbools = fbools2 & fbools3
plt.sca(a2)
plt.loglog(channels[fbools,2], channels[fbools,3], 'or')

Comments: each pair of clicks defines a rectangle; events inside it are replotted as red dots ('or'). The x limits come from the first column of lims and the y limits from the second.
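The two filter blocks repeat the same logic for different channel pairs. A minimal sketch that factors it into one vectorized NumPy function; box_filter is a hypothetical name, and the clicks argument is assumed to be two (x, y) points such as the list returned by plt.ginput(2) or a 2x2 slice of lims.

```python
import numpy as np


def box_filter(channels, cx, cy, clicks):
    """Boolean mask of events whose (channels[:, cx], channels[:, cy]) lie
    strictly inside the rectangle spanned by two (x, y) clicks."""
    clicks = np.asarray(clicks, dtype=float)
    xlo, ylo = clicks.min(axis=0)   # lower-left corner
    xhi, yhi = clicks.max(axis=0)   # upper-right corner
    x = channels[:, cx]
    y = channels[:, cy]
    return (x > xlo) & (x < xhi) & (y > ylo) & (y < yhi)


# With channels, lims and a1 from the comparison above, for example:
#   fbools = box_filter(channels, 0, 1, lims[0:2])
#   plt.sca(a1); plt.loglog(channels[fbools, 0], channels[fbools, 1], 'or')
demo = np.array([[1.0, 1.0, 0.0, 0.0],
                 [5.0, 5.0, 0.0, 0.0],
                 [2.0, 3.0, 0.0, 0.0]])
mask = box_filter(demo, 0, 1, [(4.0, 4.0), (0.5, 0.5)])
```

Taking min/max of the clicks means the order in which you click the two corners does not matter, just as in the min()/max() calls of the original filter.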
Writing Numpy and HDF5 files from python
You can take numpy arrays from a pyana job (which reads XTC) and store them in simple NumPy files or HDF5 files. Here are some examples:
Simple array to a NumPy file:
import numpy as np

np.save("filename.npy", array)        # binary NumPy file
array = np.load("filename.npy")
np.savetxt("filename.dat", array)     # ASCII text file
array = np.loadtxt("filename.dat")
This example shows saving and loading of a binary numpy file (.npy) and an ascii file (.dat).
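Each of these calls handles a single array; np.save/np.load accept any number of dimensions, while np.savetxt/np.loadtxt only handle 1D or 2D arrays.
If you need to save multiple events/shots in the same file, you will need to do some tricks: e.g. flatten each event's array and stack the 1D arrays into a 2D array in which one axis represents the event number (needed for text files), or stack same-shaped events into one higher-dimensional array for a .npy file. Or you could save as an HDF5 file.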
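A minimal sketch of the flatten-and-stack trick, with synthetic images standing in for per-shot detector data (the image shape and file names are illustrative). It also shows that a binary .npy file can hold a 3D (events x rows x columns) array directly, since only the text functions are limited to two dimensions.

```python
import os
import tempfile

import numpy as np

shape = (4, 5)                                               # illustrative per-event image shape
events = [np.full(shape, i, dtype=float) for i in range(3)]  # stand-ins for three shots
outdir = tempfile.mkdtemp()

# Text route: flatten each event to 1D and stack -> 2D array, one row per event.
stacked = np.vstack([img.ravel() for img in events])         # shape (3, 20)
txt_path = os.path.join(outdir, "events.dat")
np.savetxt(txt_path, stacked)
loaded = np.loadtxt(txt_path, ndmin=2)    # ndmin=2 keeps 2D even for a single event
event1 = loaded[1].reshape(shape)         # recover event 1 as a 2D image

# Binary route: a .npy file keeps the full (nevents, ny, nx) shape.
npy_path = os.path.join(outdir, "events.npy")
np.save(npy_path, np.stack(events))
cube = np.load(npy_path)
```

The text route needs the original image shape to undo the flattening, so keep it somewhere (e.g. in the file name or a small companion file).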
Simple array to an HDF5 file
import h5py

# Methods of a pyana user module (the enclosing class is omitted here)
def beginjob(self, evt, env):
    self.ofile = h5py.File("outputfile.hdf5", 'w')   # open for writing (overwrites existing file)
    self.shot_counter = 0

def event(self, evt, env):
    # example: store several arrays from one shot in a group labeled with shot (event) number
    self.shot_counter += 1
    group = self.ofile.create_group("Shot%d" % self.shot_counter)
    image1_source = "CxiSc1-0|TM6740-1"
    image2_source = "CxiSc1-0|TM6740-2"
    frame = evt.getFrameValue(image1_source)
    image1 = frame.data()
    frame = evt.getFrameValue(image2_source)
    image2 = frame.data()
    # one dataset per camera, named after its source
    group.create_dataset(image1_source, data=image1)
    group.create_dataset(image2_source, data=image2)

def endjob(self, env):
    self.ofile.close()
This example is shown in a pyana setting. The HDF5 file is declared and opened in beginjob, datasets are created for each event, and the file is closed in the endjob method.
Or you can group your datasets any other way you find useful, of course.
Saving complex datasets to HDF5 file
Some more advanced examples (courtesy of Hubertus Bromberger):
import time

import h5py
import numpy as np
from numpy.random import rand

##############
# Create data set
##############
f = h5py.File('test.hdf5', 'w')
f.create_dataset('t-nonames', data=rand(30000), dtype='<f4')
f.create_dataset('t-names', data=np.array(rand(30000), dtype=[('ps', '<f4')]))
dt = np.dtype([('Charge', '<f4'), ('Energy', '<f4'), ('PosX', '<f4'), ('PosY', '<f4'),
               ('AngX', '<f4'), ('AngY', '<f4'), ('PkCurrBC2', '<f4')])
f.create_dataset('eBeam-names', data=np.array([tuple(i.tolist()) for i in rand(30000, 7)], dtype=dt))
f.create_dataset('eBeam-nonames', data=rand(30000, 7), dtype='<f4')
dt = np.dtype([('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<f4', (100,))])
f.create_dataset('dsSubset-names', data=np.array([tuple((i[0], i[1], i[2], i[3:].tolist())) for i in rand(30000, 103)], dtype=dt))
f.create_dataset('dsSubset-nonames', data=rand(30000, 103))   # 3 + 100 columns, matching dsSubset-names
f.close()
##############
# Load data and benchmark data access
##############
f = h5py.File('test.hdf5', 'r')
iterations = int(1e4)

#######
# Single col
#######
start = time.time()
for i in range(iterations):
    a = f['t-names']['ps'] / f['t-names']['ps'].max()
print("Single column as compound dataset: %.2fs" % (time.time() - start))

start = time.time()
for i in range(iterations):
    a = f['t-nonames'][:] / f['t-nonames'][:].max()
print("Single column as dataset: %.2fs" % (time.time() - start))

start = time.time()
a = f['t-names']['ps']
for i in range(iterations):
    b = a / a.max()
print("Single column from compound dataset prior assignment: %.2fs" % (time.time() - start))

start = time.time()
a = f['t-nonames'][:]
for i in range(iterations):
    b = a / a.max()
print("Single column dataset and prior assignment: %.2fs\n" % (time.time() - start))
#######
# Select single column from 2D dataset
#######
start = time.time()
for i in range(iterations):
    a = f['eBeam-names']['Energy'] / f['eBeam-names']['Energy'].max()
print("Single column as compound dataset: %.2fs" % (time.time() - start))

start = time.time()
for i in range(iterations):
    a = f['eBeam-nonames'][:,1] / f['eBeam-nonames'][:,1].max()
print("Single column as dataset: %.2fs" % (time.time() - start))

start = time.time()
a = f['eBeam-names']['Energy']
for i in range(iterations):
    b = a / a.max()
print("Single column from compound dataset prior assignment: %.2fs" % (time.time() - start))

start = time.time()
a = f['eBeam-nonames'][:,1]
for i in range(iterations):
    b = a / a.max()
print("Single column dataset and prior assignment: %.2fs\n" % (time.time() - start))

#######
# Select columns from 2D dataset
#######
start = time.time()
for i in range(iterations // 50):
    for row in f['dsSubset-names']['d']:
        a = row / row.max()
print("Columns as compound dataset: %.2fs" % (time.time() - start))

start = time.time()
for i in range(iterations // 50):
    for row in f['dsSubset-nonames'][:,3:103]:
        a = row / row.max()
print("Columns as dataset '[:,3:103]': %.2fs" % (time.time() - start))

start = time.time()
for i in range(iterations // 50):
    for row in f['dsSubset-nonames'][:,3:]:
        a = row / row.max()
print("Columns as dataset '[:,3:]': %.2fs" % (time.time() - start))

start = time.time()
a = f['dsSubset-names']['d']
for i in range(iterations // 50):
    for row in a:
        b = row / row.max()
print("Columns as compound dataset and prior assignment: %.2fs" % (time.time() - start))

start = time.time()
a = f['dsSubset-nonames'][:,3:]
for i in range(iterations // 50):
    for row in a:
        b = row / row.max()
print("Columns as dataset and prior assignment: %.2fs" % (time.time() - start))
f.close()
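The benchmark compares compound datasets (named fields such as 'Energy') with plain 2D datasets (numbered columns). Once read into memory, both layouts hold the same numbers, and you can convert between them. A minimal NumPy-only sketch with synthetic data; the field names are copied from the eBeam example, everything else is illustrative. NumPy 1.16 and later also provide numpy.lib.recfunctions.unstructured_to_structured and structured_to_unstructured for the same conversions.

```python
import numpy as np

names = ['Charge', 'Energy', 'PosX', 'PosY', 'AngX', 'AngY', 'PkCurrBC2']
dt = np.dtype([(name, '<f4') for name in names])

n = 1000
plain = np.random.rand(n, len(names)).astype('<f4')   # like 'eBeam-nonames'

# Plain 2D array -> structured array with one named field per column.
structured = np.zeros(n, dtype=dt)                      # like 'eBeam-names'
for i, name in enumerate(names):
    structured[name] = plain[:, i]

# Field access and column access give the same values.
energy_by_name = structured['Energy']
energy_by_col = plain[:, 1]

# Structured array -> plain 2D array again.
plain_again = np.column_stack([structured[name] for name in names])
```

Named fields make code more readable (structured['Energy'] instead of remembering that Energy is column 1); the timings printed by the benchmark show what each layout costs when read through h5py.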