
Cube data - Overview

'Cube' is the nickname we gave to the intermediate analysis result: a 2D detector image as a function of certain variables, meaning the first level of data reduction is achieved through averaging. The binning variable is most typically the timetool-corrected delay time, but can be other variables as well, such as laser power, temperature, etc. Technically, the code does not average, but rather sums the requested data for all events in a given bin that pass some filter. The number of events in each slice is also stored.
The code has now been rewritten to use the SmallDataAna(_psana) interface, as that allows a more flexible definition of both binning and selection variables using derived variables, rather than being restricted to the values saved directly in the hdf5 file. As for the SmallData production, there is a driver script (makeCube) and a "production" python file called "MakeCube.py". An example of the production file can be found here:
The relevant lines are here:

 

import numpy as np
# 'ana' is the SmallDataAna object, 'anaps' the SmallDataAna_psana object
ana.addCut('lightStatus/xray',0.5,1.5,'on')
ana.addCut('lightStatus/laser',0.5,1.5,'on')
ana.addCube('cube','delay',np.arange(13.,15.,0.05),'on')
cs140Dict = {'source':'cs140_rob','full':1}
ana.addToCube('cube',['ipm2/sum','ipm3/sum','diodeU/channels',cs140Dict])
anaps.makeCubeData('cube')

Cubing - Step-by-step

Now let us dissect what this is doing:
ana.addCut('lightStatus/xray',0.5,1.5,'on')
ana.addCut('lightStatus/laser',0.5,1.5,'on')
We are defining an event selection called "on". At this point we only require that both the laser and the X-rays are on. Typically one would add a requirement on the incoming intensity and, if interested in the timetool, some quality requirement on the timetool signal, as sketched below.
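For example, additional cuts could look like this (a sketch; the variable names ipm3/sum and tt/AMPL are placeholders for whatever is present in your smallData file):

ana.addCut('ipm3/sum',0.1,10.,'on')   # example: require a reasonable incoming intensity
ana.addCut('tt/AMPL',0.025,10.,'on')  # example: require a decent timetool signal amplitude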
ana.addCube('cube','delay',np.arange(13.,15.,0.05),'on')
Here we are defining a cube called "cube": we give it a name (here "cube"), a variable we want to bin in (here "delay"), the bins we would like to use for the binning, and lastly the name of the filter/event selection we defined previously (here "on").
cs140Dict = {'source':'cs140_rob','full':1}
ana.addToCube('cube',['ipm2/sum','ipm3/sum','diodeU/channels',cs140Dict])
Now we specify what data we would like to bin. You can either pass the names of variables in the littleData or the name of detectors in the "big" data. The latter is passed as a dictionary with the source name (the alias) and information about what you would like to add to the cube (the main use case is the full detector data, requested as above with 'full':1 as the key-value pair; the 1 is unimportant, the code only checks for the presence of the "full" key).
anaps.makeCubeData('cube')
At last, we now make the cube. Note that we are calling this on "anaps" (!). "ana" has the same function, but it will only bin the data present in the smallData file (or the derived fields attached to the xarray); it will quietly ignore the variables that can only be gotten from the xtc. Because the "anaps" version will get data from the xtc file, you will want to run it with mpi using the driver script, but checking the cube definition (correct definition of bins, ...) can be done using "ana" interactively.
The cube name will be used to name the hdf5 file that gets written by the function. The "ana" function by default will NOT write a file and only returns a dictionary with the binned data; it has a parameter that will make it write an hdf5 file. The "anaps" function will always write the hdf5 file, as this is integral to how the events are distributed among cores and how the data is reassembled in the end.
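For an interactive check, something like this should work (a sketch based on the description above):

cubeDict = ana.makeCubeData('cube')  # bins only smallData/derived variables, returns a dictionary
print(cubeDict.keys())               # inspect which variables were binned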

Binning variables

The primary binning variable is defined in the cube definition. It either needs to be a variable originally in the smallData or an added variable. Using "delay" will create a derived variable for the X-ray-laser delay using the scan variable (if applicable), the timetool, and the fast delay stage encoder value. If the bins are not passed, the code will try to use np.unique(scanValue), which will only work for "step" scans.
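For a step scan, this could look like the following (a sketch; 'scan/lxt' is a placeholder scan variable, and whether an empty bin list is accepted may depend on the code version):

ana.addCube('cubeSteps','scan/lxt',[],'on')  # no bins passed: falls back to np.unique(scanValue)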
In addition to the primary binning variables, you can now add more binning variables to make a higher dimensional "cube". This can be done like this:
ana.add_BinVar(addBinInfo)
You can pass either a list like [varname, bins] or a dict with variable names as keys and bin boundaries as values.
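Both forms, illustrated (a sketch; the variable name ipm2/sum and the bin boundaries are examples only):

ana.add_BinVar(['ipm2/sum', np.linspace(0.,5.,11)])   # list form: [varname, bins]
ana.add_BinVar({'ipm2/sum': np.linspace(0.,5.,11)})   # dict form: {varname: bin boundaries}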

Add variables from the smalldata hdf5 (ana)

You can add lists of variables in the smallData, whether they were present originally or added to the data. There are now also two ways to bin droplet data: you can either save an image based on the droplets in each bin or make square arrays in x/y/adu for each bin. This is specified like this:
ana.addToCube('cube',['droplet:epix/droplets:image','droplet:epix_2/droplet:array'])

Add variables from the xtc (images)

Data from the xtc file are added as dictionaries as described above. Options for the dictionary include:
full: save the full detector data
image: if present, save the assembled image
thresAdu: require pixels to be above this threshold (in ADU) to be added to the image
thresRms: same, but with the threshold in units of the pixel rms
common_mode: number identifying the common mode method
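Putting these options together, a detector dictionary might look like this (a sketch; the alias 'epix' and the parameter values are examples only):

epixDict = {'source':'epix',      # detector alias in the xtc data
            'image':1,            # save the assembled image
            'thresAdu':25.,       # only add pixels above 25 ADU
            'common_mode':1}      # common mode method number
ana.addToCube('cube',[epixDict])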

OnOff parameter in makeCubeData

makeCubeData takes an optional parameter onoff=[0/1/2]. It defaults to 2, which means the filter will be applied as given. onoff=1 selects laser-on events: the requirement that the optical laser is on is added, nothing else is changed, and the output file will end in <…>_on.h5. onoff=0 selects laser-off events: if present, the laser-on requirement is flipped and criteria involving the timetool are dropped, while all other criteria remain; the output file will not end in <…>_on.h5.
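In code (a sketch; the parameter is used as described above):

anaps.makeCubeData('cube', onoff=2)  # default: apply the filter exactly as defined
anaps.makeCubeData('cube', onoff=1)  # laser-on events only, output ends in _on.h5
anaps.makeCubeData('cube', onoff=0)  # laser-off events: laser-on cut flipped, timetool cuts dropped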

 Production script

Analogous to SmallDataProducer.py and smallDataRun, there is a script called MakeCube.py in the examples folder with a partnering cubeRun script. cubeRun has a -h option to list all other options. Most options are analogous to smallDataRun; the minor differences are:
-n <#>: only the first # events are considered for each bin in the image. The variables from the smallData will still contain all events, so the cube is not self-consistent. This option is for testing only!
-d <dirname>: the smallData file is read from <dirname>. The cube files will also be written there; at this point, the cubes will always be written where the smallData is read from.
---old---
These files are typically named "CubeSetup_<somethingDescriptive>" and are passed along to a job submission script called "cubeRun". cubeRun has a help function that explains the command line parameters; they are very similar to littleDataRun, aside from the necessary "-c <CubeSetupFilename>". The following are some of the options that differ from littleDataRun. -m takes the common mode parameter: 5 means using the unbonded pixels and 1 uses the zero peak. "1" works better, but fails if ASICs have a lot of signal; the unbonded pixels always work. If we want to threshold the pixels in high gain mode, I would suggest 2.5 rms or 25 ADU as typical working values to start with.

-s <size>: rebin image to size x size pixels

-m <common mode parameter>: apply common mode

-t <thres in ADU>: hitfinder

-T <thres in Rms>: hitfinder

-R: store raw CsPad data (NOT image)

The hdf5 file also stores the pedestal and rms values. If the data is stored in "raw" format, then the big CsPad will have the shape 32x185x388 instead of 1692x1691. The same is true for the pedestal and the rms. We also store the x/y values for each pixel.

Cube data format

By default, the data gets saved in an hdf5 file in /reg/d/psdm/<instrument>/<expname>/hdf5/smalldata

and the files have names like Cube_<expname>_Run<runnumber>_<cubename><optional _on/off>.h5
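To inspect a resulting cube file, something like this should work (a sketch using h5py; the experiment/run in the path is hypothetical and the dataset names depend on what was added to the cube):

import h5py
fname = '/reg/d/psdm/xpp/xppx12345/hdf5/smalldata/Cube_xppx12345_Run042_cube_on.h5'  # hypothetical path
with h5py.File(fname,'r') as f:
    f.visit(print)  # list all datasets in the cube file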

Advanced options (in progress)

Multi-dimensional binning

This is possible using ana.add_BinVar, as described under "Binning variables" above.

Returning event-by-event data (from smallData)

It is possible to do this.

Getting data from laser-off images closest in time for each cube slice

For each 'on' event, we select the closest <n> 'off' events. At the moment, these are also summed together; each event is NOT normalized by its own off events.

Normalizing the image event-by-event

It is not yet possible to do this.
