Mar. 24, 2017:  "Reasons For Re-Processing Data"

Link to slides:

drp-users-mar-24-2017.pptx

* Cheetah has a peak finder that is used heavily in crystallography
* Question of how to validate the beam-center-finding algorithm and the detector geometry
* bad pixels: after a strong Bragg peak a pixel may be temporarily damaged and should be masked out for subsequent shots until it recovers
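The transient bad-pixel idea above can be sketched as a per-shot cooldown mask; this is a minimal illustration, not an actual Cheetah or psana routine, and the saturation threshold and recovery time are hypothetical parameters.

```python
import numpy as np

def update_transient_mask(frame, cooldown, saturation_adu=15000, recovery_shots=5):
    """Mask pixels that saturated recently.

    cooldown: int array, shots remaining until each pixel is trusted again.
    Returns (mask, cooldown): mask is True for usable pixels.
    """
    cooldown = np.maximum(cooldown - 1, 0)              # every pixel recovers by one shot
    cooldown[frame >= saturation_adu] = recovery_shots  # re-arm pixels hit by a strong peak
    return cooldown == 0, cooldown

# toy 2x2 detector: one pixel saturates on the first shot
cooldown = np.zeros((2, 2), dtype=int)
frame1 = np.array([[100, 20000], [50, 60]])
mask1, cooldown = update_transient_mask(frame1, cooldown)
frame2 = np.array([[100, 100], [50, 60]])
mask2, cooldown = update_transient_mask(frame2, cooldown)
```

The saturated pixel stays masked on the following shot even though its new reading looks normal, which is the behavior the note asks for.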

## Reasons for rerunning data
* Anton Barty, Tom, Alexander (CFEL)
* poor detector calibration
* if results are not obvious, to confirm that the algorithm is working
* spectroscopy, if detector corrections are required
* jet streaks for angular integration
* wrong event code triggering; confirming that the pump laser was running via a backup diagnostic
* parameter tuning, e.g. for peak finding
* hit finding mostly works on the first pass -> 10x data reduction; indexing is run multiple times
* wrong hit finding might bias the physics
* if there is not enough data, hit finding is important to really get all hits
* reprocessing data with new algorithms
* reliable gain maps for detectors (especially at high intensity); nonlinear detector response
* diffuse scattering requires the whole image
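Several of the points above come down to a hit finder with tunable parameters. A toy sketch (hypothetical thresholds, not any production hit finder) shows why retuning forces a rerun: both the photon threshold and the minimum bright-pixel count change which shots survive the 10x reduction.

```python
import numpy as np

def is_hit(frame, photon_thresh=3.0, min_bright_pixels=20):
    """Toy hit finder: a shot is a hit if enough pixels exceed the threshold.

    Both parameters are adjustable, so changing either one after the fact
    means going back over all the raw data.
    """
    return int(np.count_nonzero(frame > photon_thresh)) >= min_bright_pixels

rng = np.random.default_rng(0)
blank = rng.poisson(0.05, size=(64, 64))  # background-only shot
hit = blank.copy()
hit[20:30, 20:30] += 5                    # weak diffraction signal in one region
```

With these settings the background shot is rejected and the shot with signal is kept; lowering `min_bright_pixels` would recover weaker hits at the cost of more false positives, which is exactly the bias concern raised above.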

* Peter Schwander
* veto signal for SPI non-hits; real-time hit finding
* convert data to single photons

* Filipe Maia
* detector artifacts
* signal influences gain for pnCCD and CSPAD
* hit finding can be difficult for single-protein SPI because the signal is very weak
* use ion TOF or other hardware tools for hit finding

* Mariano Trigo
* time-tool calibration is important for the cube

* Ti-Yen
* halo close to the beam center makes hit finding difficult for SPI
* converting ADU to photons is sufficient for SPI

* Aaron Brewster
* reprocessing because of an unknown crystal unit cell
* the unit cell can drift during the experiment, depending on sample preparation

* Peter Zwart
* hit rate for imaging can be quite high, up to 80%
* clustering algorithm for hitfinding
* stable beam and sample delivery required
* difficulties in converting ADU to photons for pnCCD; rounding errors
* check photon conversion on simulated data, understand errors in conversion
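The last two points can be illustrated with simulated data, as suggested: a minimal sketch (hypothetical gain and noise values, not pnCCD specifics) that converts ADU to photons by rounding and then measures the conversion error against the known truth.

```python
import numpy as np

rng = np.random.default_rng(1)
gain = 30.0  # ADU per photon (illustrative value)

# simulate ground-truth photon counts plus Gaussian read noise
true_photons = rng.poisson(0.2, size=100_000)
adu = true_photons * gain + rng.normal(0.0, 6.0, size=true_photons.size)

# round-to-nearest photon conversion, clipped at zero
est = np.maximum(np.rint(adu / gain), 0).astype(int)

# because the truth is known, the rounding/misclassification rate is measurable
err_rate = np.mean(est != true_photons)
```

Here the conversion is wrong whenever the read noise pushes a pixel across a rounding boundary (half a photon in ADU), so the error rate is set by the noise-to-gain ratio; this is the kind of check on simulated data the note recommends before trusting the conversion on real frames.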


Extra detail from Anton Barty:


The question was: when do you go back over all the data?
1) To fix an error or artefact in the detector for which there was no ready-made correction prior to beamtime (or we did not know that the error existed). 
Examples may be: CSPAD common-mode and kickback corrections, pnCCD timing distorting the geometry, gain/intensity nonlinearities, timing-tool edge finding needing careful attention. 
Corollary: for real-time analysis to work, detector output needs to be 100% reliable.
2) Where there are parameters to tweak in the analysis, no doubt they will want to be tweaked.  This is particularly the case when there is unexpected signal, or no signal at all.  No signal is hard because we have to convince ourselves that the analysis algorithms are not throwing out useful data. 
Nadia: Going back over the data to get an extra 10% can improve data enough to get a result, as opposed to no result. 
Corollary: Algorithms should not rely on adjustable parameters such as thresholds.  If it’s adjustable you will want to see the effect of adjusting it, which means going back over the data.  
Tom: An adjustable parameter you get one shot at setting is no longer an adjustable parameter. 
3) Unexpected features in the data:  including unexpected regions of interest or regions of integration, bad regions, stray reflections, integration directions, calibrations.
For example: shadows on the detector,  stray light sneaking past or through apertures, unexpected parasitic scatter. 
Corollary: Instant feedback is essential so the user can perfect these regions in real time.  Expect to use some beamtime and sample to get this right. 
4) Experimental SNAFUs. For example, primary sorting diagnostic not working and need to go to secondary diagnostic. 
Example: event code not recorded; have to look at an Acqiris trace or a CCD camera to determine whether the pump laser was on or off. 
Corollary: Once again, instant feedback is essential so the user can perfect these regions in real time.  Expect to use some beamtime and sample to get this right.  Someone must be there to be able to re-program this in real time. Software setup is as important as sample delivery and beamline expertise. 
One can make the following observations: 
- If there is an adjustable parameter, users will want to see the effect of adjusting it. Move towards reliable algorithms that do not have user adjustable settings, then there is nothing to tweak. 
- Setting up the software (e.g.: thresholds, calibration) becomes as critical a step as aligning the beam, moving apertures and mirrors, perfecting sample delivery.  
- Actual beamtime and sample needs to be budgeted for setup of the online analysis with real sample.   Real time analysis becomes a part of the instrument, not a step performed afterwards.  
- Fast feedback so that regions can be adjusted in real time is essential.  You can’t analyse blind. 
- Accept some beamtime may be lost due to real time analysis problems, just as some beamtime is lost due to sample delivery or vacuum issues. Analysis equivalent of ‘hutch door open’. 
- All analysis must be monitored and reprogrammed in real time.  LCLS will have to understand a lot more about each experiment to be able to provide the necessary support in real time, at all hours. Recording everything and figuring it out later is no longer possible.
Thoughts from Tim van Driel:
Full data analysis still benefits some experiments, as the higher dimensionality of the data makes it easier to extract non-linear behavior and correlated fluctuations.
If we are careful with the measurement and the diagnostic tools perform adequately, we can instead rely on littledata and cube. To fully rely on them, we would require both for future experiments, as they are sensitive to different types of errors. 
When going from full data analysis to cube/littledata, the same corrections are needed overall, but their necessity differs between littledata and cube processing.
If new detectors behave less ideally than the CSPAD does now, we are back to needing the full datasets to develop the necessary filtering and corrections.

A quick note regarding radial integration: all pump-probe diffuse scattering experiments have anisotropy at early times (usually <10 ps, but it can persist up to ns). It may be negligible if the solute signal is relatively large, as in protein crystallography. The anisotropy can be separated using Legendre filters of different order, but it is probably easiest to handle on the fly by integrating the data along both phi and theta. I would use at least 17 azimuthal bins, which turns the assumed 1e3 reduction for diffuse scattering into 1e2.
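The on-the-fly (q, phi) integration described above can be sketched in a few lines of numpy; the bin counts, center convention, and test image are illustrative assumptions, not a production azimuthal integrator.

```python
import numpy as np

def polar_bin(image, center, n_q=100, n_phi=17):
    """Integrate a 2D image into (q, phi) bins.

    Keeping n_phi >= 17 azimuthal bins preserves early-time anisotropy,
    at the cost of ~17x less reduction than a plain radial profile.
    """
    y, x = np.indices(image.shape)
    dx, dy = x - center[0], y - center[1]
    r = np.hypot(dx, dy)            # radius stands in for q here
    phi = np.arctan2(dy, dx)        # azimuthal angle
    sums, _, _ = np.histogram2d(r.ravel(), phi.ravel(),
                                bins=(n_q, n_phi), weights=image.ravel())
    counts, _, _ = np.histogram2d(r.ravel(), phi.ravel(), bins=(n_q, n_phi))
    with np.errstate(invalid="ignore"):
        return sums / counts        # mean intensity per (q, phi) bin; empty bins are NaN

# a flat image should give a flat (q, phi) map
profile = polar_bin(np.ones((64, 64)), (32.0, 32.0))
```

Collapsing the phi axis afterwards recovers the ordinary radial profile, so this layout supports both the anisotropic and the fully reduced view of the same shot.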

Reasons for reprocessing the data:
Full data analysis (used on experiments before 2016)
 - Detector calibration
 - Detector geometry
 - Common mode subtraction
 - Sample-detector distance
 - Correlated behavior, outliers non-linear corrections
 - Time-tool calibration
 - Masking
 - Binning
 - Experimental detector corrections (solid angle coverage, polarization, jet geometry, sample composition)
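Common mode subtraction recurs in every pipeline listed here. As a hedged illustration of the idea (real CSPAD/pnCCD common-mode algorithms are detector-specific and more involved), one simple variant removes a per-row baseline using the row median:

```python
import numpy as np

def subtract_common_mode(frame, axis=1):
    """Remove a per-row common-mode offset via the row median.

    Assumes most pixels in a row carry no signal, so the median tracks
    the shared electronic baseline rather than the photon signal.
    """
    baseline = np.median(frame, axis=axis, keepdims=True)
    return frame - baseline

# a frame whose rows differ only by an electronic offset flattens to zero
offsets = np.array([[1.0], [2.0], [3.0], [4.0]])
cleaned = subtract_common_mode(np.zeros((4, 8)) + offsets)
```

Using the median rather than the mean keeps a few bright signal pixels in a row from dragging the estimated baseline, which matters once real photons are present.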

Littledata (used from 2016)
 - Detector calibration
 - Detector geometry
 - Common mode subtraction
 - Sample-detector distance
 - Correlated behavior, outliers non-linear corrections
 - Masking
 - Experimental detector corrections (solid angle coverage, polarization, jet geometry, sample composition)

Cube (used from 2016)
 - Detector calibration
 - Common mode subtraction
 - Correlated behavior, outliers non-linear corrections
 - Time-tool calibration
 - Binning
 - Outlier rejection (usually based on littledata analysis)

XES (dispersed spectral signal on small area detector)
 CSPAD 140k (before 2016)
  - Detector calibration
  - Common mode subtraction  
  - filtering based on XDS
  - pixel-by-pixel analysis to separate 1-photon peak from noise, the choice of algorithm depends on the signal strength on the detector
  - masking
 EPIX
  - Detector calibration
  - Dropletizing parameters
  - Droplet output
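The EPIX dropletizing step above can be sketched as connected-component grouping of above-threshold pixels followed by per-droplet photon counting; the threshold and gain are hypothetical, and this is not the actual EPIX droplet code.

```python
import numpy as np
from scipy import ndimage

def dropletize(frame, adu_thresh=10.0, gain=30.0):
    """Group contiguous above-threshold pixels into droplets and
    return integer photon counts per droplet."""
    labels, n = ndimage.label(frame > adu_thresh)   # 4-connected components
    if n == 0:
        return np.array([], dtype=int)
    sums = ndimage.sum(frame, labels, index=np.arange(1, n + 1))
    return np.rint(np.asarray(sums) / gain).astype(int)

# one isolated single-photon hit, plus one photon charge-split over two pixels
frame = np.zeros((8, 8))
frame[1, 1] = 31.0
frame[5, 5] = frame[5, 6] = 29.0
photons = dropletize(frame)
```

Summing the charge-split pixels before rounding is the point of dropletizing: neither split pixel alone would round to a full photon, but the droplet total does.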

XAS (0d signal on a diode or on a small area detector)
  - Detector calibration
  - Common mode subtraction  
  - masking

Feb. 22, 2017: "Introduction To Data Reduction"

Link to slides: 

Slides from Feb. 22, 2017 


Berkeley:
Great goal x10 reduction

...