Mar. 24, 2017

Link to slides:

drp-users-mar-24-2017.pptx

* Cheetah has a peak finder that is used a lot in crystallography
* Question of how to validate the beam-center finding algorithm and the detector geometry
* Bad pixels: after a strong Bragg peak a pixel might be temporarily damaged and should be masked out for consecutive shots until it recovers (a sketch of such a transient mask follows below)
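
A minimal sketch of the transient-mask idea above; the saturation level, the recovery time in shots, and the NumPy-array interface are illustrative assumptions, not an existing implementation:

```python
import numpy as np

SATURATION_ADU = 12000   # assumed saturation level in ADU; detector-specific
RECOVERY_SHOTS = 5       # assumed number of shots a damaged pixel needs to recover

def update_transient_mask(frame, cooldown):
    """Mask pixels hit by a strong Bragg peak until they recover.

    frame    : 2D ADU array for the current shot
    cooldown : 2D int array of shots remaining before each pixel is trusted again
    Returns (mask, cooldown): mask is True for pixels usable in this shot.
    """
    # a pixel that saturates (re)starts its recovery countdown
    cooldown = np.where(frame > SATURATION_ADU, RECOVERY_SHOTS, cooldown)
    mask = cooldown == 0
    cooldown = np.maximum(cooldown - 1, 0)   # count down for recovering pixels
    return mask, cooldown
```

The resulting mask would then be passed to the peak finder (e.g. Cheetah's) so that recovering pixels are ignored on the following shots.
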

## Reasons for rerunning data
* Anton Barty, Tom, Alexander (CFEL)
* poor detector calibration
* if results are not obvious, to confirm that the algorithm is working
* spectroscopy, if detector corrections are required
* jet streaks for angular integration
* wrong event code triggering; confirming that the pump laser was running; backup diagnostic
* parameter tuning, for example for peak finding
* hit finding mostly works on the first pass -> 10x data reduction; indexing is run multiple times (see the sketch after this list)
* wrong hit finding might bias the physics
* if there is not enough data, hit finding is important to really get all hits
* reprocessing data with new algorithms
* reliable gain maps for detectors (especially at high intensity), nonlinear detector response
* diffuse scattering requires the whole image
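
The 10x-reduction point above is essentially threshold-based hit finding; a minimal sketch follows. The gain, photon threshold, and minimum lit-pixel count are illustrative assumptions, and the fact that they are adjustable is exactly why this pass tends to get rerun.

```python
import numpy as np

def is_hit(frame, adu_per_photon=30.0, photon_threshold=2.0, min_lit_pixels=50):
    """Very simple hit finder: count pixels carrying at least ~2 photons.

    frame            : 2D dark-corrected ADU array
    adu_per_photon   : assumed detector gain
    photon_threshold : photons a pixel must exceed to count as 'lit'
    min_lit_pixels   : lit pixels required to call the frame a hit
    """
    photons = frame / adu_per_photon
    return np.count_nonzero(photons > photon_threshold) >= min_lit_pixels
```

Keeping only the hits is what gives roughly 10x data reduction at a ~10% hit rate; a badly tuned threshold either inflates the stored data or, worse, biases the physics by discarding weak hits.
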

* Peter Schwander
* veto signal for SPI non-hits, real-time hit finding
* convert data to single photons

* Filipe Maia
* detector artifacts
* signal influences gain for pnccd and cspad
* hit finding can be difficult for single-protein SPI, because the signal is very weak
* use ion TOF or other hardware tools for hit finding

* Mariano Trigo
* time tool calibration important for cube

* Ti-Yen
* halo close to the beam center makes hit finding difficult for SPI
* converting ADU to photons is sufficient for SPI

* Aaron Brewster
* reprocessing because of an unknown crystal unit cell
* the unit cell can drift during the experiment, depending on sample preparation

* Peter Zwart
* hit rate for imaging can be quite high, up to 80%
* clustering algorithm for hit finding
* stable beam and sample delivery required
* difficulties in converting ADU to photons for pnccd, rounding errors (see the sketch below)
* check photon conversion on simulated data, understand errors in the conversion
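
A minimal sketch of ADU-to-photon conversion by rounding, plus the suggested check on simulated data; the gain, noise level, and Poisson signal model are illustrative assumptions (real pnccd/cspad corrections also involve per-pixel gain and common mode):

```python
import numpy as np

def adu_to_photons(frame, adu_per_photon=130.0):
    """Convert dark-corrected ADU to integer photon counts by rounding."""
    return np.maximum(np.rint(frame / adu_per_photon), 0).astype(np.int32)

# Check the conversion on simulated data: generate known photon counts,
# add detector noise, convert back, and look at the error rate.
rng = np.random.default_rng(0)
true_photons = rng.poisson(0.1, size=(512, 512))                 # sparse SPI-like signal
simulated_adu = true_photons * 130.0 + rng.normal(0, 20, size=(512, 512))
recovered = adu_to_photons(simulated_adu)
print(f"fraction of mis-assigned pixels: {np.mean(recovered != true_photons):.4f}")
```
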

 

Extra detail from Anton Barty:

 

The question was: when do you go back over all the data?
1) To fix an error or artefact in the detector for which there was no ready-made correction prior to beamtime (or we did not know that the error existed).
Examples may be: cspad common mode and kickback corrections, pnccd timing distorting the geometry, gain/intensity nonlinearities, timing tool edge finding needing careful attention (a minimal common-mode sketch follows after these notes).
Corollary: for real-time analysis to work, detector output needs to be 100% reliable.
2) Where there are parameters to tweak in the analysis, no doubt they will want to be tweaked.  This is particularly the case when there is unexpected signal, or no signal at all.  No signal is hard because we have to convince ourselves that the analysis algorithms are not throwing out useful data. 
Nadia: Going back over the data to get an extra 10% can improve data enough to get a result, as opposed to no result. 
Corollary: Algorithms should not rely on adjustable parameters such as thresholds.  If it’s adjustable you will want to see the effect of adjusting it, which means going back over the data.  
Tom: An adjustable parameter you get one shot at setting is no longer an adjustable parameter. 
3) Unexpected features in the data:  including unexpected regions of interest or regions of integration, bad regions, stray reflections, integration directions, calibrations.
For example: shadows on the detector,  stray light sneaking past or through apertures, unexpected parasitic scatter. 
Corollary: Instant feedback is essential so the user can perfect these regions in real time.  Expect to use some beamtime and sample to get this right. 
4) Experimental SNAFUs. For example, the primary sorting diagnostic is not working and we need to fall back to a secondary diagnostic.
Example: Event code not recorded; have to look at an Acqiris trace or a CCD camera to determine whether the pump laser was on or off.
Corollary: Once again, instant feedback is essential so the user can adapt in real time. Expect to use some beamtime and sample to get this right. Someone must be there to be able to re-program this in real time. Software setup is as important as sample delivery and beamline expertise.
One can make the following observations: 
- If there is an adjustable parameter, users will want to see the effect of adjusting it. Move towards reliable algorithms that do not have user adjustable settings, then there is nothing to tweak. 
- Setting up the software (e.g.: thresholds, calibration) becomes as critical a step as aligning the beam, moving apertures and mirrors, perfecting sample delivery.  
- Actual beamtime and sample needs to be budgeted for setup of the online analysis with real sample.   Real time analysis becomes a part of the instrument, not a step performed afterwards.  
- Fast feedback so that regions can be adjusted in real time is essential.  You can’t analyse blind. 
- Accept some beamtime may be lost due to real time analysis problems, just as some beamtime is lost due to sample delivery or vacuum issues. Analysis equivalent of ‘hutch door open’. 
- All analysis must be monitored and reprogrammed in real time. LCLS will have to understand a lot more about each experiment to be able to provide the necessary support in real time, at all hours. Recording everything and figuring it out later is no longer possible.
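
As referenced in point 1, a minimal sketch of a median-based common-mode correction, one of the per-shot detector corrections mentioned there; the per-row estimate and the photon-exclusion threshold are illustrative assumptions, not the actual psana/Cheetah implementation:

```python
import numpy as np

def common_mode_correct(panel, signal_threshold=100.0):
    """Subtract a per-row common-mode offset estimated from non-signal pixels.

    panel            : 2D dark-corrected ADU array for one detector ASIC/panel
    signal_threshold : pixels above this ADU level are assumed to carry photons
                       and are excluded from the offset estimate
    """
    corrected = panel.astype(np.float64).copy()
    for i, row in enumerate(corrected):
        background = row[row < signal_threshold]
        if background.size:                      # skip rows that are all signal
            corrected[i] -= np.median(background)
    return corrected
```
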

Feb. 22, 2017

Link to slides: 

Slides from Feb. 22, 2017

 

Berkeley:
A 10x reduction is a great goal.
25 GB/s is how many images per second? 2 kHz for imaging detectors, up to 100 kHz at some later date (see the arithmetic after this block).
Compression is going to be hard for crystallography. Crystallographers may accept fewer images per second; 2 kHz is already an enormous improvement.
Diffuse scattering is a main science driver.
Models to predict spots will be hard.
There is an ongoing debate at synchrotrons about 2-fold lossless compression.
Reprocess the experiment because it is still research. We don’t agree with each other on what algorithms to use.
Lysozyme gives geometry accurate to 2-3 µm.
In 2025, we may still have 1-50% hit rates.
What percentage of experiments will involve diffuse scattering? Still too early to tell.
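
A back-of-the-envelope answer to the rate question above, taking the 25 GB/s write limit quoted in these notes at face value:

```python
# per-image budget implied by a 25 GB/s write limit
throughput_bytes = 25e9
for rate_hz in (2_000, 100_000):          # 2 kHz now, up to 100 kHz later
    budget_mb = throughput_bytes / rate_hz / 1e6
    print(f"{rate_hz / 1000:g} kHz -> {budget_mb:.2f} MB per image")
# prints 12.50 MB per image at 2 kHz and 0.25 MB at 100 kHz; the latter is far
# below a raw megapixel frame, hence the interest in reduction and compression.
```
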
Uppsala:
Multi-event compression gives about 3x for any kind of data.
There are data centers interested in data reduction applications: Ian Foster (Argonne), Nick Sauter (Berkeley).
Detector artifacts (such as dynamically changing gains) may make things difficult.
ADU gradients in the cspad were difficult to deal with.
Chunking data would be necessary to achieve MPEG-style compression (see the sketch after this block).
Ion time-of-flight may help with hit finding, but the liquid jet may have protein in it, making the ToF signal noisy.
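
A minimal sketch of what chunked, multi-event compression might look like, in the spirit of the MPEG analogy above: group similar events into a chunk, subtract a shared reference, and run a standard lossless codec over the residuals. The chunk structure, the median reference, and the use of zlib are illustrative assumptions, not the 3x scheme referred to in the notes.

```python
import zlib
import numpy as np

def compress_chunk(frames):
    """Losslessly compress a chunk of similar frames against a shared reference.

    frames : 3D signed-integer array (n_events, ny, nx) of photon counts or ADU
    """
    reference = np.median(frames, axis=0).astype(frames.dtype)
    residuals = frames - reference            # small numbers when frames are similar
    return reference, zlib.compress(residuals.tobytes(), level=6)

def decompress_chunk(reference, blob, n_events):
    raw = np.frombuffer(zlib.decompress(blob), dtype=reference.dtype)
    return raw.reshape(n_events, *reference.shape) + reference
```
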
CFEL:
How much money does it cost to save everything to disk?
Storage costs: $74M out of $150M, about 50% of the budget.
Can we allow the user to have all the data for a few hours? Reading the data back from disk will be difficult.
Once data is on disk, people may think they own it. It will be difficult to delete data at that point.
Identify which fraction of experiments takes the most disk space. Concentrate on optimizing the data-hungry experiments.
Multi-crystal images will help reduce the number of images and use far fewer X-ray pulses. SFX analysis is improving, allowing a structure from only 1000 patterns.
The CsPad geometry at MFX requires improvement.
Photon conversion is not trivial, but it will help compression. A better detector is needed.
Many users, with some pressure from LCLS, may be ready to throw away data on the fly.
Tom would like a week to tweak hit finding to get the most out of the data.
Jana:
L3 trigger can be used to test ideas
Spreadsheet of data reduction
Spreadsheet comparing L3 filter results with the final analysis
Talk to Dan DePonte about future sample delivery
Talk to Gabriella about detector orientation/calibration
DAQ cannot write more than 25 GB/s; expanding this will increase cache cost