"Data analysis == Piece of cake"

Sections in this tutorial

It is always a good idea for the people doing analysis to be able to look at their detector images and probe intensity values. Given that a typical LCLS experiment has millions of snapshots to choose from, it is also critical that you can quickly select images of interest and set regions of interest using masks. By the end of this tutorial, you will be able to browse images, jump to images of interest, generate masks, change the x,y,z positions of your detector, find peaks in your images and index crystal diffraction patterns.

Citation for psocake (and other psana-based programs):
@Article{Thayer2017,
author="Thayer, J. and Damiani, D. and Ford, C. and Dubrovin, M. and Gaponenko, I. and O'Grady, C. P. and Kroeger, W. and Pines, J. and Lane, T. J. and Salnikov, A. and Schneider, D. and Tookey, T. and Weaver, M. and Yoon, C. H. and Perazzo, A.",
title="Data systems for the Linac coherent light source",
journal="Advanced Structural and Chemical Imaging",
year="2017", month="Jan", day="14", volume="3", number="1", pages="3", issn="2198-0926",
doi="10.1186/s40679-016-0037-7", url="https://doi.org/10.1186/s40679-016-0037-7"}
 
@article{Damiani:zw5004, 
author = "Damiani, D. and Dubrovin, M. and Gaponenko, I. and Kroeger, W. and Lane, T. J. and Mitra, A. 
and O'Grady, C. P. and Salnikov, A. and Sanchez-Gonzalez, A. and Schneider, D. and Yoon, C. H.", 
title = "{Linac Coherent Light Source data analysis using {\it psana}}", 
journal = "Journal of Applied Crystallography", 
year = "2016", volume = "49", number = "2", pages = "672--679", month = "Apr", 
doi = {10.1107/S1600576716004349}, url = {http://dx.doi.org/10.1107/S1600576716004349}, }

Starting psocake in SFX mode

If you are on a psana machine, set up your environment by adding these lines to .bashrc (or your startup script):

# PSANA
source /reg/g/psdm/etc/psconda.sh

# CrystFEL
source /reg/g/cfel/crystfel/crystfel-dev/setup-sh

# CCP4 (version may change)
source /reg/common/package/ccp4-7.1/bin/ccp4.setup-sh

# XDS
export PATH=/reg/common/package/XDS-INTEL64_Linux_x86_64:$PATH

Make sure you have the psana environment set up (psana python Setup) before starting this tutorial. Also, obtain a Kerberos ticket by typing “kinit”; the ticket is needed for communicating with the experiment e-Log.

Type "psocake" on your terminal to open up the GUI. For crystallography, we will need to open it in sfx mode (-m):

$ psocake -m sfx


1) There are four parameters required to uniquely identify an image at LCLS. Type the (1) experiment name, (2) run number, (3) detector name, and (4) event number in the Experiment Parameters panel.

For this tutorial, we will look at experiment cxitut13, run 10, detector DscCsPad, event 11.

2)  You can specify the experiment parameters as command line arguments in psocake using the psana-style experiment run string. This is the recommended way of starting psocake:
$ psocake exp=cxitut13:run=10 -d DscCsPad -n 11 -m sfx

Or you can also use the -e and -r arguments for the experiment and the run number:

$ psocake -e cxitut13 -r 10 -d DscCsPad -n 11 -m sfx
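
The two command lines above carry the same information. As a minimal sketch of how a psana-style string of the form exp=<experiment>:run=<number> maps onto the separate -e and -r values, the helper below splits such a string. The name parse_exp_run is hypothetical and is not part of psocake or psana.

```python
def parse_exp_run(exp_run):
    """Split a string like 'exp=cxitut13:run=10' into (experiment, run)."""
    # Keep only key=value fields; other colon-separated options are ignored.
    fields = dict(item.split("=", 1) for item in exp_run.split(":") if "=" in item)
    return fields["exp"], int(fields["run"])


experiment, run = parse_exp_run("exp=cxitut13:run=10")
# Rebuild the equivalent -e/-r command line used in this tutorial.
print("psocake -e %s -r %d -d DscCsPad -n 11 -m sfx" % (experiment, run))
```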

During the experiment, you have access to psfehhiprioq which allows you to run psocake from Fast Feedback (FFB) nodes. To do this, append -a ffb. Note that only psfehhiprioq has access to the data on FFB:

$ psocake -e cxitut13 -r 10 -d DscCsPad -n 11 -m sfx -a ffb
Note that available detector names will be printed on the terminal once you have typed in the experiment name and the run number. 
#######################################
# Available area detectors: 
# ('CxiDs1.0:Cspad.0', 'DscCsPad', '')
#######################################

CxiDs1.0:Cspad.0 is the full detector name. DscCsPad is the simpler DAQ alias. Psocake can understand both naming conventions.

To check psocake version:

$ psocake --version

Don’t worry if you don’t remember these arguments. You can view argument options using --help:

$ psocake --help

Psocake should have generated directories and files in the experiment directory. At LCLS, all experiments are stored here: /reg/d/psdm/<instrument>/<experiment>. Let's take a moment and check out our directory structure. Either open a new terminal (remember to 'ssh psana') or use the current terminal ('Ctrl+z' to suspend the running psocake, then 'bg' to run it in the background), and type the following command:

$ ls /reg/d/psdm/cxi/cxitut13
calib  ftc  hdf5  res  scratch  usr  xtc

calib: This is where all psana calibration is stored: detector geometry, pedestals, gain, common mode constants, and bad pixel map.

xtc: This is where all your raw data is stored. XTC is a simple and efficient format for storing large data. XTCs can be read using psana. Note you have 4 months to analyse your data before xtcs are moved off to tape.

scratch: This is where psocake saves all the files like .cxi and .stream. This directory is not backed up, so important files need to be moved to /res.

res: This is the results directory which is backed up on tape. After completing your analysis, your results/data should be moved here.

Mask making

In this section, let's learn how to mask out pixels that should not be used for analysis (such as dead pixels), mask out the jet streak at the centre of the detector, and mask out the water ring (just for fun!).

Note: the Image Panel must be in the default "greyscale" colormap for the mask colors to display properly.

1) In the mask panel, click on "Use psana mask". This will mask out the following pixels that should not be used for analysis: calib, status, edge, central, unbonded pixels, and the neighbors of unbonded pixels. These masked pixels are shown as green on the image panel.

2) On the mask panel, click on "Use streak mask". This will mask out strong intensities originating from the edges of the central asics. The streak mask varies shot-to-shot.

3) To make a donut mask over the water ring, click on "Use user-defined mask". This will bring up a cyan circle, cyan polygon and cyan square mask generators.

Select "Toggle" in Masking mode. Move the cyan circle to the centre of the detector by dragging the circle. Resize the cyan circle by dragging the diamond on the perimeter. Once you are happy with the position, click "mask circular ROI" button on the mask panel.

Enlarge the cyan circle again by dragging the diamond on the perimeter. Click the "mask circular ROI" button on the mask panel. Because we are in the "Toggle" mode, the previous mask gets toggled and disappears. The area that does not overlap with the previous mask gets masked out, leaving a donut-shaped mask.
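
The toggle behaviour can be pictured as an exclusive-or between the existing mask and each new circle. The sketch below is only an illustration with NumPy, not psocake's own code: the image size, centre and radii are made-up values, and True is used here to mean "masked out".

```python
import numpy as np


def circle(shape, centre, radius):
    """Boolean array that is True inside a circle on an image of this shape."""
    rows, cols = np.indices(shape)
    return (rows - centre[0]) ** 2 + (cols - centre[1]) ** 2 <= radius ** 2


shape = (200, 200)                     # hypothetical image size
centre = (100, 100)                    # hypothetical detector centre
masked = np.zeros(shape, dtype=bool)   # True marks a masked-out pixel

# First click: toggle the small circle (masks the inner disc).
masked ^= circle(shape, centre, 40)
# Second click: toggle the larger circle. The inner disc flips back to
# unmasked and only the non-overlapping annulus stays masked.
masked ^= circle(shape, centre, 60)
```

After the second toggle, the centre pixel is unmasked again while pixels between the two radii remain masked.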

To save the user-defined mask, click on "Save static mask" on the mask panel which will save the mask in the scratch folder. This will combine the green and blue masks into a single mask. For this example, your mask will be saved here:

/reg/d/psdm/cxi/cxitut13/scratch/<username>/psocake/r0010/mask.npy (unassembled 3D ndarray)

/reg/d/psdm/cxi/cxitut13/scratch/<username>/psocake/r0010/mask.txt (2D text)

You can load the user-defined mask using the "Load mask" button and selecting mask.npy.

mask.txt is compatible with the calibration manager application, calibman.


To delete the mask on the screen, select "Unmask" under Masking mode. Drag a blue circle mask generator over the detector and click "Stamp circular mask".

Peak finding

In this section, we will find peaks on the detector image. To find the peaks on the image, set the "Algorithm" to “Droplet” in the Peak Finder panel. Details of the peak finding algorithm are given here: Hit and Peak Finding Algorithms (two-threshold "Droplet finder"). You should notice peaks being highlighted in the Image panel.

Let's define the area of the peaks that you are looking for. Examine the peaks found by zooming in/out with the middle mouse scroll (or a two finger slide on a Mac touchpad). Notice that the Bragg peaks are composed of 2 to 8 connected pixels. Setting the radius to 2 sets a 5x5 cyan boundary around the Bragg peaks (2 x radius + 1 = 5). Change the following values in the Peak Finder panel:
  • Set npix_min: 2
  • Set npix_max: 8
  • radius: 2

The two-threshold droplet algorithm requires us to set two thresholds: a low and a high. The low threshold is the background noise level. The high threshold is the minimum height of our peaks of interest. Move the green square over the water ring to study the level of background noise in the ROI Histogram panel. Notice that the water ring intensities can go up to 120 ADUs. Strong Bragg spots are at least 250 ADUs. Ideally, we want to reject peaks that have low signal-to-noise ratios. Let's set the minimum signal-to-noise ratio (son_min) to 7 sigma; typically you need to stay above 4. So set the following values:
  • thr_low: 120
  • thr_high: 250
  • son_min: 7
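
To make the role of the two thresholds concrete, here is a simplified sketch of a two-threshold search. It is not the psana implementation: the 4-neighbour connectivity, the strict inequalities and the tiny test image are all assumptions made for illustration. Connected pixels above thr_low are grouped into a droplet, and the droplet is kept only if its brightest pixel exceeds thr_high.

```python
def find_droplets(image, thr_low, thr_high):
    """Return a list of droplets, each a list of (row, col) pixels."""
    n_rows, n_cols = len(image), len(image[0])
    seen = set()
    droplets = []
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) in seen or image[r][c] <= thr_low:
                continue
            # Flood fill over 4-connected neighbours above thr_low.
            stack, pixels = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and (ny, nx) not in seen
                            and image[ny][nx] > thr_low):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            # Keep the droplet only if its brightest pixel clears thr_high.
            if max(image[y][x] for y, x in pixels) > thr_high:
                droplets.append(pixels)
    return droplets


# Made-up ADU values: one three-pixel spot and one isolated weak pixel.
image = [
    [10, 10, 10, 10, 10, 10],
    [10, 130, 300, 10, 10, 10],
    [10, 10, 140, 10, 10, 10],
    [10, 10, 10, 10, 150, 10],   # above thr_low only, so it is rejected
    [10, 10, 10, 10, 10, 10],
]
droplets = find_droplets(image, thr_low=120, thr_high=250)
```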

Hover the mouse pointer over the Bragg peaks to study the intensities. The sum of the Bragg peak pixels is above 500 ADUs. Set the following value:

  • Set atot_thr: 500
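
Taken together, the values chosen above act as a set of cuts on each candidate peak. The following sketch shows how such cuts might be combined. The function name accept_peak, the inclusive comparisons and the candidate values are assumptions for illustration only; the actual peak finder may apply the cuts differently.

```python
def accept_peak(npix, atot, son, npix_min=2, npix_max=8, atot_thr=500, son_min=7):
    """True if a candidate peak passes the size, intensity and S/N cuts."""
    return npix_min <= npix <= npix_max and atot >= atot_thr and son >= son_min


radius = 2
box_size = 2 * radius + 1   # side length of the boundary drawn around a peak

# Made-up candidates: pixel count, summed intensity (ADU), signal over noise.
candidates = [
    {"npix": 4, "atot": 820.0, "son": 12.3},   # passes every cut
    {"npix": 1, "atot": 300.0, "son": 9.0},    # too few pixels and too weak
    {"npix": 5, "atot": 650.0, "son": 3.5},    # signal-to-noise too low
]
peaks = [p for p in candidates if accept_peak(**p)]
```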

If you have trouble finding peaks due to high background scattering, you should turn on median background subtraction. In the Experiment Parameters panel, set the Image properties to "median background corrected ADU".
Let's launch the peak finder on the cluster for a small number of events to see how this set of parameters performs. The output directory on the Peak Finder panel should already be automatically set to: /reg/d/psdm/cxi/cxitut13/scratch/<username>/psocake
Since we are analyzing run 10, /r0010 directory will be generated under the output directory.
The default setting will analyze run 10 on psanaq with 24 CPUs. Setting "Number of events to process" to -1 analyzes all the events.
For this demo, let's analyze only 50 events with 2 CPUs.
  • Run(s): 10
  • Queue: psanaq
  • CPUs: 2
  • Number of events to process: 50

To check which queues are free, type the following in the terminal:


$ bqueues | grep ps
psnehhiprioq    117  Open:Active       -    -    -    -     0     0     0     0
psfehhiprioq    117  Open:Active       -    -    -    -     0     0     0     0
psnehprioq      116  Open:Active       -    -    -    -     0     0     0     0
psfehprioq      116  Open:Active       -    -    -    -     0     0     0     0
psnehq          115  Open:Active       -    -    -    -     0     0     0     0
psfehq          115  Open:Active       -    -    -    -   520   276   244     0
psdebugq        114  Open:Active       -    -    -    -     0     0     0     0
psanaq          112  Open:Active       -    -    -    -  7466  6612   854     0
psnehidleq       70  Open:Active       -    -    -    -    96     0    96     0
psfehidleq       70  Open:Active       -    -    -    -   104     0     0   104
psanaidleq       70  Open:Active       -    -    -    -   552   432     0   120

At the time of writing this documentation, psanaq is quite busy processing 7466 jobs. 6612 jobs are pending. 854 jobs are currently running. For more information on which queue you are allowed to use, see Batch System Analysis Jobs
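
If you prefer to compare queues programmatically, the listing can be parsed as plain text. The sketch below reads the last four columns as NJOBS, PEND, RUN and SUSP (the usual LSF column order, which matches the numbers quoted above) and picks the queue with the fewest pending jobs. It works on a pasted excerpt rather than calling bqueues, and it does not check which queues you are allowed to use.

```python
BQUEUES_OUTPUT = """\
psfehq          115  Open:Active       -    -    -    -   520   276   244     0
psanaq          112  Open:Active       -    -    -    -  7466  6612   854     0
psnehq          115  Open:Active       -    -    -    -     0     0     0     0
"""


def parse_bqueues(text):
    """Map queue name -> dict of NJOBS, PEND, RUN and SUSP counts."""
    queues = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11 or not fields[-1].isdigit():
            continue  # skip a header line or anything unexpected
        njobs, pend, run, susp = (int(x) for x in fields[-4:])
        queues[fields[0]] = {"NJOBS": njobs, "PEND": pend, "RUN": run, "SUSP": susp}
    return queues


queues = parse_bqueues(BQUEUES_OUTPUT)
least_pending = min(queues, key=lambda name: queues[name]["PEND"])
```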

Click "Launch peak finder”. You can check the status of launched jobs by typing "bjobs" in your Terminal. This will display the job ID and the status amongst other information.
A logfile of the peak finding is also saved under the same directory, .<jobID>.log. Note the "." in front of the logfile name.
If the status stays in PEND or SUSP mode for a while, then you may want to kill the jobs. To kill a batch job, type "bkill <jobID>". For more information, please refer to: Submitting Batch Jobs
You can check the status of your peak finding job here: /reg/d/psdm/cxi/cxitut13/scratch/<username>/psocake/r0010/status_peaks.txt
Also, click "Refresh" in Small Data panel to see the progress. You should see a red plot showing event number vs number of peaks found. When peak finding finishes, the plot will turn blue.
Note: You can launch peak finding jobs on multiple runs when you are ready to analyze the entire experiment, e.g. Run(s) 1,10:13 will launch peak finding on runs 1, 10, 11, 12, and 13.
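
As an illustration of the Run(s) syntax, the sketch below expands such a string into a list of run numbers. It assumes that first:last denotes an inclusive range; parse_runs is a hypothetical helper and not psocake's own parser.

```python
def parse_runs(run_string):
    """Expand a string such as '1,10:13' into a sorted list of run numbers."""
    runs = set()
    for part in run_string.split(","):
        part = part.strip()
        if ":" in part:
            first, last = (int(x) for x in part.split(":"))
            runs.update(range(first, last + 1))   # treat the end as inclusive
        elif part:
            runs.add(int(part))
    return sorted(runs)


print(parse_runs("1,10:13"))
```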
The CXIDB file is generated in the output directory in the following format: <experiment name>_<4-digit run number>.cxi (e.g. /reg/d/psdm/cxi/cxitut13/scratch/<username>/psocake/r0010/cxitut13_0010.cxi)

Two virtual powder patterns will also be generated in the run directory:
1) cxitut13_0010_maxHits.npy: Maximum pixel values for hits found
2) cxitut13_0010_maxMisses.npy: Maximum pixel values for misses
If the hit finding parameters are good, cxitut13_0010_maxMisses.npy should not contain many Bragg spots. You can look at these images by using the "Load image" button in the Image Control panel.
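
A virtual powder pattern of this kind is a pixel-wise maximum over many images. The sketch below shows the idea with NumPy on three tiny made-up arrays. It is only an illustration of the operation, not the code psocake runs.

```python
import numpy as np


def max_projection(images):
    """Pixel-wise maximum over an iterable of equally shaped arrays."""
    result = None
    for image in images:
        result = image.copy() if result is None else np.maximum(result, image)
    return result


# Three tiny made-up "hit" images with bright pixels in different places.
hits = [np.zeros((3, 3)) for _ in range(3)]
hits[0][0, 0] = 300.0
hits[1][1, 2] = 450.0
hits[2][0, 0] = 280.0
powder = max_projection(hits)
```

Each pixel of powder holds the largest value that pixel took in any of the input images, which is why Bragg spots from many hits accumulate into rings.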
At CXI, the detector geometry may not be centred exactly. You can fix the centre by clicking "Deploy automatically centred geometry" in the Diffraction geometry panel. This uses cxitut13_0010_maxHits.npy and cxitut13_0010_maxMisses.npy.

Jumping to interesting images based on the number of peaks

Once you have submitted the peak finder job, let's plot the number of peaks found for each event by clicking the "Refresh" button. The Refresh button will sometimes not work while your .cxi file is still being written.

In the small data panel, you should see the CXIDB filename:

  • filename: /reg/d/psdm/cxi/cxitut13/scratch/<username>/psocake/r0010/cxitut13_0010.cxi
  • dataset: /entry_1/result_1/nPeaksAll
"/entry_1/result_1/nPeaksAll" is an array containing number of peaks found for each event.
This should load a plot of all the peaks found so far per event. Click "Refresh" to update the plot even if your batch job is still running.
You can click on a red marker in the plot to jump to the corresponding event. These single crystal diffraction patterns of lysozyme seem to have between 15 and 60 peaks.
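
The same selection can be made outside the GUI. The sketch below is a hedged example: load_dataset reads a dataset from the .cxi (HDF5) file with h5py, and events_with_peaks lists the events whose peak count falls in a chosen range. Both function names are hypothetical, the peak counts shown are made up, and it is assumed that the array index corresponds to the event number.

```python
def load_dataset(cxi_path, dataset="/entry_1/result_1/nPeaksAll"):
    """Read one dataset from a .cxi (HDF5) file; requires h5py."""
    import h5py  # imported here so the rest of the sketch runs without it
    with h5py.File(cxi_path, "r") as f:
        return f[dataset][()]


def events_with_peaks(n_peaks, min_peaks=15, max_peaks=60):
    """Event indices whose peak count lies in [min_peaks, max_peaks]."""
    return [i for i, n in enumerate(n_peaks) if min_peaks <= n <= max_peaks]


# Made-up peak counts for eight events, for illustration only.
n_peaks = [0, 3, 22, 0, 61, 15, 48, 7]
print(events_with_peaks(n_peaks))
```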

Indexing crystals

First things first, crystal indexing requires an accurate detector geometry. Latest CXI geometry files can be found here: Geometry history

Detector panels can be manually adjusted using: calibman


Secondly, we need to know the diffraction geometry before we can index. It is often the case at the beginning of a beamtime that the detector distance to the interaction point (coffset) is not precisely known (We are talking about sub-millimetre precision), and we can use the Diffraction Geometry panel to find this distance.
  • In the diffraction geometry panel, set "Detector distance" to 156mm. 
  • In the indexing panel, tick "Indexing on". If indexing succeeds, the integrated (predicted) peaks will be marked with magenta circles. These triple rings represent the integration radii. A magenta triangle means wait. If indexing fails, you will see a magenta X.
  • If you see magenta circles and magenta unitcell appear, congratulations! You have indexed your first pattern using psocake.
  • Try incrementing this distance in 1 mm steps until the unit cell parameters are as close as possible to lys.cell. The optimum detector distance is around 158 mm.
    Every time the "Detector distance" value is changed by the user, psocake converts the psana geometry (in /reg/d/psdm/cxi/cxitut13/calib/CsPad\:\:CalibV1/CxiDs1.0\:Cspad.0/geometry/10-11.data) to a CrystFEL geom file (in /reg/d/psdm/cxi/cxitut13/scratch/<username>/psocake/<runNumber>/.temp.geom).

Indexing panel uses CrystFEL to index the diffraction patterns, so the input parameters in the indexing panel should be familiar to you if you've used indexamajig before.

CrystFEL geometry: This geometry file is automatically converted from our psana geometry to CrystFEL geometry for you. Feel free to look inside .temp.geom. If you have a CrystFEL geometry file that you know is good, you can simply type it in. Psocake will never modify this file even if you change the "detector distance" in the diffraction geometry panel. (Just don't name your geometry .temp.geom, because it will get overwritten.) You can also deploy the CrystFEL geometry as a psana geometry by clicking "Deploy CrystFEL geometry" in the indexing panel.

Integration radii: These 3 numbers define the radii of three concentric rings about each Bragg spot. The area inside the inner ring is used to integrate the Bragg spot, and the area between the middle and outer rings is used to estimate the background. Try adjusting these numbers and see what is being integrated on screen. The inner ring should be large enough to fit a Bragg spot inside it.
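
The idea behind ring-based integration can be sketched in a few lines. This is a simplified illustration and not CrystFEL's implementation: pixels within the inner radius are summed, the local background is taken as the mean of the pixels between the middle and outer radii, and that background is subtracted per signal pixel. The radii 2, 4, 6 and the test image are made-up values.

```python
import math


def integrate_spot(image, centre, inner, middle, outer):
    """Background-subtracted sum of the pixels within `inner` of `centre`."""
    cy, cx = centre
    signal, background = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            r = math.hypot(y - cy, x - cx)
            if r <= inner:
                signal.append(value)
            elif middle <= r <= outer:
                background.append(value)
    bg_mean = sum(background) / len(background)
    return sum(signal) - bg_mean * len(signal)


# Flat background of 10 ADU with a single 100 ADU spot on top of it.
image = [[10.0] * 15 for _ in range(15)]
image[7][7] += 100.0
intensity = integrate_spot(image, centre=(7, 7), inner=2, middle=4, outer=6)
```

If the inner radius is too small for the spot, part of the signal falls outside the summed region and the integrated intensity is underestimated.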

PDB: If you have a CrystFEL unitcell, you can constrain the indexing algorithms to look for this unit cell.

Indexing method: Default is mosflm-noretry, dirax. "noretry" is used to speed up mosflm (which can otherwise take a few seconds).

Tolerance: These 4 numbers define how much wriggle room you want for indexing. 5, 5, 5 are the tolerance level for unitcell axes a, b, c. 1.5 is the tolerance level for the angles alpha, beta, gamma.

Extra CrystFEL parameters: You can enter extra parameters for indexamajig in this field. It will be appended at the end of the command line, e.g. --profile will turn on the processing timing information.

Let's try to index another diffraction pattern at event 44.

  • In the experiment parameters panel, set Event Number to 44.
  • You should see the magenta triangle appear again. Wait a few seconds and hopefully you will have indexed another pattern.

Hopefully, you have indexed this diffraction pattern. Notice that the unitcell parameters are a bit off compared to what is expected. Let's load a CrystFEL unitcell file to help the indexer along.

  • In the Indexing panel, set the PDB field to: /reg/d/psdm/cxi/cxitut13/scratch/psocake/lys.cell
CrystFEL unit cell file version 1.0
lattice_type = tetragonal
centering = P
unique_axis = c
a = 77.05 A
b = 77.05 A
c = 37.21 A
al = 90 deg
be = 90 deg
ga = 90 deg
You should notice that the reindexed result will match the known unit cell parameters more closely.
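
To judge how close an indexed cell is to the reference, the cell file can be parsed and compared with the tolerances from the indexing panel. The sketch below assumes the convention of CrystFEL's --tolerance option (axes in percent, angles in degrees). The helper names and the "indexed" values are hypothetical and for illustration only.

```python
import re

LYS_CELL = """\
a = 77.05 A
b = 77.05 A
c = 37.21 A
al = 90 deg
be = 90 deg
ga = 90 deg
"""


def parse_cell(text):
    """Extract a, b, c, al, be, ga as floats from CrystFEL cell-file text."""
    pattern = r"^(a|b|c|al|be|ga)\s*=\s*([\d.]+)"
    return {k: float(v) for k, v in re.findall(pattern, text, flags=re.M)}


def within_tolerance(cell, reference, tolerance=(5, 5, 5, 1.5)):
    """Compare axes in percent and angles in degrees against a reference."""
    for key, tol in zip("abc", tolerance[:3]):
        if abs(cell[key] - reference[key]) > reference[key] * tol / 100.0:
            return False
    return all(abs(cell[k] - reference[k]) <= tolerance[3] for k in ("al", "be", "ga"))


reference = parse_cell(LYS_CELL)
# Hypothetical indexing result, for illustration only.
indexed = {"a": 78.9, "b": 78.1, "c": 38.0, "al": 90.4, "be": 89.7, "ga": 90.2}
print(within_tolerance(indexed, reference))
```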
Once you are happy with the detector geometry and indexing parameters, you can click "Launch indexing”.
  • Run(s): 10
  • Sample name: lysozyme
  • Queue: psanaq
  • CPUs: 24
  • Keep CXI images: On

Indexing will take some time to complete. If successful, you should see a stream file in: /reg/d/psdm/cxi/cxitut13/scratch/<username>/psocake/r0010/cxitut13_0010.stream

You can check the status of your indexing job here: /reg/d/psdm/cxi/cxitut13/scratch/<username>/psocake/r0010/status_index.txt
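
The output files named in this tutorial follow a regular pattern, which can be handy in scripts. The sketch below rebuilds those paths for a given run. The helper psocake_paths is hypothetical, and deriving the instrument from the first three letters of the experiment name is an assumption that happens to hold for cxitut13.

```python
import posixpath


def psocake_paths(experiment, run, username, instrument=None):
    """Build the output file names used in this tutorial for one run."""
    # Assumption: the instrument is the first three letters of the experiment.
    instrument = instrument or experiment[:3]
    run_dir = posixpath.join("/reg/d/psdm", instrument, experiment, "scratch",
                             username, "psocake", "r%04d" % run)
    stem = "%s_%04d" % (experiment, run)
    return {
        "cxi": posixpath.join(run_dir, stem + ".cxi"),
        "stream": posixpath.join(run_dir, stem + ".stream"),
        "status_peaks": posixpath.join(run_dir, "status_peaks.txt"),
        "status_index": posixpath.join(run_dir, "status_index.txt"),
    }


paths = psocake_paths("cxitut13", 10, "<username>")
```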

Psocake saves the detector images of only the hits in the .cxi file. You may want to reindex these files later to optimize the indexing rate. Once you have finalized the indexing parameters, set 'Keep CXI images' to Off. This deletes the detector images in your .cxi file, which will free up your precious disk space for doing other things.

As with peak finding, you can launch indexing jobs on multiple runs by specifying runs in the Run(s) field.

Indexing multiple lattices

CrystFEL can index multiple lattices per image. Set "Extra CrystFEL parameters" to --multi,--no-check-peaks. This will enable the "subtract and retry" method.

Indexing pump-probe experiments

In a pump-probe experiment, it is sometimes desirable to index only certain events, e.g. index only the pumped crystals. This information is recorded in the EVR, which psocake saves in the .cxi file.

So if you want to index only the hits with laser on (say EVR1: 182), then type the following in the "Index condition" field:

182 in #evr1#

Psocake will also accept combinations using AND/OR:

182 in #evr1# and 173 in #evr1# 
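
To show what such a condition means for a single event, here is a small stand-alone evaluator. It is not psocake's parser: the function matches is hypothetical, it handles only terms of the form "<code> in #<name>#" joined by and/or (no parentheses), and it assumes "and" binds more tightly than "or".

```python
import re


def matches(condition, event_codes):
    """Evaluate e.g. '182 in #evr1# and 173 in #evr1#' for one event.

    event_codes maps a name such as 'evr1' to the codes seen in that event.
    """
    def term(text):
        m = re.fullmatch(r"\s*(\d+)\s+in\s+#(\w+)#\s*", text)
        if m is None:
            raise ValueError("cannot parse term: %r" % text)
        return int(m.group(1)) in event_codes.get(m.group(2), ())

    # Split on 'or' first so that 'and' binds more tightly.
    return any(
        all(term(t) for t in re.split(r"\s+and\s+", group, flags=re.I))
        for group in re.split(r"\s+or\s+", condition, flags=re.I)
    )


laser_on = {"evr1": [40, 182]}   # made-up event codes for one event
print(matches("182 in #evr1#", laser_on))
```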

You can attach a tag to the stream filename by using the "Tag" field, e.g. evr182 would produce cxitut13_0010_evr182.stream.

Detector centering

Let's check whether your detector is well centered with respect to your beam. You want the centre to be as accurate as possible (to at least single-pixel accuracy) for high indexing rates (>40%).

Load the powder rings generated by clicking the "Load image" button in the Image Control panel. Open "cxitut13_0010_maxHits.npy". Adjust the intensity as necessary.

Draw resolution rings by ticking "Resolution rings" in the Diffraction Geometry panel. You can change the ring resolution by typing a number in "Resolution (pixels)". Type 165 and see whether your powder rings overlap with the resolution rings. If they do, then the detector is centered. If not, then you can click on "Deploy automatically centred geometry" to recenter your detector. If you are unhappy with the results, you can use "Deploy manually centred geometry", which will shift the detector centre to the centre of the green ROI circle.
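
A ring radius in pixels can be converted to a resolution in Angstrom with Bragg's law. The sketch below assumes that the entered value is the ring radius in pixels, uses a pixel size of 109.92 um (an assumed CSPAD value) and a placeholder photon energy of 9.5 keV, which is not taken from this tutorial. Substitute the values for your own experiment.

```python
import math

HC_KEV_ANGSTROM = 12.398419843   # photon energy (keV) times wavelength (Angstrom)


def ring_resolution(radius_pixels, distance_mm, photon_energy_kev, pixel_size_um=109.92):
    """Resolution d (Angstrom) of a ring at the given radius, from Bragg's law."""
    wavelength = HC_KEV_ANGSTROM / photon_energy_kev
    radius_mm = radius_pixels * pixel_size_um * 1e-3
    two_theta = math.atan2(radius_mm, distance_mm)      # scattering angle
    return wavelength / (2.0 * math.sin(two_theta / 2.0))


# 165 pixels at a 158 mm detector distance; 9.5 keV is a placeholder energy.
d = ring_resolution(165, 158.0, 9.5)
print("%.2f Angstrom" % d)
```

Rings at larger radii correspond to smaller d values, that is, to higher resolution.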

Since we are at run 10, the newly deployed geometry file is named 10-end.data. If there already exists a geometry file with the same name, it will be renamed to 10-end.data-<timeModified>.


Jumping to indexed images

Once you have completed indexing a run, you may want to look at the indexed events.

In the small data panel, type the CXIDB filename:

  • filename: /reg/d/psdm/cxi/cxitut13/scratch/<username>/psocake/r0010/cxitut13_0010.cxi
  • dataset: /entry_1/result_1/index
You can click on the red marker in the plot to jump to the corresponding events.
Once you are satisfied with indexing all your runs, please remember to back up your .cxi files in the /res directory of your experiment. The scratch folder will get wiped after a few months (Data Retention Policy).
The stream file is small enough to transfer back to your institute for post-processing (CrystFEL tutorial) and phase retrieval (PHENIX).
For phase retrieval, you can use Phenix and CCP4 by sourcing the following lines:


# Phenix
source /reg/common/package/phenix/phenix-1.10.1-2155/phenix_env.sh
# CCP4
source /reg/common/package/ccp4/ccp4-7.0/bin/ccp4.setup-sh
# XDS is available here: /reg/common/package/XDS-INTEL64_Linux_x86_64/

For viewing the electron density, use coot contained inside phenix.


Beam Parameters for Publications

You can use psana to retrieve EPICS variables that are used in SFX publications, such as the pulse energy and the number of photons per pulse.
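Common EPICS variables for SFX
import psana
ds = psana.DataSource('exp=cxitut13:run=10')
ebeamDet = psana.Detector('EBeam')
evt = next(ds.events())  # first event of the run

ebeam = ebeamDet.get(evt)
pulseEnergy = ebeam.ebeamL3Energy()  # MeV

es = ds.env().epicsStore()
# get_es_value is not defined in this snippet; it is assumed to be a helper available in your session.
calculatedNumberOfPhotons = get_es_value(es, 'SIOC:SYS0:ML00:AO580', NoneCheck=False, exceptReturn=0) * 1e12  # photons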

Bug/Comments:

Please send bug reports/comments:

yoon82@slac.stanford.edu

Tiny url for this tutorial: http://tinyurl.com/zj4m23n
