
JupyterHub

psana2 is also available in JupyterHub here in the kernel named "LCLS-II py3": https://pswww.slac.stanford.edu/jupyterhub/hub/home

Environment

To obtain the environment to run psana2, execute the following:

source /cds/sw/ds/ana/conda2/manage/bin/psconda.sh

Note that LCLS-II psana is not compatible with LCLS-I psana, so environments must activate one or the other, but not both.

Detector Names

Use this command to see non-epics detector names (see "Detector Interface" example below):

(ps-4.1.0) psanagpu101:lcls2$ detnames exp=tmoc00118,run=123
---------------------
Name      | Data Type
---------------------
epicsinfo | epicsinfo
timing    | raw      
hsd       | raw      
gmdstr0   | raw    
etc.  

Use the same command with the "-e" option to see epics detector names (see "Detector Interface" example below).  These are slowly varying variables (like temperatures) that are not strongly time correlated with the regular per-shot detector data:

(ps-4.1.0) psanagpu101:lcls2$ detnames -e exp=tmoc00118,run=123
---------------------------
Name            | Data Type
---------------------------
StaleFlags      | raw      
Keithley_Sum    | raw      
IM2K4_XrayPower | raw      
IM3K4_XrayPower | raw 
etc.     

Using the Detector Interface

Standard (per-shot) detectors and the slower epics variables can be accessed as shown here, using the names discovered with the commands above.  You can use tab-completion in ipython or a Jupyter notebook to explore what you can do with the various detector objects:

from psana import DataSource
ds = DataSource(exp='tmoc00118',run=123)
myrun = next(ds.runs())
opal = myrun.Detector('tmoopal')
epics_det = myrun.Detector('IM2K4_XrayPower')
for evt in myrun.events():
    img = opal.raw.image(evt)
    epics_val = epics_det(evt)
    # check for missing data
    if img is None or epics_val is None:
        print('none')
        continue
    print(img.shape,epics_val)


Example Script Producing Small HDF5 File

You can run this script with MPI: mpirun -n 6 python example.py

It also works on one core with: python example.py.  See MPI rank/task diagram here.

This mechanism by default produces "aligned" datasets, where missing values are padded (with NaNs for floats and -99999 for integers).  To create an unaligned dataset (without padding), prefix the name of the variable with "unaligned_".
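The padding convention can be filtered out with numpy after reading the file.  This is a sketch with made-up values (not psana output) showing the NaN and -99999 conventions described above:

```python
import numpy as np

# made-up values illustrating an "aligned" dataset read back from hdf5
floats = np.array([1.0, np.nan, 2.5, np.nan, 3.0])  # NaN marks missing shots
ints = np.array([7, -99999, 9])                     # -99999 marks missing shots

good_floats = floats[~np.isnan(floats)]  # keep only real float values
good_ints = ints[ints != -99999]         # keep only real integer values
print(good_floats, good_ints)
```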

NOTE: in addition to the hdf5 file you specify as your output ("my.h5" below), you will see other h5 files like "my_part0.h5", one written by each of the cores specified in PS_SRV_NODES: for LCLS2 it will be important for performance to write many files.  The "my.h5" file itself is quite small; it uses an HDF5 feature called a "Virtual DataSet" (VDS) to join together the various my_partN.h5 files.  Also note that events in my.h5 will not be in time order.  If you copy the .h5 somewhere else, you need to copy all of the files.
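Because events in my.h5 are not in time order, a common post-processing step is to sort by a per-event timestamp after reading.  This is a minimal numpy sketch with hypothetical values, assuming you saved a timestamp for each event:

```python
import numpy as np

# hypothetical per-event data as read back from the joined file,
# arriving out of time order
timestamp = np.array([5, 1, 3])
myfloat = np.array([50.0, 10.0, 30.0])

# argsort gives the permutation that restores time order
order = np.argsort(timestamp)
timestamp_sorted = timestamp[order]
myfloat_sorted = myfloat[order]
```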

NOTE: In python, if you want to exit early you often use a "break" statement.  When running psana-python with mpi parallelization, however, not all cores will see this statement, and the result will be that your job will hang at the end.  To avoid this use the "max_events" keyword argument to DataSource.  For example: "DataSource(exp='tmoc00118',run=123,max_events=10)".

from psana import DataSource
import numpy as np
import os

# OPTIONAL callback, usually used for realtime plots
# called back on each SRV node, for every smd.event() call below
def test_callback(data_dict):
    print(data_dict)

# this sets the number of h5 files to write. 1 is sufficient for 120Hz operation
# only needed if you are saving h5.
os.environ['PS_SRV_NODES']='1'
ds = DataSource(exp='tmoc00118', run=123, max_events=100)
# batch_size here specifies how often the dictionary of information
# is sent to the SRV nodes
smd = ds.smalldata(filename='my.h5', batch_size=5, callbacks=[test_callback])
run = next(ds.runs())

# necessary (instead of "None") since some ranks may not receive events
# and the smd.sum() below could fail
arrsum = np.zeros(2, dtype=np.int64)
for i,evt in enumerate(run.events()):
    myones = np.ones_like(arrsum)
    smd.event(evt, myfloat=2.0, arrint=myones)
    arrsum += myones

if smd.summary:
    smd.sum(arrsum)
    smd.save_summary({'summary_array' : arrsum}, summary_int=1)
smd.done()

Running in Parallel

Instructions for submitting batch jobs to run in parallel are here: Batch System Analysis Jobs

MPI Task Structure

To allow for scaling, many hdf5 files are written, one per "SRV" node.  The total number of SRV nodes is set by the environment variable PS_SRV_NODES (default 0).  psana joins these hdf5 files into what appears to be a single file using the hdf5 "virtual dataset" (VDS) feature.
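The joining mechanism can be illustrated directly with h5py.  This is a self-contained sketch (not psana code; the file and dataset names here are made up) that writes two small "part" files and stitches them together with a virtual dataset, the same HDF5 feature psana uses for "my.h5":

```python
import os
import tempfile

import h5py
import numpy as np

tmpdir = tempfile.mkdtemp()

# write two small "part" files, as each SRV node would
for i in range(2):
    with h5py.File(os.path.join(tmpdir, f'my_part{i}.h5'), 'w') as f:
        f.create_dataset('myfloat', data=np.arange(3, dtype='f8') + 10 * i)

# build a virtual layout that maps each part file into one joined dataset
layout = h5py.VirtualLayout(shape=(6,), dtype='f8')
for i in range(2):
    vsource = h5py.VirtualSource(
        os.path.join(tmpdir, f'my_part{i}.h5'), 'myfloat', shape=(3,))
    layout[i * 3:(i + 1) * 3] = vsource

# the joined file is small: it only stores references to the part files
with h5py.File(os.path.join(tmpdir, 'my.h5'), 'w') as f:
    f.create_virtual_dataset('myfloat', layout)

# reading the joined file transparently pulls data from both part files
with h5py.File(os.path.join(tmpdir, 'my.h5'), 'r') as f:
    joined = f['myfloat'][:]
print(joined)
```

Note that deleting or moving the part files breaks the virtual dataset, which is why all the my_partN.h5 files must be copied along with my.h5.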

