JupyterHub
psana2 is also available in JupyterHub here in the kernel named "LCLS-II py3": https://pswww.slac.stanford.edu/jupyterhub/hub/home
Environment
To obtain the environment to run psana2, execute the following:
source /cds/sw/ds/ana/conda2/manage/bin/psconda.sh
Note that LCLS-II psana is not compatible with LCLS-I psana, so environments must activate one or the other, but not both.
Public Practice Data
Publicly accessible practice data is located in the directory /cds/data/psdm/prj/public01/xtc. To use this data, pass the additional "dir=" keyword to DataSource.
Experiment | Run | Comment |
---|---|---|
tmoc00118 | 222 | Generic TMO dark data |
rixx43518 | 34 | A DAQ "fly scan" of motor (see ami#FlyScan:MeanVs.ScanValue) |
rixx43518 | 45 | A DAQ "step scan" of two motors |
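As a minimal sketch of the "dir=" requirement, the helper below builds the keyword arguments for the practice runs listed in the table above. The names public_datasource_kwargs, PUBLIC_XTC_DIR, and PRACTICE_RUNS are hypothetical conveniences, not part of psana; only the exp, run, dir, and max_events keywords are taken from the examples on this page.

```python
# Hypothetical helper (not part of psana): collects the DataSource keywords
# needed to read the public practice data.
PUBLIC_XTC_DIR = '/cds/data/psdm/prj/public01/xtc'

# Practice runs copied from the table above: (experiment, run) -> comment
PRACTICE_RUNS = {
    ('tmoc00118', 222): 'Generic TMO dark data',
    ('rixx43518', 34): 'DAQ fly scan of a motor',
    ('rixx43518', 45): 'DAQ step scan of two motors',
}

def public_datasource_kwargs(exp, run, **extra):
    """Return keyword arguments for DataSource, always including dir=."""
    if (exp, run) not in PRACTICE_RUNS:
        raise KeyError(f'{exp} run {run} is not a listed practice run')
    kwargs = dict(exp=exp, run=run, dir=PUBLIC_XTC_DIR)
    kwargs.update(extra)  # e.g. max_events=10
    return kwargs

if __name__ == '__main__':
    print(public_datasource_kwargs('tmoc00118', 222, max_events=10))
```

In a psana2 environment the result would be passed on as DataSource(**public_datasource_kwargs('tmoc00118', 222, max_events=10)).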
Detector Names
Use this command to see non-epics detector names (see "Detector Interface" example below):
(ps-4.1.0) psanagpu101:lcls2$ detnames exp=tmoc00118,run=222,dir=/cds/data/psdm/prj/public01/xtc
---------------------
Name      | Data Type
---------------------
epicsinfo | epicsinfo
timing    | raw
hsd       | raw
gmdstr0   | raw
etc.
Use the same command with the "-e" option to see epics detector names (see "Detector Interface" example below). These are slowly varying variables (like temperatures) that are not strongly time correlated with the regular per-shot detector data:
(ps-4.1.0) psanagpu101:lcls2$ detnames -e exp=tmoc00118,run=123
---------------------------
Name            | Data Type
---------------------------
StaleFlags      | raw
Keithley_Sum    | raw
IM2K4_XrayPower | raw
IM3K4_XrayPower | raw
etc.
Using the Detector Interface
Standard (per-shot) detectors and the slower epics variables can be accessed as shown here using the names discovered with the commands above. You can use tab-completion in ipython or the jupyter notebook to explore what you can do with the various detector objects:
from psana import DataSource
ds = DataSource(exp='tmoc00118', run=222, dir='/cds/data/psdm/prj/public01/xtc', max_events=10)
myrun = next(ds.runs())
opal = myrun.Detector('tmoopal')
epics_det = myrun.Detector('IM2K4_XrayPower')
for evt in myrun.events():
    img = opal.raw.image(evt)
    epics_val = epics_det(evt)
    # check for missing data
    if img is None or epics_val is None:
        print('none')
        continue
    print(img.shape,epics_val)
Example Script Producing Small HDF5 File
You can run this script with MPI: mpirun -n 6 python example.py
It also works on one core with: python example.py (useful for debugging). See the MPI rank/task diagram (MPI Task Structure section) to understand what the different MPI ranks are doing.
By default this mechanism produces "aligned" datasets, where missing values are padded (with NaN for floats and -99999 for integers). To create an unaligned dataset (without padding), prefix the name of the variable with "unaligned_".
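To illustrate what "aligned" means, the following sketch pads missing per-event values in the same way: NaN for floats and -99999 for integers. It is plain Python that mimics the behavior described above, not the psana implementation; align_columns, valid_mask, and the sample events are hypothetical.

```python
import math

INT_PAD = -99999      # padding for integer datasets (value quoted above)
FLOAT_PAD = math.nan  # padding for float datasets

def align_columns(events, dtypes):
    """Return one equal-length list per variable, padding missing entries.

    events: list of dicts, one per event, possibly missing some keys
    dtypes: dict mapping variable name -> int or float
    """
    columns = {name: [] for name in dtypes}
    for event in events:
        for name, dtype in dtypes.items():
            if name in event:
                columns[name].append(dtype(event[name]))
            else:
                columns[name].append(INT_PAD if dtype is int else FLOAT_PAD)
    return columns

def valid_mask(column):
    """True where a value is real data, False where it is padding."""
    return [not (v == INT_PAD or (isinstance(v, float) and math.isnan(v)))
            for v in column]

events = [
    {'evtsum': 10, 'photonEnergy': 9.5},
    {'evtsum': 12},            # photonEnergy missing for this event
    {'photonEnergy': 9.7},     # evtsum missing for this event
]
cols = align_columns(events, {'evtsum': int, 'photonEnergy': float})
print(cols['evtsum'])                    # [10, 12, -99999]
print(valid_mask(cols['photonEnergy']))  # [True, False, True]
```

When reading an aligned file, a mask like this is needed before averaging, and a genuine value of -99999 cannot be told apart from padding.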
NOTE: In addition to the hdf5 file you specify as your output ("my.h5" in this note; "mysmallh5.h5" in the example below), you will see other h5 files such as "my_part0.h5", one for each of the cores specified in PS_SRV_NODES. Each of those cores writes its own my_partN.h5 file, because for LCLS2 it will be important for performance to write many files. The "my.h5" file itself is quite small and uses an HDF5 feature called a "Virtual DataSet" (VDS) to join together the various my_partN.h5 files. Also note that events in my.h5 will not be in time order. If you copy the output somewhere else, you need to copy all of these files, not just my.h5.
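Because the top-level file only joins the per-core files, a copy has to include every part. The sketch below collects the companions of an output file by name, assuming the "my_partN.h5" naming pattern described above. companion_files and copy_with_parts are hypothetical helpers, not psana functions, and the demonstration uses empty placeholder files rather than real HDF5 data.

```python
import glob
import os
import shutil
import tempfile

def companion_files(h5_path):
    """Return the top-level file followed by its <stem>_part*.h5 files (sorted by name)."""
    stem, ext = os.path.splitext(h5_path)
    parts = sorted(glob.glob(glob.escape(stem) + '_part*' + ext))
    return [h5_path] + parts

def copy_with_parts(h5_path, dest_dir):
    """Copy the top-level file and all of its part files into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    return [shutil.copy2(path, dest_dir) for path in companion_files(h5_path)]

# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('my.h5', 'my_part0.h5', 'my_part1.h5', 'other.h5'):
        open(os.path.join(tmp, name), 'w').close()
    top = os.path.join(tmp, 'my.h5')
    found_names = [os.path.basename(p) for p in companion_files(top)]
    copy_with_parts(top, os.path.join(tmp, 'copy'))
    copied_names = sorted(os.listdir(os.path.join(tmp, 'copy')))

print(found_names)   # ['my.h5', 'my_part0.h5', 'my_part1.h5']
print(copied_names)  # ['my.h5', 'my_part0.h5', 'my_part1.h5']
```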
NOTE: In Python, a "break" statement is the usual way to exit a loop early. When running psana-python with MPI parallelization, however, not all cores will see this statement, and as a result your job will hang at the end. To avoid this, use the "max_events" keyword argument to DataSource (see example below).
from psana import DataSource
import numpy as np
import os

# OPTIONAL callback with "gathered" small data from all cores.
# usually used for creating realtime plots when analyzing from
# DAQ shared memory. Called back on each SRV node.
def my_smalldata(data_dict):
    print(data_dict)

# sets the number of h5 files to write. 1 is sufficient for 120Hz operation
# optional: only needed if you are saving h5.
os.environ['PS_SRV_NODES']='1'

ds = DataSource(exp='tmoc00118', run=222, dir='/cds/data/psdm/prj/public01/xtc', max_events=10)
# batch_size is optional. specifies how often the dictionary of small
# user data is gathered
smd = ds.smalldata(filename='mysmallh5.h5', batch_size=5, callbacks=[my_smalldata])

for run in ds.runs():
    opal = run.Detector('tmo_opal1')
    ebeam = run.Detector('ebeam')

    runsum = 0
    for evt in run.events():
        img = opal.raw.image(evt)
        photonEnergy = ebeam.raw.ebeamPhotonEnergy(evt)
        # important: always check for missing data
        if img is None or photonEnergy is None: continue
        evtsum = np.sum(img)
        # pass either dictionary or kwargs
        smd.event(evt, evtsum=evtsum, photonEnergy=photonEnergy)
        runsum += evtsum # beware of datatypes when summing: can overflow

    # optional summary data for whole run
    if smd.summary:
        smd.sum(runsum) # sum across all mpi cores
        # pass either dictionary or kwargs
        smd.save_summary({'sum_over_run' : runsum}, summary_int=1)
    smd.done()
Full-Featured Example with Callbacks and Detector Selection
You can run this script with MPI the same way as shown in the previous example: mpirun -n 6 python example.py
These additional arguments for DataSource were added to this example:
- detectors=['detname1', 'detname2',]
List of detectors to be read from disk. If you only need a few detectors for analysis, list their names here. Reading will be faster since unused detector data is not read.
- filter=filter_fn
You can write a filter_fn(evt) callback which returns True or False to include or exclude the event from getting read from disk. Note that this can be used to select specific events by filtering on evt.timestamp.
- small_xtc=['detname1', 'detname2']
List of detectors to be used in filter_fn()
- destination=destination
You can write a destination(evt) callback that returns the id of the rank that you want to send this event to.
from psana import DataSource
import numpy as np
import os

# OPTIONAL callback with "gathered" small data from all cores.
# usually used for creating realtime plots when analyzing from
# DAQ shared memory. Called back on each SRV node.
def my_smalldata(data_dict):
    print(data_dict)

# Use this function to decide to keep/discard this event
# If this detector is needed, make sure to define this
# detector in the small_xtc argument for DataSource (see below)
def filter_fn(evt):
    run = evt.run()
    step = run.step(evt)
    opal = run.Detector('tmo_opal1')
    img = opal.raw.image(evt)
    return True

# Use this function to direct an event to process on a
# particular 'rank'. This function should return a rank
# number between 1 and total no. of ranks - 3 (3 ranks are reserved).
def destination(evt):
    # Note that run, step, and det can be accessed
    # the same way as shown in filter_fn
    n_bd_nodes = 3 # for mpirun -n 6, 3 ranks are reserved so there are 3 bd ranks left
    dest = (evt.timestamp % n_bd_nodes) + 1
    return dest

# sets the number of h5 files to write. 1 is sufficient for 120Hz operation
# optional: only needed if you are saving h5.
os.environ['PS_SRV_NODES']='1'

ds = DataSource(exp='tmoc00118', run=222, dir='/cds/data/psdm/prj/public01/xtc',
        max_events  = 10,
        detectors   = ['tmo_opal1', 'ebeam'],  # only reads these detectors (faster)
        filter      = filter_fn,               # filter_fn returns True/False
        small_xtc   = ['tmo_opal1'],           # detectors to be used in filter callback
        destination = destination)             # returns rank no. (send this evt to this rank)

# batch_size is optional. specifies how often the dictionary of small
# user data is gathered
smd = ds.smalldata(filename='mysmallh5.h5', batch_size=5, callbacks=[my_smalldata])

for run in ds.runs():
    opal = run.Detector('tmo_opal1')
    ebeam = run.Detector('ebeam')

    runsum = 0
    for evt in run.events():
        img = opal.raw.image(evt)
        photonEnergy = ebeam.raw.ebeamPhotonEnergy(evt)
        if img is None or photonEnergy is None: continue
        evtsum = np.sum(img)
        # pass either dictionary or kwargs
        smd.event(evt, evtsum=evtsum, photonEnergy=photonEnergy)
        runsum += evtsum # beware of datatypes when summing: can overflow

    # optional summary data for whole run
    if smd.summary:
        smd.sum(runsum) # sum across all mpi cores
        # pass either dictionary or kwargs
        smd.save_summary({'sum_over_run' : runsum}, summary_int=1)
    smd.done()
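The filter and destination callbacks are ordinary Python functions, so their logic can be checked without psana. The sketch below uses a stand-in event object that carries only a timestamp attribute; make_timestamp_filter, make_destination, and the sample timestamps are hypothetical. It assumes, as stated in the comments of the example above, that 3 ranks are reserved, so mpirun -n 6 leaves 3 ranks, numbered 1 to 3, for event processing.

```python
from types import SimpleNamespace

def make_timestamp_filter(wanted):
    """Build a filter_fn that keeps only events whose timestamp is in `wanted`."""
    wanted = frozenset(wanted)
    def filter_fn(evt):
        return evt.timestamp in wanted
    return filter_fn

def make_destination(n_total_ranks, n_reserved=3):
    """Build a destination(evt) callback that spreads events over the processing ranks.

    n_reserved=3 is the number of reserved ranks quoted in the example above.
    """
    n_bd_nodes = n_total_ranks - n_reserved
    if n_bd_nodes < 1:
        raise ValueError('need at least one rank left for event processing')
    def destination(evt):
        # ranks are numbered from 1 to n_bd_nodes
        return (evt.timestamp % n_bd_nodes) + 1
    return destination

# Stand-in events: real psana events carry much more than a timestamp.
events = [SimpleNamespace(timestamp=t) for t in (100, 101, 102, 103, 104)]
filter_fn = make_timestamp_filter({101, 104})
destination = make_destination(n_total_ranks=6)

kept = [evt.timestamp for evt in events if filter_fn(evt)]
ranks = [destination(evt) for evt in events]
print(kept)   # [101, 104]
print(ranks)  # [2, 3, 1, 2, 3]
```

Building the callbacks from factory functions keeps the list of wanted timestamps and the rank count out of global variables, while the resulting functions still take a single evt argument as DataSource expects.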
Running in Parallel
Instructions for submitting batch jobs to run in parallel are here: Batch System Analysis Jobs
Analyzing Scans
Use the detnames command with the "-s" option to see the scan variable names:
(ps-4.1.4) psanagpu104:lcls2$ detnames -s exp=rixdaq18,run=17
--------------------------
Name           | Data Type
--------------------------
motor1         | raw
motor2         | raw
step_value     | raw
step_docstring | raw
--------------------------
(ps-4.1.4) psanagpu104:lcls2$
from psana import DataSource
ds = DataSource(exp='rixdaq18',run=17)
myrun = next(ds.runs())
motor1 = myrun.Detector('motor1')
motor2 = myrun.Detector('motor2')
step_value = myrun.Detector('step_value')
step_docstring = myrun.Detector('step_docstring')
for step in myrun.steps():
    print(motor1(step),motor2(step),step_value(step),step_docstring(step))
    for evt in step.events():
        pass
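A common use of the step loop is to reduce the events of each step to one number and compare it against the scan value. The sketch below shows that accumulation on plain Python data so that it runs anywhere; mean_per_step and the sample numbers are hypothetical. In a real script the scan value would presumably come from step_value(step) and the per-event numbers from a detector read inside step.events().

```python
def mean_per_step(steps):
    """Average the event values of each scan step.

    steps: iterable of (scan_value, iterable_of_event_values).
    Events with missing data (None) are skipped, mirroring the None checks
    used in the event loops on this page.
    Returns a list of (scan_value, mean, n_events_used); mean is None when
    no event in the step had usable data.
    """
    results = []
    for scan_value, event_values in steps:
        total = 0.0
        count = 0
        for value in event_values:
            if value is None:
                continue
            total += value
            count += 1
        mean = total / count if count else None
        results.append((scan_value, mean, count))
    return results

# Made-up scan: three steps, some events with missing data.
scan = [
    (0.0, [1.0, 3.0]),
    (0.5, [2.0, None, 4.0]),
    (1.0, [None]),
]
result = mean_per_step(scan)
print(result)  # [(0.0, 2.0, 2), (0.5, 3.0, 2), (1.0, None, 0)]
```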
Running From Shared Memory
psana2 scripts can be run on shared memory. Look at the DAQ .cnf file to find the name of the node running the shared memory server. You can find the name of the shared memory (the hutch name is typically used) either by looking in the .cnf file (the "-P" option to the monReqServer executable) or by running a command like this:
drp-neh-cmp003:~$ ls /dev/shm/
PdsMonitorSharedMemory_tmo
drp-neh-cmp003:~$
For this output, you would use "DataSource(shmem='tmo')".
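The mapping from the /dev/shm listing to the shmem argument can be sketched as follows. This assumes that the PdsMonitorSharedMemory_ prefix seen in the listing above is followed directly by the shared memory name; shmem_names and local_shmem_names are hypothetical helpers, not part of psana.

```python
import os

SHM_PREFIX = 'PdsMonitorSharedMemory_'  # prefix seen in the listing above

def shmem_names(entries):
    """Extract shared memory names from a list of /dev/shm entry names."""
    return sorted(entry[len(SHM_PREFIX):]
                  for entry in entries
                  if entry.startswith(SHM_PREFIX) and len(entry) > len(SHM_PREFIX))

def local_shmem_names(shm_dir='/dev/shm'):
    """List shared memory names on this node (empty if the directory is absent)."""
    if not os.path.isdir(shm_dir):
        return []
    return shmem_names(os.listdir(shm_dir))

print(shmem_names(['PdsMonitorSharedMemory_tmo', 'sem.unrelated']))  # ['tmo']
```

On the node that runs the shared memory server, a name returned by local_shmem_names() would then be passed as DataSource(shmem=name).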
When running with mpi there are some complexities propagating the environment to remote nodes: the way to address that is described in this link. The same parallelization model is used as for the production of the small hdf5 files described here. The typical pattern would be to use the small data callback to receive all the data in a dictionary gathered from all nodes, as shown in the example here.
smd = ds.smalldata(batch_size=5, callbacks=[my_smalldata])
It is also necessary to have one core reserved to do the gathering, so include a line like this:
os.environ['PS_SRV_NODES']='1'
Typically psmon is used for publishing results to realtime plots in the callback: Visualization Tools.
MPI Task Structure
To allow for scaling, many hdf5 files are written, one per "SRV" node. The total number of SRV nodes is defined by the environment variable PS_SRV_NODES (defaults to 0). These many hdf5 files are joined by psana into what appears to be one file using the hdf5 "virtual dataset" feature.