This page describes the setup at S3DF that is in use starting in the summer of 2023.

Details can be found on Running at S3DF.

Overview

Many MEC user groups rely on tiff files for each detector in each event/shot taken. The necessary jobs are set up in the elog/ARP and are typically triggered automatically. A fairly standard setup looks like this:

This setup now uses smalldata_tools, where the producer is given a few options:

--full: this option will translate all the data to hdf5 files. This results in large hdf5 files and is generally not recommended, both for disk-space reasons and because it places the burden of further analysis on later steps of the analysis chain.

--image: this option will store the data of tiled detectors as a single 2-d image. This is generally not recommended, as it can result in discontinuities of observed features when the detector pixels are not perfectly aligned in x & y. We usually recommend keeping the data in raw-data shape and using the x, y & z values of each pixel in further analysis.

--tiff: in addition to the hdf5 files, each dataset with a 2-d shape per event will also be stored as a tiff file in the scratch directory. This means the code should be run with the --full --image options if you want all detectors as tiff files. If you are only interested in the VISAR tiff files, you can omit --image.

--cores 1: you need to run the tiff-file production in single-core mode, as the file name uses the index of the event in the main event loop. In the standard MPI setup that happens under the hood, different events are handed out so that each core loops over part of the dataset; if each core wrote out tiff files, they would overwrite each other.

To avoid saving a separate tiff file for each of the 6015 events of a standard pedestal run, we explicitly limit the number of shots to be translated to 20 by default (most MEC runs have 1 or 10 events). If you are able to work from the hdf5 files directly, you can run the translation on multiple cores, speeding up the analysis. For runs with 10 or fewer events, this is usually not necessary. A sketch of what the corresponding command line looks like is shown below.
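For reference, a tiff-producing job launched by hand could look like the sketch below. The --tiff and --cores 1 flags are the options described above; the producer script name and the --experiment/--run arguments are assumptions and should be checked against your experiment's smalldata_tools checkout and the job defined in the ARP, which normally triggers this automatically.

# hypothetical manual invocation - the ARP normally runs this for you
cd /sdf/data/lcls/ds/mec/<experiment>/results/smalldata_tools
python ./producers/smd_producer.py --experiment <experiment> --run <run> --tiff --cores 1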

Reaching S3DF

To reach S3DF, you can access 's3dflogin' via ssh (or from a NoMachine/NX server). To be able to read data and run code, you then need to go to the 'psana' machine pool (ssh psana). Alternatively, you can use JupyterLab in the OnDemand service (request the psana pool of machines).
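For example, a typical terminal session to get to a node where you can read data looks like this (the fully qualified login hostname is an assumption following the usual slac.stanford.edu naming; replace <username> with your SLAC account):

ssh <username>@s3dflogin.slac.stanford.edu
ssh psana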

Copy data to your home institution (or laptop):

To copy data to your home institution: if you previously used psexport, you will need to switch to:

s3dfdtn.slac.stanford.edu
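As a sketch, pulling results to your local machine with rsync could look like the line below; <username>, <experiment> and the local destination are placeholders, and the source path follows the directory layout described under "Working directories" below.

rsync -av <username>@s3dfdtn.slac.stanford.edu:/sdf/data/lcls/ds/mec/<experiment>/hdf5/ ./hdf5/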

More information can be found on Downloading Data

Working directories

In S3DF, we have a single space for each experiment. The data can be read from two different locations: the ffb and the offline system. The data is first available on the ffb and then moves to the offline system. The limited size of the ffb means that data will only be available there temporarily. As a rule of thumb, use the offline copy if it is present.

The directory structure is as follows:

/sdf/data/lcls/ds/mec/<experiment>/
drwxrws---+ 1 psdatmgr ps-data  0 Mar  3  2022 calib
drwxr-s---+ 1 psdatmgr ps-data  0 Feb 23 14:51 hdf5
drwxrws---+ 1 psdatmgr ps-data  0 Feb 23 15:24 results
lrwxr-x---  1 psdatmgr xs      42 Feb 16 08:07 scratch -> /sdf/scratch/lcls/ds/mec/<experiment>/scratch
drwxrws---+ 1 psdatmgr ps-data  0 Feb 24  2022 stats
drwxr-s---+ 1 psdatmgr ps-data  0 Feb 16 15:23 xtc

The ffb is a mount of the filesystem in the DRPSRCF that was already used in previous years:

/sdf/data/lcls/drpsrcf/ffb/mec/<experiment>/

Be aware that data will only be kept there as long as there is space. It is unlikely to contain all of your data, and data may need to be cleaned off before the end of the ongoing shift.
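To check whether the data you need has already arrived offline, or is so far only on the ffb, it is often enough to list the xtc area in both locations (assuming the ffb mount uses the same xtc subdirectory layout as the offline space):

ls /sdf/data/lcls/ds/mec/<experiment>/xtc/
ls /sdf/data/lcls/drpsrcf/ffb/mec/<experiment>/xtc/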

The results folder should host most of the users' code, notebooks, etc.

For more dedicated data processing (covered later), the smalldata_tools working directory is generally

/sdf/data/lcls/ds/<hutch>/<expname>/results/smalldata_tools 

Jupyterhub

The Jupyterhub setup at S3DF is described here

Detector data - image versus raw data

Below is a picture of the pedestal values of the first 10 events for 'Quad3'. The data is recorded in a 3-d shape of (4,......). For each of these pixels, we have x, y and z arrays for their relative center positions. It is possible to set up the geometry so that these x, y and z values can be used as absolute values, although we usually do not do that.

When you want to display this data on a 2-d screen, each pixel is assigned an x and a y value. You can see that in this example the data in the top-left ASIC appears to be aligned, while the bottom-left ASIC data is rotated relative to it; this shows up as a discontinuity when the values are placed on a grid. This discontinuity is what you see in the tiff files. Most of the time this does not matter much, but if you want to use the most precise data available, you should use the pixel data with their pixel-center positions rather than the assembled 2-d images.
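You can look at both representations directly in psana. The sketch below is a minimal example assuming 'Quad3' is the detector alias used above, with placeholders for the experiment and run: det.raw() returns the data in its raw, tiled shape, the coords_* calls return the matching pixel-center positions, and det.image() returns the assembled 2-d image with the discontinuities discussed above.

from psana import DataSource, Detector

ds = DataSource('exp=<experiment>:run=<run>')   # placeholders for your experiment and run
det = Detector('Quad3')                          # detector alias, as used above

for nevt, evt in enumerate(ds.events()):
    raw = det.raw(evt)         # data in raw, tiled shape (3-d, as noted above)
    x = det.coords_x(evt)      # pixel-center x positions, same shape as raw
    y = det.coords_y(evt)
    z = det.coords_z(evt)
    img = det.image(evt)       # assembled 2-d image on a regular grid
    if nevt >= 9:              # only look at the first 10 events
        break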

One-time processing versus psana use (standard smalldata workflow)

A typical pattern in hutches that usually take data at 120 Hz is to process the data once and use the results for further analysis. Reducing the '3d' detector data to q-phi space is one such example, as is photon finding. Reading & calibrating the detector images can be CPU intensive, as can detailed photon finding. We typically recommend doing this step once, saving the results in an hdf5 file, and doing the further analysis (e.g. normalizing with i0 detectors, ...) on these files.
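As a sketch of what the second, lightweight step can look like: the hdf5 file written by the producer can be read with h5py and, for example, normalized with an i0 detector. The file name and the dataset names below (an ipm sum and a per-event azimuthal average) are assumptions and will depend on what your producer was configured to write.

import h5py
import numpy as np

# hypothetical file and dataset names - adjust to your experiment and producer configuration
fname = '/sdf/data/lcls/ds/mec/<experiment>/hdf5/smalldata/<experiment>_Run0042.h5'
with h5py.File(fname, 'r') as f:
    i0 = f['ipm/sum'][:]          # assumed i0 monitor, one value per event
    azav = f['Quad3/azav'][:]     # assumed azimuthal average, one curve per event
norm = azav / i0[:, np.newaxis]   # normalize each event's curve by its i0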






