Detector Corrections

We compare the default and common mode (cmpars=(7,0,100)) detector corrections for the epix10ka, run with numpy, cupy, and cunumeric, using data from run 796 of the uedcom103 experiment. The results below are from running with 7 ranks (1 rank/GPU) and 11 ranks (2 ranks/GPU) on S3DF ampere nodes.

For the common mode corrections, neither cupy nor cunumeric implements masked arrays, so masking was turned off.
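To illustrate what turning masking off changes, here is a minimal numpy sketch. It is not the psana common mode algorithm: the per-row median, the function names, and the convention that True marks a good pixel are all assumptions made for illustration. The NaN-based variant is one possible stand-in for masked arrays; whether nanmedian is available in a given cupy or cunumeric version has not been checked here.

```python
import numpy as np

def row_offset_masked(data, good):
    """Per-row median over good pixels only, using numpy masked arrays.

    `good` is a boolean array, True where the pixel is usable (assumed convention).
    """
    masked = np.ma.masked_array(data, mask=~good)  # np.ma hides entries where mask is True
    return np.ma.filled(np.ma.median(masked, axis=1), 0.0)

def row_offset_unmasked(data):
    """Per-row median over all pixels, as when masking is turned off."""
    return np.median(data, axis=1)

def row_offset_nan(data, good):
    """Hypothetical stand-in for masking: set bad pixels to NaN, then use nanmedian."""
    filled = np.where(good, data, np.nan)
    return np.nanmedian(filled, axis=1)

# Toy 2x4 "image": the last pixel of the first row is a hot pixel flagged as bad.
data = np.array([[1.0, 2.0, 3.0, 100.0],
                 [4.0, 5.0, 6.0, 7.0]])
good = np.array([[True, True, True, False],
                 [True, True, True, True]])

offsets_masked = row_offset_masked(data, good)
offsets_unmasked = row_offset_unmasked(data)
offsets_nan = row_offset_nan(data, good)
```

In this toy case the hot pixel shifts the unmasked estimate for the first row, while the masked and NaN-based estimates agree with each other.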

The cupy results used a circular buffer of 2 streams, alternating between images. This performed better than a single stream, whereas using 4 streams performed worse.
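The stream rotation described above could be sketched roughly as follows. This is not the code used for the measurements: the function names, the round-robin assignment (image i goes to stream i % 2), and the pedestal/gain stand-in for the correction are assumptions. The sketch falls back to numpy with no-op contexts when cupy or a GPU is unavailable, so it can be run anywhere.

```python
import contextlib
import numpy as np

try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # cupy missing, or no usable CUDA device
    cp = None
    HAVE_GPU = False

xp = cp if HAVE_GPU else np

def make_streams(n_streams):
    """Create the circular buffer of streams (no-op contexts on CPU)."""
    if HAVE_GPU:
        return [cp.cuda.Stream(non_blocking=True) for _ in range(n_streams)]
    return [contextlib.nullcontext() for _ in range(n_streams)]

def correct(raw, pedestal, gain):
    """Stand-in correction (assumed for illustration): pedestal subtraction, then gain."""
    return (raw - pedestal) * gain

def process_images(images, pedestal, gain, n_streams=2):
    streams = make_streams(n_streams)
    ped = xp.asarray(pedestal)
    gn = xp.asarray(gain)
    if HAVE_GPU:
        # Ensure the constants are on the device before other streams read them.
        cp.cuda.Stream.null.synchronize()
    results = []
    for i, img in enumerate(images):
        # Round-robin: consecutive images alternate between the streams.
        with streams[i % n_streams]:
            dev = xp.asarray(img)              # upload enqueued on the current stream
            results.append(correct(dev, ped, gn))
    if HAVE_GPU:
        for s in streams:
            s.synchronize()                    # wait for all queued work to finish
        return [cp.asnumpy(r) for r in results]
    return results
```

The number of streams is left as a parameter so that the 1, 2, and 4 stream configurations mentioned above can be compared with the same code path.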

cunumeric is currently missing np.flipud, np.fliplr, and np.select, and falls back to CPU numpy for these operations, resulting in a copy from GPU to CPU for each operation.
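One possible way to sidestep the missing functions is to express them with slicing and where. The sketch below is written and checked against numpy only; the helper names are hypothetical, and whether cunumeric runs these forms on the GPU without falling back is an assumption that has not been tested here.

```python
import numpy as np

def flipud_slice(a):
    """Equivalent of np.flipud: reverse the first axis with a slice."""
    return a[::-1, ...]

def fliplr_slice(a):
    """Equivalent of np.fliplr: reverse the second axis with a slice."""
    return a[:, ::-1]

def select_where(condlist, choicelist, default=0):
    """Equivalent of np.select built from np.where.

    Conditions are applied in reverse so that the first matching
    condition wins, as it does in np.select.
    """
    result = default
    for cond, choice in zip(reversed(condlist), reversed(choicelist)):
        result = np.where(cond, choice, result)
    return result

a = np.arange(12, dtype=float).reshape(3, 4)
conds = [a < 3, a < 8]          # overlapping on purpose: a < 3 also satisfies a < 8
choices = [a * 10.0, a * 100.0]

flipped_ud = flipud_slice(a)
flipped_lr = fliplr_slice(a)
selected = select_where(conds, choices, default=-1.0)
```

To try this with another array library, the `np` import would be swapped for that library's module, assuming it supports negative-step slicing and `where`.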

Additionally, this loop is problematic for cunumeric performance: https://github.com/slac-lcls/lcls2/blob/cunumeric/psana/psana/detector/UtilsAreaDetector.py#L98

Default Corrections 7 ranks

Default Corrections 11 ranks

Common Mode Corrections cmpars=(7,0,100) 7 ranks - NO MASKING

Common Mode Corrections cmpars=(7,0,100) 11 ranks - NO MASKING
