- Stefano is looking into the cusz performance issues. LC-GPU gets 60GB/s with 4 streams and 1 segment, and 6GB/s with 1 stream and 1 segment. Two questions:
- why does performance scale better than the number of streams?
- why is 1-stream 1-segment cusz so much worse (0.6GB/s) than LC-GPU (6GB/s)?
- some possible reasons that were suggested: compiler options in spack/conda? timing calculation incorrect for LC? error in the splitting up of the data into single-segments?
- could look at the performance in the profiler, although this will underestimate the eventual performance because of profiler overhead.
- next priorities for Stefano: see if we can improve angular integration performance to 50GB/s without batching events (which we can do because the outputs are "separable" into events, but it adds complexity). Note that SZ compression with batches of events is NOT "separable". Another project is the peak-finding performance with peakfinder8 in pyFAI.
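For reference, the arithmetic behind the two questions above, as a minimal sketch. The throughput numbers are the ones quoted in these notes; the function name `scaling_efficiency` is only illustrative, and the 1-stream LC-GPU figure is taken to be the 6GB/s quoted in the second question.

```python
def scaling_efficiency(base_gbps, scaled_gbps, n_streams):
    """Return (speedup, speedup per stream) relative to the 1-stream baseline.

    A per-stream value above 1.0 means throughput grows faster than the
    number of streams (super-linear scaling).
    """
    speedup = scaled_gbps / base_gbps
    return speedup, speedup / n_streams


# LC-GPU: 6 GB/s with 1 stream, 60 GB/s with 4 streams (numbers from the notes)
lc_speedup, lc_per_stream = scaling_efficiency(6.0, 60.0, 4)

# 1-stream, 1-segment comparison: cusz 0.6 GB/s vs LC-GPU 6 GB/s
cusz_gap = 6.0 / 0.6

print(f"LC-GPU speedup with 4 streams: {lc_speedup:.1f}x")
print(f"LC-GPU speedup per stream: {lc_per_stream:.2f}")
print(f"LC-GPU vs cusz at 1 stream: {cusz_gap:.1f}x")
```

A per-stream value well above 1.0 is what makes the scaling suspicious, and is one reason to double-check the timing calculation before trusting the 60GB/s figure.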
Jan. 13, 2025
- Valerio is going to move psana2 on s3df to spack in the next few weeks
- Ric has the "graphs" approach to kernel launching working. Tracking down a tricky segfault after 300 events.
- Stefano is working on streams. Having trouble reproducing the previous compilation: LC is broken with spack (unhappy with flags). It looks like an old compiler version (gcc4) is being picked up. Valerio and Gabriel provided guidance on how to fix that.