Dec. 2, 2024
- Gabriel is seeing 1-2GB/s on ampere with libcufile, with some variability (writing a single file). Will try gpu004 (less variable filesystem usage, and it has IB).
- Having trouble installing libpressio with spack; working with Robert. Will continue working on angular integration profiling if blocked on spack.
Dec. 9, 2024
- Stefano working with Robert on libpressio installation issues
- Gabriel is putting the cuFile results on Confluence. Tried gpu004 and is getting 1-2GB/s there as well. Can write at 4GB/s from the GPU to pinned memory. cuFile was straightforward to use; the only way to configure it (e.g. block size) is through a JSON file. Using the NVIDIA-provided gdsio application. Also looking at CUDA graphs: the first event, which "records" the graph, takes a long time (hundreds of microseconds), and we don't think there is a way to keep a recorded graph across daq restarts. Will give the cuFile results to the Weka people once they are on Confluence.
- Ric got the daq to work with transitions passing through the GPU. A lot of work is still needed to support multiple FPGAs, and some important decisions need to be made. Matt is making changes to XpmDetector that may affect this (only in the short term while we use the emulator, since XpmDetector is used ...). The current approach is to event-build the various FPGA streams (and eventually write a single file), but we need to decide how to handle configure data, for example. How do we break out of the wait if we need to reconfigure? Maybe use a spin-loop? cpo votes for a multi-person design brainstorming session to discuss the issues Ric has found (this has the additional advantage of educating multiple people). Ric suggests maybe running N drps on a box (one per FPGA) so we don't have to event-build multiple FPGAs. Disadvantage: the per-event data becomes smaller, so algorithms become less efficient unless we (temporarily) create batches of multiple events (we can do this for angular integration, but not for SZ, since its output is not "separable" into individual events). How do we handle differences between XpmDetector (used for emulation) and the real epixUHR? Maybe have one drp process launch the 5 (or N) processes?
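To illustrate why batching works for angular integration but not for SZ, here is a minimal sketch, assuming angular integration is a simple mean over precomputed radial bins (not necessarily the drp's actual algorithm). The helpers `radial_bin_index` and `integrate_batch` are hypothetical names for illustration. A batch of B events is reduced with a single `np.bincount` call, and row i of the result is exactly event i's profile, so the batched output splits cleanly back into per-event results. An SZ-compressed batch, by contrast, is one opaque byte stream that cannot be split per event this way.

```python
import numpy as np

def radial_bin_index(shape, center, nbins):
    """Hypothetical helper: assign each pixel to a radial bin by distance from center."""
    ny, nx = shape
    y, x = np.indices((ny, nx))
    r = np.hypot(y - center[0], x - center[1]).ravel()
    # Scale radius into [0, nbins) and clip the outermost pixel into the last bin
    idx = (r / r.max() * nbins).astype(np.int64)
    return np.minimum(idx, nbins - 1)

def integrate_batch(images, bin_idx, nbins):
    """Hypothetical helper: radially average a batch of images (B, ny, nx) in one call."""
    batch = images.shape[0]
    flat = images.reshape(batch, -1)
    # Offset each event's bin indices so all events share one bincount call
    offsets = np.arange(batch)[:, None] * nbins
    keys = (bin_idx[None, :] + offsets).ravel()
    sums = np.bincount(keys, weights=flat.ravel(), minlength=batch * nbins)
    counts = np.bincount(bin_idx, minlength=nbins)
    # Mean intensity per bin; empty bins are left at zero
    profile = sums.reshape(batch, nbins) / np.maximum(counts, 1)
    # Row i is event i's result: the batched output is separable per event
    return profile
```

Running the batch in one call and running each event on its own give the same per-event rows; only the amount of work per call changes.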