System: SRCF FFB
Problem Overview
Reading data in live mode while files are still being written can cause a segmentation fault in psana2. This occurs when the readers (bd cores) are as fast as, or faster than, the writers. The problem has been reported to the file-system vendor (currently Weka) and help is en route. The observed behavior is depicted in the diagram below: one bd core reads a chunk of bytes into its memory, but part of that chunk is zeroed out.
a.xtc2  offset: 1000  chunk size: 20
[x x x x x x x x x x x x x x x 0 0 0 0 0]
We expect to see data in the last five bytes, but they read back as zero. This can cause different failures in psana2, including a segmentation fault, e.g.:
*** /cds/home/m/monarin/lcls2/install/include/xtcdata/xtc/ShapesData.hh:355: incorrect TypeId 0
[drp-srcf-cmp048:209412] *** Process received signal ***
[drp-srcf-cmp048:209412] Signal: Aborted (6)
[drp-srcf-cmp048:209412] Signal code: (-6)
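To make the failure mode concrete, here is a small standalone sketch (not psana2 code) that counts how many trailing bytes of a freshly read chunk are zero; with the 20-byte chunk from the diagram above it reports the five missing bytes:

```python
def trailing_zeros(chunk: bytes) -> int:
    """Count how many bytes at the end of a read chunk came back as zero."""
    n = 0
    for b in reversed(chunk):
        if b != 0:
            break
        n += 1
    return n

# Simulate the diagram: a 20-byte read where the writer's last
# 5 bytes have not yet become visible to the reader.
chunk = b"\x01" * 15 + b"\x00" * 5
print(trailing_zeros(chunk))  # -> 5
```

Note that a zero tail is not by itself proof of the bug (valid data can contain zero bytes); in psana2 the corruption surfaces later, when the xtc parser hits an invalid TypeId as in the traceback above.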
Steps to Reproduce The Problem
Both the writers and the readers have to be fast enough. We could not trigger the problem when the test run was small (a few xtc2 files) or when the number of bd cores was not large enough. The setup below reproduces the problem.
DAQ
We use the rix timing system (XPM 3) and all available lanes on drp-srcf-cmp029 (one for timing and seven for test detectors). The full cnf file used in this test is available in the appendix below.
PSANA2
We use 396 cores on 9 srcf nodes (with 32 EventBuilder cores) to keep up with all eight xtc2 files. The Python and bash/Slurm job scripts are shown below.
import time
import os, sys

import numpy as np
from mpi4py import MPI

from psana import DataSource
import vals

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()


def test_standard():
    batch_size = 1000
    max_events = 0
    hutch = 'tst'
    exp = sys.argv[1]
    runno = int(sys.argv[2])
    xtc_dir = f'/cds/data/drpsrcf/{hutch}/{exp}/xtc/'
    ds = DataSource(exp=exp,
                    run=runno,
                    batch_size=batch_size,
                    max_events=max_events,
                    dir=xtc_dir,
                    live=True)
    sendbuf = np.zeros(1, dtype='i')
    recvbuf = None
    if rank == 0:
        recvbuf = np.empty([size, 1], dtype='i')
    st = time.time()
    for run in ds.runs():
        for nevt, evt in enumerate(run.events()):
            if nevt % 1000 == 0 and nevt > 0:
                en = time.time()
                print(f'RANK: {rank:4d} EVENTS: {nevt:10d} RATE: {(1000/(en-st))*1e-3:.2f}kHz', flush=True)
                st = time.time()
            sendbuf += 1  # Count total no. of events
    comm.Gather(sendbuf, recvbuf, root=0)
    if rank == 0:
        n_events = np.sum(recvbuf)
    else:
        n_events = None
    n_events = comm.bcast(n_events, root=0)
    return n_events


if __name__ == "__main__":
    comm.Barrier()
    t0 = MPI.Wtime()
    n_events = test_standard()
    comm.Barrier()
    t1 = MPI.Wtime()
    if rank == 0:
        n_eb_nodes = int(os.environ.get('PS_EB_NODES', '1'))
        print(f'TOTAL TIME:{t1-t0:.2f}s #EB: {n_eb_nodes:3d} EVENTS:{n_events:10d} RATE:{(n_events/(t1-t0))*1e-6:.2f}MHz', flush=True)
#!/bin/bash
#SBATCH --partition=anaq
#SBATCH --job-name=psana2
#SBATCH --nodes=9
#SBATCH --ntasks=396
##SBATCH --ntasks-per-node=50
#SBATCH --output=%j.log
#SBATCH --exclusive

t_start=`date +%s`

source setup_hosts.sh
echo SLURM_HOSTFILE $SLURM_HOSTFILE SLURM_NTASKS $SLURM_NTASKS

export PS_EB_NODES=32
MAX_EVENTS=0
EXP="tstx00817"
RUNNO=55
srun ./run_slac.sh $MAX_EVENTS $EXP $RUNNO

t_end=`date +%s`
echo PSJobCompleted TotalElapsed $((t_end-t_start))
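For reference, the rank breakdown implied by the scripts above, assuming psana2's usual split of one Smd0 rank, PS_EB_NODES EventBuilder ranks, and the remainder as BigData (bd) ranks (the single-Smd0 assumption is illustrative, not taken from this report):

```python
total_ranks = 396   # SBATCH --ntasks
eb_ranks = 32       # PS_EB_NODES
smd0_ranks = 1      # assumed single Smd0 reader rank
bd_ranks = total_ranks - eb_ranks - smd0_ranks
per_file = bd_ranks / 8   # eight xtc2 files in the test run
print(bd_ranks, round(per_file, 1))  # -> 363 45.4
```

Roughly 45 bd readers per xtc2 file is what keeps the readers fast enough to catch up with the writers and trigger the bug.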