Page History
...
| Experiment | Run | Comment |
|---|---|---|
| tmoc00118 | 222 | Generic TMO dark data |
| tmoacr019 | 4, 5, 6 | xtcav dark, lasing-off, lasing-on (cpo thinks, not certain) |
| rixx43518 | 34 | A DAQ "fly scan" of a motor (see ami#FlyScan:MeanVs.ScanValue) |
| rixx43518 | 45 | A DAQ "step scan" of two motors |
| rixl1013320 | 63 | RIX stepping delay scan of both the Vitara delay and the ATM delay stage (lxt_ttc scan) at a single mono energy |
| rixl1013320 | 93 | RIX continuous mono scan with laser on/off data at a single time delay |
| rixx1003821 | 55 | An infinite sequence with two slow Andors running at different rates |
| rixx1003821 | 68 | A finite burst sequence with one Andor |
| uedcom103 | 7 | epix10ka data |
| ueddaq02 | 569 | epix10ka data |
...
psana can scale to allow for high-rate analysis. For example, many HDF5 files of small user-defined data (described above in Example Script Producing Small HDF5 File) can be written, one per "SRV" node in the diagram below. The total number of SRV nodes is set by the environment variable PS_SRV_NODES (default 0). These many HDF5 files are joined by psana into what appears to be one file using the HDF5 "virtual dataset" feature. Similarly, multiple nodes can be used for filtering ("EB" nodes in the diagram below), and multiple nodes can be used to process big data in the main psana event loop ("BD" nodes in the diagram below).

The one piece that cannot currently be scaled to multiple nodes is the SMD0 (SMallData) task, which reads the timestamps and fseek offsets from each tiny .smd.xtc2 file produced by the DAQ (typically one per detector or per detector segment, although a file can contain more than one segment or detector). This task joins together the relevant data for each shot ("event build") using the timestamp. The SMD0 task is multi-threaded, with one thread per detector; for highest performance it is important that all SMD0 threads be allocated an entire MPI node.
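As a minimal sketch of the SRV configuration, the writer count is just an environment variable set before launching the job (PS_SRV_NODES is the variable named above; the value 2 here is an arbitrary example):

```shell
# Request two SRV ranks, so psana writes two small-data HDF5 files
# and joins them into one apparent file via HDF5 virtual datasets.
# The default of 0 means no SRV ranks are allocated.
export PS_SRV_NODES=2
echo $PS_SRV_NODES
```

Note that these SRV ranks come out of the total rank count given to mpirun, so they reduce the number of ranks left for EB and BD work.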
Running a large job
The following shows how to set up a Slurm job script to run a large job. The script uses setup_hosts_openmpi.sh (also provided below) to assign a single node to SMD0 (see diagram above) and distribute all other tasks (EB, BD, and SRV) across the remaining available nodes. After sourcing setup_hosts_openmpi.sh, you can use $PS_N_RANKS and $PS_HOST_FILE in your mpirun command.
...
```bash
#!/bin/bash
#SBATCH --partition=milano
#SBATCH --job-name=run_large_psana2
#SBATCH --output=output-%j.txt
#SBATCH --error=output-%j.txt
#SBATCH --nodes=3
#SBATCH --exclusive
#SBATCH --time=10:00

# Configure psana2 parallelization
source setup_hosts_openmpi.sh

# Run your job with #ranks <= (#nodes - 1) * 120 + 1, or use $PS_N_RANKS
mpirun -np $PS_N_RANKS --hostfile $PS_HOST_FILE python test_mpi.py
```
...
```bash
############################################################
# First node must be exclusive to smd0
# * For openmpi, slots=1 must be assigned to the first node.
############################################################

# Get the list of hosts by expanding the shorthand node list
# into a line-by-line node list
host_list=$(scontrol show hostnames $SLURM_JOB_NODELIST)
hosts=($host_list)

# Write out the host file, putting rank 0 alone on the first node
host_file="slurm_host_${SLURM_JOB_ID}"
for i in "${!hosts[@]}"; do
  if [[ "$i" == "0" ]]; then
    echo ${hosts[$i]} slots=1 > $host_file
  else
    echo ${hosts[$i]} >> $host_file
  fi
done

# Export hostfile for mpirun
export PS_HOST_FILE=$host_file

# Calculate no. of ranks available in the job
export PS_N_RANKS=$(( SLURM_CPUS_ON_NODE * ( SLURM_JOB_NUM_NODES - 1 ) + 1 ))
```
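To make the arithmetic in the last line concrete: with a hypothetical 3-node allocation of 120-CPU Milano nodes, the whole first node is reserved for SMD0 and the formula gives one SMD0 rank plus two full nodes of EB/BD/SRV ranks:

```shell
# Same formula as the script above, with hypothetical Slurm values
SLURM_CPUS_ON_NODE=120
SLURM_JOB_NUM_NODES=3
PS_N_RANKS=$(( SLURM_CPUS_ON_NODE * ( SLURM_JOB_NUM_NODES - 1 ) + 1 ))
echo $PS_N_RANKS   # prints 241
```

The generated hostfile correspondingly lists the first node with slots=1 and the remaining nodes bare, so Open MPI fills them to capacity.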
...
You can submit your analysis job again (new, higher run numbers are always monitored). Old jobs (previously submitted run numbers from the same node and the same psplot port) will NOT be shown automatically (STATUS: RECEIVED); you can reactivate them using the show(ID) command.
```
(ps-4.6.3) sbatch submit_run_andor.sh rixc00121 121
Submitted batch job 43195858
(ps-4.6.3) sbatch submit_run_andor.sh rixc00121 122
Submitted batch job 43195859

In [2]: ls()
ID  SLURM_JOB_ID  EXP        RUN  NODE                              PORT   STATUS
1   3275205       rixc00121  121  sdfiana001.sdf.slac.stanford.edu  12323  RECEIVED

In [3]: show(1)
Main received {'msgtype': 3} from db-zmq-server
```
To kill a plot, use kill(ID); use kill_all() to kill all plots. From the psplot_live interactive session, you can list the plots:
```
In [5]: kill(1)

In [26]: ls()
ID  SLURM_JOB_ID  EXP  RUN  NODE  PORT  STATUS
```