Note: this should only be necessary for expert developers. Send mail to pcds-ana-l if you have questions.
Apply for an account at this link:
http://www.nersc.gov/users/accounts/user-accounts/get-a-nersc-account/
Use the following information for the various fields:
(courtesy of Anton Barty)
Currently this is only possible for "early access users" who have accounts at NERSC. Data is
...
...
...
...
...
...
Example Slurm batch-job script, submitted with "sbatch <scriptname>" ("srun" is the Cray equivalent of "mpirun"). These examples can be found at https://github.com/monarin/psana-nersc.git in "psana1/submit.sh" and "psana1/run_nersc.sh":
Code Block |
---|
#!/bin/bash -l
#SBATCH --qos=regular
#SBATCH --account=lcls
#SBATCH --job-name=lcls-py2-root
#SBATCH --nodes=1
#SBATCH --constraint=knl
#SBATCH --time=00:15:00
#SBATCH --exclusive
#SBATCH --image=docker:slaclcls/lcls-py2-root:latest
# alternative image: #SBATCH --image=docker:registry.services.nersc.gov/psana:ana-0.17.4a
module load shifter
cd $HOME/shifter
t_start=`date +%s`
export PMI_MMAP_SYNC_WAIT_TIME=600
srun -n 68 -c 4 shifter ./run_nersc.sh
t_end=`date +%s`
echo PSJobCompleted TotalElapsed $((t_end-t_start)) $t_start $t_end |
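The PSJobCompleted line echoed at the end of the batch script makes it easy to harvest wall-clock timings from the Slurm log files afterwards. A minimal sketch of such a parser (the parse_timing helper and the example log line are hypothetical; only the line format comes from the echo statement in the script):

```python
import re

def parse_timing(log_text):
    """Extract (elapsed, start, end) seconds from a PSJobCompleted line."""
    m = re.search(r'PSJobCompleted TotalElapsed (\d+) (\d+) (\d+)', log_text)
    if m is None:
        return None
    return tuple(int(x) for x in m.groups())

# example line as the batch script would write it to slurm-<jobid>.out
log = "PSJobCompleted TotalElapsed 873 1532791200 1532792073"
print(parse_timing(log))  # (873, 1532791200, 1532792073)
```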
where run_nersc.sh contains the usual psana-python command:
Code Block |
---|
#!/bin/bash
# activate psana environment
source /img/conda.local/env.sh
source activate psana_base
# set location for experiment db and calib dir
export SIT_DATA=$CONDA_PREFIX/data
export SIT_PSDM_DATA=/global/cscratch1/sd/psdatmgr/data/psdm
# prevent crash when running on one core
export HDF5_USE_FILE_LOCKING=FALSE
python mpiDatasource.py |
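The SIT_PSDM_DATA directory is laid out the same way as on the psana machines: instrument code in upper case, then experiment name, then an xtc subdirectory (the same path appears in the dir= option of the MPIDataSource example further down this page). A small illustration, assuming that layout (xtc_dir is a hypothetical helper, not part of psana):

```python
import os

def xtc_dir(sit_psdm_data, exp):
    # the instrument code is the first three letters of the experiment name
    instrument = exp[:3].upper()
    return os.path.join(sit_psdm_data, instrument, exp, 'xtc')

print(xtc_dir('/global/cscratch1/sd/psdatmgr/data/psdm', 'mfx11116'))
# /global/cscratch1/sd/psdatmgr/data/psdm/MFX/mfx11116/xtc
```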
To run a shorter "interactive" session (very useful for debugging, since you don't have to wait for a batch job to start after fixing each typo):
Code Block |
---|
monarin@cori02: salloc -C knl -N 1 -t 1:00:00 -q interactive -A lcls --image=docker:slaclcls/lcls-py2-root:latest
salloc: Pending job allocation 32421205
salloc: job 32421205 queued and waiting for resources
salloc: job 32421205 has been allocated resources
salloc: Granted job allocation 32421205
salloc: Waiting for resource configuration
salloc: Nodes nid02346 are ready for job
monarin@nid02346: srun -n 3 shifter ./run.sh
2
0
1
monarin@nid02346: cat run.sh
#!/bin/bash
source /img/conda.local/env.sh
source activate psana_base
python test_mpi.py |
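The test_mpi.py script itself isn't shown above; a minimal version that would reproduce the rank numbers in the transcript could look like this (a sketch assuming mpi4py is available in the image, with a serial fallback so it also runs without MPI):

```python
try:
    from mpi4py import MPI  # provided inside the psana conda environment
    rank = MPI.COMM_WORLD.Get_rank()
except ImportError:
    rank = 0  # serial fallback when mpi4py is not installed

# with "srun -n 3" each of the three ranks prints its own number
print(rank)
```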
Another approach gets you a prompt "inside" the shifter container's conda environment:
Code Block |
---|
(login to a cori login node, then execute this command, which allocates a node for you to use for 1 hour)
salloc -C knl -N 1 -t 1:00:00 -q interactive -A lcls --image=docker:slaclcls/lcls-py2-root:latest
(once that command completes)
shifter /bin/bash                  (get a shell in the shifter image)
source /img/conda.local/env.sh     (setup conda)
source activate psana_base         (activate the appropriate conda environment)
export SIT_PSDM_DATA=/global/cscratch1/sd/psdatmgr/data/psdm
python psana_io_benchmark.py exp=cxig3614:run=81
(psana_base) cpo@nid02387:~$ more ~/junk.py
from psana import *
dsource = MPIDataSource('exp=mfx11116:run=602:dir=/global/cscratch1/sd/psdatmgr/data/psdm/MFX/mfx11116/xtc:smd')
det = Detector('Jungfrau1M')
for nevt,evt in enumerate(dsource.events()):
    calib = det.calib(evt)
    if calib is None:
        print 'none'
    else:
        print calib.shape
    if nevt>5: break
(psana_base) cpo@nid02387:~$ python junk.py
(2, 512, 1024)
(2, 512, 1024)
(2, 512, 1024)
(2, 512, 1024)
(2, 512, 1024)
(2, 512, 1024)
(2, 512, 1024)
(psana_base) cpo@nid02387:~$ |