Jupyter is a web based analysis and coding environment. It supports multiple different programming languages, but is mostly centered around python development. The main advantage over standard IDEs is that it provide immediate code execution and inline graphics - ie you can interactively explore data.
SLAC implements JupyterHub at http://jupyter.slac.stanford.edu. This provides a central point of access to your jupyter environment using SLAC credentials and access to data stored on SLAC GPFS.
What is somewhat unique to SLAC's implementation of jupyter is that we wish to:
- provide an environment for different experiments to utilise our infrastructure with minimal hassle.
- provide a means for users to run their own jupyter environments (their analysis environment, with their down dependencies)
The current implementation utilises kubernetes and docker images to provide the above functionality.
Link to tutorials
Jupyter on Batch
In order to provide a consistent environment across the web-based jupyter and batch systems, our currently recommendation is to develop your jupyter environment in Docker and then convert the images to Singularity. From there, we can integrate into our modulefiles system directly in order to provide command line access (and hence batch).
# modulefiles relies on an ENV MODULEPATH $ export MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles:/afs/slac/package/singularity/modulefiles:/opt/modulefiles # list available modulesfiles $ module avail ------------------------------------------ /afs/slac/package/singularity/modulefiles ------------------------------------------ ... cdms-jupyterlab/1.6 slac-ml/20181002.0 slac-ml/20190606.1 ...
There is a specific module called slac-ml
that provides a prebaked Singularity image derived from the jupyterhub image.
When we load a module, it will override certain environments so that we can now use the 'application' defined in the modulefile:
# first we have nothing loaded $ which jupyter /usr/bin/which: no jupyter in (/afs/slac/g/scs/net/bin:/afs/slac/u/sf/ytl/sys/bin:/afs/slac/u/sf/ytl/sys//bin:/usr/afsws/bin:/usr/local/bin:/afs/slac/package/lsf/curr/amd64_rhel70/bin:/opt/hpc/gcc-4.8.5/openmpi-3.1.2/fftw-3.3.8/bin:/opt/hpc/gcc-4.8.5/openmpi-3.1.2/parallel-hdf5-1.10.4/bin:/opt/hpc/gcc-4.8.5/openmpi-3.1.2/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin) # then we load the module $ module load slac-ml/20190606.1 # now we see the jupyter exe in our path $ which jupyter /afs/slac/package/singularity/images/slac-ml/20190606.1/bin/jupyter
In order to run on batch, we will use this modulefile system and submit jobs by creating a text file (eg myscript.sh
) like
#!/bin/bash -l #BSUB -P jupyter #BSUB -J my_batch_job_name #BSUB -q slacgpu #BSUB -n 1 #BSUB -R "span[hosts=1]" #BSUB -W 72:00 #BSUB -B # setup env source /etc/profile.d/modules.sh export MODULEPATH=/usr/share/Modules/modulefiles:/opt/modulefiles:/afs/slac/package/singularity/modulefiles module purge module load PrgEnv-gcc module slac-ml/20190606.1 # run the notebook, executing all cells cd MY_DATA_DIRECTORY jupyter nbconvert --to notebook --inplace --execute mynotebook.ipynb
You can then submit the job to batch via
bsub < myscript.sh
and you can monitor the job with
bjob -l {jobid}
We recommend either using nbconvert or papermill to provide parameter access to notebooks.