To create a conda environment that contains PyTorch and TensorFlow, we first need to determine which CUDA version the installed NVIDIA driver supports.
Log in with "ssh -X psana -l psrel"
and run "nvidia-smi"
The supported CUDA version is 11.4 (shown in the top right of the nvidia-smi output)
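If scanning the full nvidia-smi table is inconvenient, the same information can be queried directly; a minimal sketch, assuming nvidia-smi's query options are available with this driver, run inside python:
import subprocess
# Print only the driver version instead of reading the full nvidia-smi table
out = subprocess.run(["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"], capture_output=True, text=True)
print(out.stdout.strip())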
Set the proxy variables so the node can reach the external internet:
export HTTP_PROXY=http://psproxy:3128
export HTTPS_PROXY=http://psproxy:3128
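To confirm that the proxy variables are actually picked up before running conda, a minimal check can be run in python (the URL is just an example endpoint, assuming outbound HTTPS through psproxy is allowed):
import os, urllib.request
# urllib reads HTTP_PROXY/HTTPS_PROXY from the environment automatically
print(os.environ.get("HTTPS_PROXY"))
with urllib.request.urlopen("https://conda.anaconda.org", timeout=10) as r:
    print(r.status)  # 200 means the request went out through the proxy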
Enable conda and deactivate the environment it loads by default:
source /cds/sw/ds/ana/conda1/manage/bin/psconda.sh
conda deactivate
Proceed to create the new environment:
conda create --prefix /cds/sw/ds/ana/conda2/inst/envs/deeplearning8 -c conda-forge python=3.9
conda activate deeplearning8
conda install -c conda-forge --experimental-solver=libmamba python=3.9 tensorflow tensorflow-gpu keras matplotlib notebook pandas scipy scikit-learn pytorch=1.10.0="*cuda*" cudatoolkit=11.4
PS: Since we start without an active environment (conda deactivate), the --experimental-solver option does not work during the creation of the environment. This is why the creation and the package installation are split into two separate commands.
To test whether the system sees the GPU, run python and execute:
import torch
torch.cuda.is_available()
import tensorflow as tf
tf.test.is_gpu_available()
Both checks should report "True"
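For a slightly more informative check, the visible devices can also be printed (a sketch; tf.config.list_physical_devices is the non-deprecated TensorFlow equivalent of the call above):
import torch
import tensorflow as tf
print(torch.cuda.is_available())              # expect True
print(torch.cuda.get_device_name(0))          # name of the first visible GPU
print(tf.config.list_physical_devices('GPU')) # non-empty list if TensorFlow sees the GPU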
To make the new environment available as a kernel in Jupyter notebook we need to:
Exit from this ssh session and log in again with "ssh -X psbuild-rhel7-01 -l psreldev"
Make a copy of a preexisting kernel folder:
cp -r /reg/g/psdm/sw/conda/jhub_config/prod-rhel7/kernels/deeplearning5/ /reg/g/psdm/sw/conda/jhub_config/prod-rhel7/kernels/deeplearning8/
Then open /reg/g/psdm/sw/conda/jhub_config/prod-rhel7/kernels/deeplearning8/kernel.json and replace all occurrences of deeplearning5 with deeplearning8:
{ "argv": [ "/cds/sw/ds/ana/conda2/inst/envs/deeplearning8/bin/python", "-m", "ipykernel", "-f", "{connection_file}" ], "display_name": "Deeplearning8 py3", "language": "python" }