To create a conda environment that contains PyTorch and TensorFlow, we first need to determine which CUDA version the installed driver supports.
Log in with "ssh -X psana -l psrel"
and run "nvidia-smi"

The CUDA version supported by the driver is 11.4 (shown in the top-right corner of the nvidia-smi output).
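
If you prefer to query the driver programmatically instead of reading the nvidia-smi banner by eye, a minimal Python sketch (the query fields are standard nvidia-smi options; the exact output depends on the node) is:

import subprocess

# Print GPU name and driver version; the CUDA version shown in the nvidia-smi
# banner (top right) is what the cudatoolkit pin further below should match.
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(out.stdout.strip())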
Set the proxy variables so the node can reach the external internet:
export HTTP_PROXY=http://psproxy:3128
export HTTPS_PROXY=http://psproxy:3128
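
To double-check that the proxy settings are picked up, a quick sanity check from Python (urllib reads HTTP_PROXY/HTTPS_PROXY from the environment; the URL here is just an example target) is:

import os
import urllib.request

# The exports above must be visible to this process
print(os.environ.get("HTTPS_PROXY"))

# urllib routes the request through the proxy taken from the environment;
# a 200 status means the outside world is reachable
print(urllib.request.urlopen("https://conda.anaconda.org", timeout=10).status)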

Enable conda and deactivate the environment it loads by default:

source /cds/sw/ds/ana/conda1/manage/bin/psconda.sh
conda deactivate


Proceed with creating the new environment:

conda create --prefix /cds/sw/ds/ana/conda2/inst/envs/deeplearning8 -c conda-forge python=3.9
conda activate deeplearning8
conda install -c conda-forge --experimental-solver=libmamba python=3.9 tensorflow tensorflow-gpu keras matplotlib notebook pandas scipy scikit-learn pytorch=1.10.0="*cuda*" cudatoolkit=11.4

Note: since we start without an active environment (after conda deactivate), the --experimental-solver option cannot be used when creating the environment. This is why the work is split into two commands (create first, then install).
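
Once the install finishes, a minimal sanity check (run inside the activated deeplearning8 environment) that the pinned versions were actually picked up:

import torch
import tensorflow as tf

print("torch:", torch.__version__)              # expected 1.10.0
print("torch CUDA build:", torch.version.cuda)  # expected an 11.x build, matching cudatoolkit
print("tensorflow:", tf.__version__)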

To test whether the environment sees the GPU, start python and run:

import torch
torch.cuda.is_available()

import tensorflow as tf
tf.test.is_gpu_available()

Both commands should report "True".
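
Optionally, you can go one step further and list the devices each framework actually sees (tf.config.list_physical_devices is the non-deprecated alternative to tf.test.is_gpu_available):

import torch
import tensorflow as tf

# Print the concrete GPU devices rather than just a boolean
if torch.cuda.is_available():
    print("PyTorch sees:", torch.cuda.get_device_name(0))
print("TensorFlow sees:", tf.config.list_physical_devices("GPU"))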

To make the new environment available as a kernel in Jupyter notebook we need to:

Exit from this ssh session and log in again with ssh -X psbuild-rhel7-01 -l psreldev

Make a copy of a pre-existing kernel folder:

cp -r /reg/g/psdm/sw/conda/jhub_config/prod-rhel7/kernels/deeplearning5/ /reg/g/psdm/sw/conda/jhub_config/prod-rhel7/kernels/deeplearning8/

Then open /reg/g/psdm/sw/conda/jhub_config/prod-rhel7/kernels/deeplearning8/kernel.json and replace every occurrence of deeplearning5 with deeplearning8, so that the file looks like:

{
    "argv": [
        "/cds/sw/ds/ana/conda2/inst/envs/deeplearning8/bin/python",
        "-m",
        "ipykernel",
        "-f",
        "{connection_file}"
    ],
    "display_name": "Deeplearning8 py3",
    "language": "python"
}
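
If you prefer to do the substitution in a script rather than by hand, a small Python sketch (assuming the kernel folder was already copied with cp -r as above) that rewrites the kernel spec and checks the interpreter path exists:

import json
from pathlib import Path

kernel = Path("/reg/g/psdm/sw/conda/jhub_config/prod-rhel7/kernels/deeplearning8/kernel.json")

# Replace every deeplearning5 reference with deeplearning8 and set the display name
spec = json.loads(kernel.read_text().replace("deeplearning5", "deeplearning8"))
spec["display_name"] = "Deeplearning8 py3"

# Make sure the python interpreter of the new environment is really there
assert Path(spec["argv"][0]).exists(), "python interpreter for deeplearning8 not found"

kernel.write_text(json.dumps(spec, indent=4))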