We should add some docs to help folks create Jupyter kernels that run inside Singularity containers. I ran into this need recently when trying to use PyTorch Geometric, installed in a regular Anaconda3/2021.11 venv, from inside my notebook.
NGC provides a pytorch container that is easy to use via Singularity. You can pull it down into the project directory where you'll access your notebook.
cd <project-dir>
singularity pull pytorch:22.04-py3.sif docker://nvcr.io/nvidia/pytorch:22.04-py3
You can run the container on a GPU node in the same directory like so:
singularity run --nv -B /cm -B /data/user/$USER pytorch\:22.04-py3.sif /bin/bash
Note: nvidia-smi has trouble inside the container because the container doesn't appear to inherit LD_LIBRARY_PATH from the caller. This doesn't appear to affect the operations below, but if you want to test nvidia-smi inside the container, set up your environment as follows:
module load cuda11.4/toolkit
module load Singularity
export SINGULARITYENV_PATH=$PATH
export SINGULARITYENV_LD_LIBRARY_PATH=$LD_LIBRARY_PATH # this doesn't seem to work
singularity run --nv -B /cm -B /data/user/$USER pytorch\:22.04-py3.sif /bin/bash
LD_LIBRARY_PATH=/cm/local/apps/cuda/libs/current/lib64/:$LD_LIBRARY_PATH
nvidia-smi
Back to the kernel config...
The next step is to install a custom kernel for Jupyter that starts the Python kernel in the container. This is done by combining instructions from Clemson ITI, the community docs, and the IPython docs. A kernel is really just a JSON config that specifies the command to run for the kernel. We need a custom one that starts the container.
Make sure you have Anaconda loaded and your preferred env activated. This is mostly to provide good defaults and to use the env that your calling Jupyter notebook needs. (I don't know that this is strictly necessary.)
It's best to first create a template and then install the kernel. That way you always have the template and can reinstall or edit as needed outside the jupyter config dirs.
The $USER and $CONDA_DEFAULT_ENV variables are expanded to their string values by the shell when the cat command runs. If you edit the file directly, make sure you use your actual values. You can't use variable expansions in the JSON.
The parameters in the JSON file are like an exec() call. There is no command parsing, so each string is specified as a separate quoted argument. E.g., use "-B", "/cm" not "-B /cm".
At this point your kernel is ready to install into the correct location for your user kernels.
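The template-then-install steps above could look roughly like the sketch below. The author's original template is not reproduced in this thread, so everything here is an assumption: the `TEMPLATE_DIR` and `PROJECT_DIR` variables, the kernel name `singk`, the use of `singularity exec`, and the assumption that `ipykernel` is importable inside the container. The original apparently also used `$CONDA_DEFAULT_ENV`, but where it appeared is not shown, so it is left out.

```shell
# Hypothetical locations; adjust to your own project layout.
PROJECT_DIR="${PROJECT_DIR:-$PWD}"
TEMPLATE_DIR="${TEMPLATE_DIR:-$PROJECT_DIR/kernel-templates/singk}"
SIF="$PROJECT_DIR/pytorch:22.04-py3.sif"

mkdir -p "$TEMPLATE_DIR"

# The unquoted EOF lets the shell expand $USER and $SIF while writing the
# file, so the resulting JSON contains plain strings, not variables.
# Each argument is its own quoted string ("-B", "/cm"), as in an exec() call.
cat > "$TEMPLATE_DIR/kernel.json" <<EOF
{
  "argv": [
    "singularity", "exec", "--nv",
    "-B", "/cm",
    "-B", "/data/user/$USER",
    "$SIF",
    "python", "-m", "ipykernel_launcher", "-f", "{connection_file}"
  ],
  "display_name": "Python (singk)",
  "language": "python"
}
EOF

# Install the template as a per-user kernel, if jupyter is on the PATH.
if command -v jupyter >/dev/null 2>&1; then
  jupyter kernelspec install --user --name singk "$TEMPLATE_DIR"
else
  echo "jupyter not found; load Anaconda first, then rerun the install step"
fi
```

Keeping the template outside the Jupyter config dirs means you can edit `kernel.json` and rerun the install step whenever the container path or bind mounts change.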
Now you can start a jupyter notebook in OOD on a GPU node and start the custom kernel for a new notebook.
Go to OOD and select Jupyter.
Add cuda, Singularity and Anaconda to your startup environment.
Select a GPU partition like pascalnodes. Then launch the notebook job.
When you are in Jupyter, navigate to the directory where you keep your notebooks and created the Singularity sif.
You can then select "Python (singk)" to launch a notebook with the Singularity container.
Note: On occasion, I've observed the container starting and then restarting after the first start. I don't know what causes this, but it seems to resolve on its own. You can look in the OOD job's output.txt file to debug further.
This may be a helpful use case to add to our workflow_solutions/getting_containers.md page. I think @Premas would be the resident expert to decide here.
The module appears to be compiled on a newer platform (ubuntu1804), which has a newer glibc (2.27). This means that when the modules try to run on our rhel7 compute nodes, they throw a library error about glibc being too old. glibc is not easy to upgrade or to provide in alternate versions, so it's easier to satisfy the requirement from an environment inside a container.
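One way to confirm the mismatch is to compare the glibc version on the compute node with the one inside the container. The snippet below is a minimal sketch, not part of the original report: the `version_ge` helper and the `required` variable are hypothetical names, and it assumes a glibc-based system with GNU `sort -V` available. Run it once on the host and once in a shell inside the container.

```shell
# Minimum glibc the compiled module appears to need (from the note above).
required="2.27"

# getconf prints something like "glibc 2.17"; keep only the version number.
have="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

# version_ge A B: succeeds when version A >= version B (uses GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

if [ -z "$have" ]; then
  echo "could not determine the glibc version here"
elif version_ge "$have" "$required"; then
  echo "glibc $have is new enough (>= $required)"
else
  echo "glibc $have is too old (< $required)"
fi
```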
Then copy your custom kernel config into your new kernel template file.