
feat: docs for creating a jupyter kernel running in singularity container #235

Open
jprorama opened this issue May 26, 2022 · 1 comment
Labels: fabric: cheaha, feat: article

jprorama commented May 26, 2022

We should add some docs to help folks create Jupyter kernels that run inside Singularity containers. I ran into this need recently when trying to use PyTorch Geometric installed in a regular Anaconda3/2021.11 env from inside my notebook.

The module appears to be compiled on a newer platform (Ubuntu 18.04), which ships a newer glibc (2.27). This means when the module tries to run on our RHEL7 compute nodes, it throws a library error about glibc being too old. glibc is not easy to upgrade or provide alternate versions of, so it's easier to satisfy the requirement from an environment inside a container.
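
For reference, you can confirm the host glibc version on a compute node (a quick sanity check; the exact import error text will vary):

ldd --version | head -n 1   # e.g. "ldd (GNU libc) 2.17" on RHEL7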

NGC provides a PyTorch container that is easy to use via Singularity. You can pull it down into the project directory where you'll access your notebook.

cd <project-dir>
singularity pull pytorch:22.04-py3.sif docker://nvcr.io/nvidia/pytorch:22.04-py3

You can run the container on a GPU node in the same directory like so:

singularity run --nv -B /cm -B /data/user/$USER pytorch\:22.04-py3.sif /bin/bash

Note: there is an issue trying to use nvidia-smi inside the container since it doesn't appear to inherit the LD_LIBRARY_PATH from the caller. This doesn't appear to affect the operations below, but if you want to test nvidia-smi inside the container, set up your env as follows:

module load cuda11.4/toolkit
module load Singularity
export SINGULARITYENV_PATH=$PATH
export SINGULARITYENV_LD_LIBRARY_PATH=$LD_LIBRARY_PATH  # this doesn't seem to work
singularity run --nv -B /cm -B /data/user/$USER pytorch\:22.04-py3.sif /bin/bash
# inside the container:
export LD_LIBRARY_PATH=/cm/local/apps/cuda/libs/current/lib64/:$LD_LIBRARY_PATH
nvidia-smi

Back to the kernel config...

The next step is to install a custom kernel for Jupyter that starts the Python kernel in the container. This is done by combining instructions from Clemson ITI, community docs, and the IPython docs. A kernel is really just a JSON config that specifies the command to run for the kernel. We need a custom one that starts the container.
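
For context, a stock kernel spec as installed by ipykernel looks roughly like this (illustrative; exact fields vary by version). The custom spec below simply replaces the bare python command with a singularity exec invocation:

{
 "argv": [
  "python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python 3",
 "language": "python"
}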

Make sure you have Anaconda loaded and your preferred env activated. This is mostly to provide good defaults and use the env that your calling Jupyter notebook needs. (Don't know that this is strictly necessary.)

module load Anaconda3/2021.11
conda activate <myenv>

It's best to first create a template and then install the kernel. That way you always have the template and can reinstall or edit it as needed outside the Jupyter config dirs.

ipython kernel install --prefix ~/tmp --name singk --display-name "Python (singk)"
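
This creates a small template directory you can inspect (contents from a typical ipykernel install; yours may differ slightly):

ls ~/tmp/share/jupyter/kernels/singk
# kernel.json  logo-32x32.png  logo-64x64.png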

Then overwrite the generated kernel.json in the new kernel template with your custom kernel config:

cat > ~/tmp/share/jupyter/kernels/singk/kernel.json  << EOF
{
 "argv": [
  "singularity",
  "exec",
  "--nv",
  "-B",
  "/cm",
  "-B",
  "/data/user/$USER",
  "-B",
  "/data/user/home/$USER",
  "-e",
  "pytorch:22.04-py3.sif",
  "/home/$USER/.conda/envs/$CONDA_DEFAULT_ENV/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python (singk)",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}
EOF
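
Optionally, confirm the result is still valid JSON before installing:

python -m json.tool ~/tmp/share/jupyter/kernels/singk/kernel.json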

Notes:

  1. The $USER and $CONDA_DEFAULT_ENV variables are expanded to their values by the shell during the cat heredoc. If you edit the file directly, make sure you use your actual values; you can't use variable expansions in the JSON.
  2. The parameters in the JSON file are like an exec() call. There is no command parsing, so each string is specified as a separate quoted argument, e.g. "-B", "/cm", not "-B /cm".

At this point your kernel is ready to install into the correct location for your user kernels.

jupyter kernelspec install ~/tmp/share/jupyter/kernels/singk --user
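
You can confirm the kernel is registered (for a --user install it should appear under ~/.local/share/jupyter/kernels):

jupyter kernelspec list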

Now you can start a Jupyter notebook in OOD on a GPU node and select the custom kernel for a new notebook.
Go to OOD and select Jupyter.
Add cuda, Singularity, and Anaconda to your startup environment:

module load cuda11.4/toolkit
module load Singularity
module load Anaconda3/2021.11

Select a GPU partition like pascalnodes. Then launch the notebook job.

When you are in Jupyter, navigate to the directory where you keep your notebooks and where you created the Singularity .sif.
You can then select "Python (singk)" to launch a notebook with the Singularity container.
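
If the kernel fails to start, it can help to sanity-check the same interpreter directly in the container from a GPU node shell first (a sketch; substitute your actual env name for <myenv>):

singularity exec --nv -B /cm -B /data/user/$USER pytorch\:22.04-py3.sif /home/$USER/.conda/envs/<myenv>/bin/python -c "import torch; print(torch.cuda.is_available())"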

Note: On occasion, I've observed the container starting and then restarting after the first start. I don't know what causes this, but it seems to heal itself. You can look in the OOD job's output.txt file to debug further.

@wwarriner commented
This may be a helpful use case to add to our workflow_solutions/getting_containers.md page. I think @Premas would be the resident expert to decide here.
