DeepOps Release 22.04 #1164

Merged
merged 1 commit into from
Apr 26, 2022

@ajdecon ajdecon commented Apr 26, 2022

DeepOps 22.04 Release Notes

Known Issues

  • Kubeflow deployment is currently broken due to incompatibility between current Kubeflow and Kubernetes 1.22. Kubeflow deployment will be updated to add support when Kubeflow releases 1.6.

General

  • Extensive improvements to automated testing with Jenkins, Ansible Molecule, and ansible-lint
  • Update MIG playbook to use the new nvidia-mig-manager systemd service
  • Updates to roles for nvidia-docker and GPU driver
  • Various bug fixes

Slurm

  • Enhanced NCCL tests for Slurm cluster validation
  • Make use of pam_slurm_adopt optional
  • Break out multiple sections in Slurm inventory file

Kubernetes

  • Update to Kubernetes 1.22.6
  • Update default container runtime from dockershim to containerd
  • Add support for NVIDIA Network Operator
  • Add support to deploy NVIDIA Deep Learning Examples on Kubernetes clusters
  • Update to GPU Operator 1.9

Changes

Bugs/Enhancements

Upgrade Steps

If you are upgrading to this version of DeepOps from a previous release, follow the upgrade section of the Slurm or Kubernetes Deployment Guide. In addition, re-run the ./scripts/setup.sh script and add any new variables from the config.example files to your existing config. For a full diff from release 22.01, run git diff 22.01 22.04 -- config.example/. If you encounter any problems, please open a GitHub issue. See the update guide for additional guidance.
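
As a rough illustration of the "add any new variables" step, the sketch below lists variable names that appear in an example config file but not in an existing one. It is not part of DeepOps: the function names are hypothetical, and it assumes variables are written as unindented `key: value` lines, so nested keys and commented-out defaults are ignored. The `git diff` command above remains the authoritative comparison.

```python
import re

# Matches an unindented YAML key such as `some_var: value`.
# Indented (nested) keys and comment lines do not match.
_TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:")


def top_level_keys(yaml_text):
    """Return the set of unindented `key:` names found in yaml_text."""
    keys = set()
    for line in yaml_text.splitlines():
        match = _TOP_LEVEL_KEY.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def missing_variables(example_text, current_text):
    """Return sorted names present in the example text but absent from the current text."""
    return sorted(top_level_keys(example_text) - top_level_keys(current_text))


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    example = "existing_var: 1\nnew_var: true\n# commented_var: x\n"
    current = "existing_var: 1\n"
    print(missing_variables(example, current))
```

To apply this to real files, read a file from config.example/ and its counterpart in your config directory as text and pass both strings to `missing_variables`. Any names it reports still need to be reviewed by hand, since default values may also have changed between releases.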

Notes

@ajdecon ajdecon requested review from dholt and supertetelman April 26, 2022 18:25

ajdecon commented Apr 26, 2022

@dholt @supertetelman : Please review the release notes and let me know if this looks good! Then we can merge and tag a release.


dholt commented Apr 26, 2022

Looks great to me. I like this better than the way we have been doing releases.

@ajdecon ajdecon merged commit 405eb21 into NVIDIA:master Apr 26, 2022