Releases: GoogleCloudPlatform/cluster-toolkit

v1.41.0 Adoption of Slurm 24.05 and Improvements to GKE Support

25 Oct 16:58
26fafe0

What's Changed

Key New Features 🎉

New Modules 🧱

Module Improvements 🔨

Improvements 🛠

  • Create and use non-default service accounts in GKE by @annuay-google in #3123
  • Added documentation on cloud-ops-agent installation and stackdriver removal by @jrossthomson in #3029
  • Ensure local SSD filesystem is assembled into a RAID even upon power off/on cycles by @tpdownes in #3129

Deprecations 💤

Version Updates ⏫

Bug fixes 🐞

  • Fixed the exact-number constraint problem for additional VPCs in gpu_direct checks by @sharabiani in #3078
  • Provide explicit project information by @wiktorn in #3060
  • Chrome Remote Desktop: increase resilience of apt operations by @tpdownes in #3093
  • Add a service that mounts Parallelstore on every reboot by @harshthakkar01 in #3125

New Contributors

Full Changelog: v1.40.1...v1.41.0

v1.40.1 Fix issue that affected GKE blueprints due to dynamic provisioning

10 Oct 01:20
eb00254

What's Changed

Other changes

  • Revert PR#3046 and add more line breaks for readability by @ankitkinra in #3115

Full Changelog: v1.40.0...v1.40.1

v1.40.0: A3 Mega and A3 High families supported in GKE

03 Oct 21:13
f9f9256

What's Changed

Important

All HPC VM images based on CentOS 7 have been deprecated. This means that
referring to the "hpc-centos-7" family in the "cloud-hpc-image-public"
project will fail. We recommend migrating to the "hpc-rocky-linux-8" family,
which is the new default throughout the Toolkit. If CentOS 7 is truly needed,
the final HPC CentOS 7 image can still be used by its name: "hpc-centos-7-v20240712".
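For blueprints that pin an image, a pre-deployment check along these lines could flag the deprecated family before a deployment fails. This is a minimal sketch, assuming each image setting is represented as a dict with a `project` key plus a `family` or `name` key (mirroring a blueprint `instance_image` block); `check_instance_image` and its constants are hypothetical, not part of the Toolkit.

```python
from typing import Optional

# Hypothetical constants taken from the deprecation notice above.
DEPRECATED_FAMILY = "hpc-centos-7"
HPC_IMAGE_PROJECT = "cloud-hpc-image-public"
RECOMMENDED_FAMILY = "hpc-rocky-linux-8"
FINAL_CENTOS7_IMAGE = "hpc-centos-7-v20240712"


def check_instance_image(image: dict) -> Optional[str]:
    """Return a warning if `image` uses the deprecated family, else None.

    `image` is assumed to look like a blueprint `instance_image` setting:
    a dict with `project` and either `family` or `name`.
    """
    if image.get("project") != HPC_IMAGE_PROJECT:
        return None  # only images from the public HPC project are affected
    if image.get("family") == DEPRECATED_FAMILY:
        return (
            f"family '{DEPRECATED_FAMILY}' no longer resolves; use family "
            f"'{RECOMMENDED_FAMILY}' or, if CentOS 7 is required, "
            f"name '{FINAL_CENTOS7_IMAGE}'"
        )
    # Pinning the final CentOS 7 image by name, or any other family, passes.
    return None


if __name__ == "__main__":
    print(check_instance_image(
        {"family": "hpc-centos-7", "project": "cloud-hpc-image-public"}))
```

Referencing the final image by `name` is deliberately allowed, since the notice above says that remains usable.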

Key New Features 🎉

New Modules 🧱

Module Improvements 🔨

Improvements 🛠

Deprecations 💤

Version Updates ⏫

Bug fixes 🐞

Other changes

  • NeMo README instructions for preloading the gpt2 tokenizer by @koallison in #3075

New Contributors

Full Changelog: v1.39.0...v1.40.0

v1.39.0: Slurm reservations during maintenance windows, Improved GKE Support, removed CentOS 7 references

12 Sep 19:38
7699f5d

What's Changed

Key New Features 🎉

Module Improvements 🔨

Improvements 🛠

Bug fixes 🐞

  • Add slurmgcp-managed infix to resource policy name by @mr0re1 in #2892
  • Move pytest and other package installation to make by @annuay-google in #2890
  • Prevent use of google provider 6.0 where breaking changes are in use by @tpdownes in #2978
  • Fix local_ssd_config issue that forces node-pool recreation by @sharabiani in #2968
  • Kubernetes provider added to the gke-cluster module by @sharabiani in #2985
  • Fix for cleanup script: the last input is optional by @cdunbar13 in #2993
  • Catch "None" fields in slurm job datetime data for BigQuery by @fdmalone in #2992

Other changes

New Contributors

Full Changelog: v1.38.0...v1.39.0

v1.38.0: Slurm GCP v6 for a3-highgpu-8g and added ability to disable automatic updates

15 Aug 23:20
1e38ce0

What's Changed

Key New Features 🎉

New Modules 🧱

Module Improvements 🔨

Improvements 🛠

Deprecations 💤

Version Updates ⏫

Bug fixes 🐞

New Contributors

Full Changelog: v1.37.2...v1.38.0

v1.37.2 Fix SlurmGCP cleanup of resource policies

09 Aug 21:23
229803f

What's Changed

Bug fixes 🐞

  • Delete at most one resource policy at a time by @mr0re1 in #2895

Full Changelog: v1.37.1...v1.37.2

v1.37.1: Documentation update

02 Aug 18:13
9e68ecc

Fix minor typographical errors in documentation

Full Changelog: v1.37.0...v1.37.1

v1.37.0

31 Jul 21:14
54da9b7

The HPC Toolkit has been rebranded to Cluster Toolkit. More details will follow shortly. The GitHub repository has been renamed to match. This should not break existing workflows: references to the old name should seamlessly redirect to the updated repository. The binary has been renamed to gcluster (formerly ghpc), but ghpc has been symlinked and will continue to work. If you notice any unexpected behavior as part of this transition, please report it.
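Because ghpc remains available as a symlink, wrapper scripts can prefer the new name and fall back to the old one during the transition. This is a minimal sketch, assuming the binary is located via `PATH`; `find_toolkit_cli` is a hypothetical helper, and the lookup function is injectable only so the logic can be exercised without either binary installed.

```python
import shutil
from typing import Callable, Optional

# Prefer the new binary name; ghpc is kept as a fallback for older installs.
CLI_NAMES = ("gcluster", "ghpc")


def find_toolkit_cli(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first Toolkit binary found, or None."""
    for name in CLI_NAMES:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_toolkit_cli() or "neither gcluster nor ghpc found on PATH")
```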

What's Changed

Key New Features 🎉

Other changes

Full Changelog: v1.36.1...v1.37.0

v1.36.1: Fix Slurm GCP Cloud Parameter Defaults

26 Jul 22:45
493308e

What's Changed

Bug fixes 🐞

Full Changelog: v1.36.0...v1.36.1

v1.36.0 - Parallelstore support

19 Jul 16:59
da56862

What's Changed

Key New Features 🎉

  • Add support for parallelstore in pre-existing-network-storage by @harshthakkar01 in #2701
  • Develop and adopt boot-time fix for EOL CentOS 7 repositories by @tpdownes in #2738

New Modules 🧱

Module Improvements 🔨

  • Add 'source' argument for path to prolog or epilog scripts by @andybubu in #2670
  • Allow users to turn on access to cluster via GCP public IP address space by @ankitkinra in #2687
  • Add known GPU types and their accelerators to the gke module by @ankitkinra in #2680
  • Add disk_type for HTCondor's EP template by @aneo-ssam in #2705

Improvements 🛠

  • Update A3 mega blueprint to use Slurm-GCP 6.5.12 by @tpdownes in #2763

Bug fixes 🐞

  • Revert "Remove installation of enroot and pyxis from a3-highgpu-8g blueprint" by @samskillman in #2722
  • Only enable GPU taints if the guest_accelerator list is not empty by @ankitkinra in #2727
  • Move GCESysPrep to provisioner in Windows scripts by @tpdownes in #2728
  • Modify a3-highgpu-8g image-building blueprint network by @tpdownes in #2744
  • Update image to new centos image for both login and builder nodes by @ankitkinra in #2780

Other changes

  • Add validator for Terraform version and SlurmGCP6 by @mr0re1 in #2772

New Contributors

Full Changelog: v1.35.1...v1.36.0