Releases: GoogleCloudPlatform/cluster-toolkit
v0.7.2-alpha: New features in `vm-instance`, updated documentation
Key New Features
- Spot provisioning and `threads_per_core` support in VM Instance module
- Updated and improved documentation
Resource Improvements
- `vm-instance`: Spot provisioning support
- `vm-instance`: Option to set `threads_per_core` to enable or disable Simultaneous Multithreading (SMT)
- `vpc`: Better support for supplying custom primary subnetwork
- `vpc`: Better dependency tracking
- `startup-scripts`: Better dependency tracking
Improvements
- Updated Documentation, improvements to navigation in large README files
- `make install` and `make install-user` for installing the binary globally or locally.
- Issue template added for reporting bugs in the HPC Toolkit
Bug Fixes
- Fixed: Terraform state doesn't update when overwriting a blueprint
What's Changed
- Support Spot provisioning in VM instance module by @tpdownes in #283
- Enable VPC module to accept subnetwork_name input variable by @tpdownes in #285
- Add threads-per-node option for vm-instance by @heyealex in #290
- Reduce 'suspend_time' in example to minimize destroy leaving behind compute nodes by @nick-stroud in #292
- Fix: terraform.tfstate.backup was written to terraform.tfstate during overwrite by @nick-stroud in #294
- Add `make install` option for root and user by @heyealex in #293
- Update quota documentation to match new defaults for filestore module by @nick-stroud in #297
- Add implicit dependencies in startup-scripts by @tpdownes in #298
- Add explicit dependencies in VPC module by @tpdownes in #295
- Add issue template by @heyealex in #300
- Added a TOC to examples/README, re-sorted examples by @cboneti in #301
- Update PD quota to match current example config by @nick-stroud in #302
- Update Intel Select tutorial to use new schema by @nick-stroud in #303
- Fix make tests by @tpdownes in #304
- Update name of previous resource groups folder to match new schema by @nick-stroud in #305
- Add provider_meta blocks by @nick-stroud in #309
- Update cmd README by @heyealex in #307
- Update to version 0.7.2-alpha by @heyealex in #310
- Release 0.7.2-alpha by @heyealex in #312
Full Changelog: v0.7.1-alpha...v0.7.2-alpha
v0.7.1-alpha: Documentation Additions, Updated Defaults, Bug Fixes, and Intel Select Example
Pre-release
Key New Features
- Improved documentation.
- Improved defaults on Filestore and Slurm.
- Additional modules allow specifying `project_id` independently from the global `project_id`.
- Spack install dir updated to avoid conflict with Slurm.
- Internal schema rename to match changes released in 0.7.0-alpha.
New Examples
What's Changed
- Documentation fixes by @tpdownes in #267
- Set default filestore size to lowest possible by @tpdownes in #268
- Rename internal schema data structures and variable names by @heyealex in #258
- Update modules to accept project_id as variable by @mittz in #272
- Update docs and defaults for spack install dir by @heyealex in #270
- Add troubleshooting tip for compute SA permissions by @heyealex in #271
- Update writer to run in group order by @heyealex in #274
- Update DDN EXAscaler naming and tags by @heyealex in #277
- Update slurm defaults to match recommendations by @heyealex in #275
- Lower filestore default tier to Basic HDD by @tpdownes in #276
- Add instructions for installing ansible in runners by @heyealex in #279
- Add Intel blueprints and Slurm job by @fertinaz-intel in #249
- Add links to community examples, document badges by @heyealex in #278
- Point to tutorials for quickstart in README by @heyealex in #273
- Intel blueprint updates by @tpdownes in #280
- Fix name of VM created by Intel Select Solution example by @tpdownes in #281
- Add support documentation to community modules by @heyealex in #282
- Refactor/pkg name update by @heyealex in #269
- Update to version 0.7.1-alpha by @nick-stroud in #287
- Partial Revert "Point to tutorials for quickstart in README" by @nick-stroud in #289
- Release 0.7.1-alpha by @nick-stroud in #286
New Contributors
- @fertinaz-intel made their first contribution in #249
Full Changelog: v0.7.0-alpha...v0.7.1-alpha
v0.7.0-alpha: Updated schema and component names, added community folder, new command line options
Pre-release
Key New Features
- Updated HPC Toolkit naming and schema with significant interface changes (read more below)
- Moved community contributions to community folder
- Overwrite flag (-w) optionally overwrites existing deployment folder while maintaining terraform state
- Terraform Backend can be configured from command line (`--backend-config`)
- Recognition of the output of ghpc as a deployment, rather than blueprint: `ghpc create` now creates a folder with `deployment_name` instead of `blueprint_name`
Naming changes
- Config YAML or Input YAML is now referred to as the HPC Blueprint
- Resource Groups are now Deployment Groups
- Blueprint Folder is now Deployment Folder
- Resources are now HPC Modules
- simple-instance is now vm-instance (the underlying module is the same)
Blueprint YAML Schema Update
- `vars.deployment_name` is used by `ghpc` for creating the deployment folder name, rather than `blueprint_name`
- `resource_groups` is now `deployment_groups`
- `resources` is now `modules`, and modules are stored in `modules/` and `community/modules/`
- Sourcing embedded modules starts with `modules` or `community/modules`
Example:
deployment_groups: # Was resource_groups:
- modules: # Was resources:
  - source: modules/... # Was `- source: resources/...`
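The renames listed above are mechanical, so an older blueprint can in principle be updated by a small script. The following is a minimal sketch, assuming the blueprint has already been loaded into a Python dict. The helper `migrate_blueprint` is hypothetical and not part of `ghpc`, and the `group`, `id`, and `resources/network/vpc` values in the sample data are illustrative only. It covers only the three renames named in the schema notes; modules that moved to `community/modules/` would still need their source paths adjusted by hand.

```python
import copy

# Old and new key names, taken from the schema notes above.
GROUPS_OLD, GROUPS_NEW = "resource_groups", "deployment_groups"
MODULES_OLD, MODULES_NEW = "resources", "modules"


def migrate_blueprint(old):
    """Return a copy of a pre-0.7.0 blueprint dict that uses the 0.7.0 key names.

    Hypothetical helper for illustration; it is not part of ghpc.
    """
    new = copy.deepcopy(old)
    if GROUPS_OLD in new:
        new[GROUPS_NEW] = new.pop(GROUPS_OLD)
    for group in new.get(GROUPS_NEW, []):
        if MODULES_OLD in group:
            group[MODULES_NEW] = group.pop(MODULES_OLD)
        for module in group.get(MODULES_NEW, []):
            source = module.get("source", "")
            # Embedded sources move from resources/... to modules/...
            # Community modules are not detected here (see the note above).
            if source.startswith("resources/"):
                module["source"] = "modules/" + source[len("resources/"):]
    return new


# Illustrative input only; the names and paths are assumptions.
old_blueprint = {
    "blueprint_name": "example",
    "vars": {"deployment_name": "example-deployment"},
    "resource_groups": [
        {
            "group": "primary",
            "resources": [{"source": "resources/network/vpc", "id": "network1"}],
        }
    ],
}

new_blueprint = migrate_blueprint(old_blueprint)
print(new_blueprint["deployment_groups"][0]["modules"][0]["source"])
```

The input dict is deep-copied, so the original blueprint is left unchanged and the two versions can be compared before anything is written back to disk.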
Improvements
- Addition of "Community" folder
- Overwrite option (`-w`) for creating a deployment in the same directory, retaining the terraform state and keeping a backup of one prior deployment.
- Improved instructions for deploying after create
- Support for startup-script with Packer resource
- Command Line Flag for specifying terraform state backend config (`--backend-config`)
- More reliable project ID validation
What's Changed
- Cleanup prior to create update by @nick-stroud in #219
- Restore tfstate on create overwrite by @nick-stroud in #220
- Overwrite logic for create by @nick-stroud in #221
- Improve Packer template by @tpdownes in #224
- Improve Terraform instructions to user by @tpdownes in #225
- Add overwrite-blueprint argument to create command by @nick-stroud in #222
- Improve formatting of overwrite error by @nick-stroud in #227
- Improve functionality and documentation of Packer resource by @tpdownes in #228
- Create standard gitignore file in blueprint directory by @mittz in #223
- Create flag for specifying backend config by @mittz in #232
- Add section on how to see billing reports and fix typo by @mittz in #229
- Basic Cloud Shell Tutorial by @nick-stroud in #226
- Cloud Shell Tutorial - Merge to Develop by @nick-stroud in #233
- Update CLI usage instructions to use new naming convention by @nick-stroud in #236
- Improve instructions to use GitHub client in Google Cloud Shell by @tpdownes in #235
- Update all user facing references to resources by @heyealex in #237
- Update integration tests to use 'deployment_name' by @nick-stroud in #242
- Adding tutorial for Intel Select Solutions by @cboneti in #246
- Reimplement TestProjectExists with Compute Engine API by @mittz in #247
- Community Directory Reorg by @nick-stroud in #241
- Update flat list of modules by @nick-stroud in #239
- Update schema to deployment_groups and modules by @heyealex in #243
- Revert pre-commit PR validation to sequential exec by @heyealex in #251
- Update terminology for blueprint file and deployment directory by @nick-stroud in #245
- Merge tutorial from main to develop by @nick-stroud in #250
- Change name of `simple-instance` to `vm-instance` by @heyealex in #252
- Standardize Slurm image variables by @nick-stroud in #253
- Update Packer documentation in main README by @tpdownes in #255
- Image building example for Slurm cluster by @nick-stroud in #254
- Update outdated reference to "simple instance" by @heyealex in #257
- Update create_blueprint.sh to create_deployment.sh by @heyealex in #256
- Error message for schema changes in v0.7.0a by @cboneti in #263
- Embed community modules by @heyealex in #264
- Add link to Lustre documentation in module readme by @mittz in #265
- Revert builder to not split pre-commit hooks by @heyealex in #261
- Update to version 0.7.0-alpha by @heyealex in #266
- Release 0.7.0-alpha by @heyealex in #259
Full Changelog: v0.6.0-alpha...v0.7.0-alpha
v0.6.0-alpha: Improved Packer support, fully featured Simple Instance, usability improvements
Pre-release
Key New Features
- Updated and more flexible Packer resource support for VM Image Building
- Simple Instance now supports gVNIC, TIER 1 networking, and placement groups
- Ability to specify global variables on the command line (e.g. `ghpc create example.yaml --vars project_id=my-project`)
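To make the `--vars` behavior concrete, here is a minimal sketch of how `key=value` pairs from the command line could be parsed and layered over the variables defined in a blueprint file. It is not the `ghpc` implementation: the function names `parse_vars` and `apply_overrides` are hypothetical, and the rule that command-line values take precedence over file values is an assumption made for the example.

```python
def parse_vars(pairs):
    """Turn ["key=value", ...] into a dict, splitting on the first '='."""
    overrides = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        overrides[key] = value
    return overrides


def apply_overrides(blueprint_vars, overrides):
    """Return a new dict in which command-line values win over file values.

    The precedence rule is an assumption for this sketch.
    """
    merged = dict(blueprint_vars)
    merged.update(overrides)
    return merged


# Illustrative values only.
file_vars = {"project_id": "placeholder", "zone": "us-central1-a"}
cli_vars = parse_vars(["project_id=my-project"])
merged_vars = apply_overrides(file_vars, cli_vars)
print(merged_vars)
```

Splitting on the first `=` only means a value may itself contain `=` characters without being truncated.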
New Features
- Startup script for installing Ops Agent for
- Validation of common global variables such as project_id, zone and region (documentation)
- Packer template now supports use of global variables and defaults to same network naming conventions as VPC module
Version updates
Improvements
- Additional content in documentation
- Efficiency improvements in integration and PR Validation tests
- Updated Spack/Gromacs example
- Added monitoring dashboard to hpc-cluster-high-io.yaml example
- More helpful output when creating a blueprint
- Updated and more flexible Packer resource support for VM Image Building
- Integration testing support for Packer image building
- Improvements to error handling
- Simplify packer example
- Compatible with both Terraform Google Provider 3.x and 4.x
What's Changed
- Fix simple typo in README by @tpdownes in #169
- Add a spack cache build option to the spack resource by @douglasjacobsen in #154
- Upgrade to latest DDN EXAScaler by @nick-stroud in #173
- Build modularization by @cboneti in #170
- Toggle filestore api as part of nightly cleanup by @nick-stroud in #174
- Consolidate .gitignore files to root dir by @nick-stroud in #175
- Update URL for ansible-lint pre-commit hook by @tpdownes in #176
- Revert "Toggle filestore api as part of nightly cleanup" by @nick-stroud in #178
- Support running make tests with older releases of bash by @tpdownes in #179
- Add slurm cluster to spack-gromacs by @heyealex in #172
- Add startup-script to install Ops Agent by @mittz in #171
- Address issues destroying VPC with Terraform by @tpdownes in #182
- Update plugin to address build error with TFLint by @tpdownes in #180
- Remove unused functions and eliminate related warnings by @tpdownes in #184
- Add note about tflint version compatibility with tflint google plugins by @tpdownes in #186
- Add troubleshooting tip for def GCE SA permissions by @heyealex in #181
- Remove unused Packer variable and update license year in Terraform modules by @tpdownes in #189
- Add index rebuilding to populated caches by @douglasjacobsen in #185
- Add monitoring dashboard to example by @heyealex in #188
- Fix the way of GitHub module handling by @mittz in #187
- Minor fixes to readme paths and nfs security. by @cboneti in #193
- Support to set global variables at command line during ghpc create and expand by @mittz in #190
- Update TFLint plugin to 0.16.1 to match CI environment by @tpdownes in #196
- Create dedicated function for writing HCL attribute files by @tpdownes in #198
- Add documentation for resource fields by @heyealex in #195
- Implement a starting set of global variable validator functions by @tpdownes in #183
- Implementing parallel builds. by @cboneti in #197
- Add spack-gromacs example to integration tests by @heyealex in #191
- Migrate cty.Value conversion functions to config package by @tpdownes in #201
- Move user guide to readme by @nick-stroud in #199
- Consolidate quickstart instructions. by @nick-stroud in #200
- Change Terraform commands to use -chdir to avoid having to cd by @nick-stroud in #202
- Add monitoring integration test by @heyealex in #194
- Breaking daily integration builds in 3 groups. by @cboneti in #205
- Tier1 gvnic in simple instance by @nick-stroud in #203
- Enable Packer templates to resolve global variables by @tpdownes in #204
- Eliminate use of deprecated create flag from daily tests script by @tpdownes in #207
- Modify resolution of global variables to avoid errors on plain strings by @tpdownes in #208
- Add jq to builder image by @tpdownes in #209
- Update WriteBlueprint to handle error and test for overwrite condition by @nick-stroud in #213
- Add daily integration test of Packer example by @tpdownes in #211
- Change default VM type in Packer resource by @tpdownes in #212
- Add placement policy options to simple instance by @nick-stroud in #206
- Simplify Packer example and adopt Toolkit practices by @tpdownes in #214
- Fix packer integration test by @tpdownes in #218
- Increase version to 0.6.0-alpha by @nick-stroud in #217
- Release 0.6.0-alpha by @nick-stroud in #216
Full Changelog: v0.5.0-alpha...v0.6.0-alpha
v0.5.0-alpha: Omnia and Spack improvements, bug and stability fixes
Key New Features
- Updated resources
- Various bug fixes and updates
Improvements
- Improved Omnia examples and Omnia resources
- Improved Spack resource, including the ability to create build caches
- Fixed various issues with nfs-server, filestore
- Updated required provider versions
- Improvements to Packer support and documentation
- Incorporated further terraform best practices
What's Changed
- Add rescue block to recover resume and suspend logs after failure by @nick-stroud in #128
- Add Spack example by @heyealex in #120
- Improve wording of TF Backend documentation by @heyealex in #135
- Fix the deployment name of the high io example by @heyealex in #137
- Update README to document credential usage by @tpdownes in #138
- Better ansible errors by @nick-stroud in #143
- Add subnetwork to simple-instance by @heyealex in #140
- Enable multiple integration tests to run simultaneously by @nick-stroud in #141
- Set TF_IN_AUTOMATION for Cloud Build integration tests by @tpdownes in #146
- Allow 4.x TPG (Terraform Provider Google) by @tpdownes in #147
- accelerator support in simple-instance by @cboneti in #149
- Allow 4.x TPG in all internal resources by @tpdownes in #150
- Remove provider blocks from child modules by @tpdownes in #151
- Add CODEOWNERS to support GitHub automation of reviewers by @tpdownes in #145
- Update filestore module to eliminate deprecation warnings with 4.x TPG by @tpdownes in #152
- Fix naming collision in nfs-server by @heyealex in #153
- Update omnia-install to use new toolkit features by @heyealex in #139
- Add spack config definitions in the spack resource by @douglasjacobsen in #148
- Revert "Add CODEOWNERS file for new GitHub Team" by @tpdownes in #155
- Align Packer provisioning with startup-script support for Ansible by @tpdownes in #156
- Increase timeout time for cloud builder. by @cboneti in #160
- Add go imports install to dependency checks by @heyealex in #159
- Fix an unmatched quote in example by @heyealex in #166
- Update Slurm controller README by @cboneti in #165
- Add a pause between slurm srun tests by @heyealex in #162
- Increase version to 0.5.0-alpha by @heyealex in #168
- Develop by @cboneti in #167
Full Changelog: v0.4.0-alpha...v0.5.0-alpha
v0.4.0-alpha: bug and stability fixes
Key New Features
- Source resources from GitHub
- `outputs` field for promoting resource outputs to top level
- CLI Autocompletion
Version updates
- SlurmGCP resources to 4.1.5
Improvements
- VPC resource improvements
- SourceReader package
- Documentation improvements
- Spack installation logging
- Improvements to examples
Bug Fixes
- Terraform backend application across resource groups (PR#110)
What's Changed
- Source resources from GitHub by @mittz in #100
- Markdownlint by @nick-stroud in #105
- Add more tflint rules by @heyealex in #108
- Add yaml/ansible/shell precommit hooks and update all code to be in compliance by @heyealex in #106
- Fix bug in applied group terraform backends by @heyealex in #110
- Update Installation, Basic Usage, and Development documentation by @nick-stroud in #109
- Fix typo by @tpdownes in #113
- Fix misspelling in variable description and add validation block by @tpdownes in #115
- Update the spack resource with logfiles and environments by @douglasjacobsen in #111
- Add option to promote module outputs to top level by @heyealex in #114
- Add tab command completion to CLI by @mittz in #118
- Bump slurm gcp version to v4.1.5 by @brandenm-nag in #119
- Add debug partitions to examples by @nick-stroud in #117
- Adds documentation on quotas required for examples by @nick-stroud in #121
- Replace path package with path/filepath by @mittz in #122
- Troubleshooting by @nick-stroud in #124
- Update pre-commit hooks to latest releases by @tpdownes in #126
- Filestore module should allow specification of project_id by @tpdownes in #125
- Add installation of goimports to Dockerfile by @heyealex in #129
- Remove vestigial module.json files and associated pre-commit hooks by @tpdownes in #127
- Missed a module.json file in cleanup. Remove last one by @tpdownes in #130
- Update small example to have same region and zone as high-io by @nick-stroud in #123
- Update version to 0.4.0 by @heyealex in #132
- Refactor VPC module to support finer settings and avoid failure when invalid region specified by @tpdownes in #131
- Develop by @heyealex in #133
New Contributors
- @nick-stroud made their first contribution in #105
Full Changelog: v0.3.1-alpha...v0.4.0-alpha
v0.3.1-alpha: bug and stability fixes
Key New Features
- Mostly bug and stability fixes
Bug Fixes
- DDN Exascaler no longer requires an SSH key by default
Improvements
- More parameter options for SLURM compute partition
- More parameter options for new project
What's Changed
- Changing default zone for itegration tests. by @cboneti in #95
- Minor fixes to support Rocky Linux 8 by @heyealex in #94
- Add cpu_platform as Slurm Partition option by @brandenm-nag in #96
- Partition outputs by @heyealex in #97
- Change DDN Exascaler private key to null by default by @heyealex in #98
- Project example by @heyealex in #99
- Minor refactoring of unit tests by @cboneti in #103
- Update patch version, version = 0.3.1-alpha by @heyealex in #102
- Merging develop into main by @cboneti in #104
Full Changelog: v0.3.0-alpha...v0.3.1-alpha
v0.3.0-alpha: simplified config yaml, spack support, service account resource
Pre-release
Key New Features
- `use` field to link resource outputs and inputs together automatically
- Option to create a blueprint in a defined location (`ghpc create --out/-o`)
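One way to picture the `use` field is as name matching: an input the user left unset is filled from a same-named output of a used resource. The sketch below illustrates that idea under stated assumptions and is not the `ghpc` implementation. The function `resolve_use`, the dict layout (`inputs`, `outputs`, `settings`), and the `id.output` reference string are all hypothetical.

```python
def resolve_use(resource, resources_by_id):
    """Fill unset inputs of `resource` from same-named outputs of used resources.

    Hypothetical helper; the data layout is an assumption for this sketch.
    """
    settings = dict(resource.get("settings", {}))
    for used_id in resource.get("use", []):
        used = resources_by_id[used_id]
        for output_name in used["outputs"]:
            # Wire only inputs the resource declares and the user left unset,
            # so explicit settings always win over automatic linking.
            if output_name in resource["inputs"] and output_name not in settings:
                settings[output_name] = f"{used_id}.{output_name}"
    return settings


# Illustrative resources only.
resources_by_id = {
    "network1": {"outputs": ["network_name", "subnetwork_name"]},
}
compute = {
    "id": "compute1",
    "use": ["network1"],
    "inputs": ["network_name", "machine_type"],
    "settings": {"machine_type": "c2-standard-60"},
}

linked = resolve_use(compute, resources_by_id)
print(linked)
```

In this sketch `subnetwork_name` is not linked because `compute1` does not declare it as an input, and `machine_type` keeps the value set explicitly.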
New Resources
Version updates
Improvements
- Improved Makefile: building ghpc no longer requires Terraform or Packer.
- ghpc labels added where missing
- More thorough integration testing
Deprecations
- `--config`/`-c` flag when running `ghpc expand` is no longer needed: `ghpc expand path/to/config`.
What's Changed
- Add ability for a resource to "use" another resources output automatically by @heyealex in #72
- Update output and variable names to make use of "use" field by @heyealex in #75
- Fix makefile by @cboneti in #78
- Add a resource to create service accounts by @brandenm-nag in #80
- Update name of nat-ips to be unique by @douglasjacobsen in #81
- Spack installation resource by @douglasjacobsen in #76
- Update slurm scopes to allow GCS read access to startup-scripts by @heyealex in #84
- Add labels to resources/modules where missing by @heyealex in #83
- Add a -o flag that allows specifying the output by @mittz in #79
- Add integration tests for dependencies when building by @heyealex in #82
- Update startup script variable to allow in use field by @heyealex in #87
- Fix blueprintIO test so it works in builder by @heyealex in #90
- Update spack variables to be lists if not defined by @douglasjacobsen in #88
- Update resources readme by @heyealex in #85
- Fix Makefile so coverage fails when tests fail. by @cboneti in #89
- Update version to 0.3.0-alpha by @heyealex in #91
- Remove -c flag from expand command by @mittz in #92
- Fix Lustre regression preventing lustre from working with new vpcs. by @cboneti in #86
- Merging develop into main by @cboneti in #93
New Contributors
- @brandenm-nag made their first contribution in #80
- @douglasjacobsen made their first contribution in #81
Full Changelog: v0.2.2-alpha...v0.3.0-alpha
v0.2.2-alpha: new resources, improved startup scripts and version updates
Key New Features
New Resources
- HPC Dashboard Monitoring
- NFS Filesystem
- CloudSQL for Slurm Controller
Version updates
- Slurm-on-GCP
- DDNExascaler
Improvements
- Improved startup scripts with Slurm support
- Speed up `make` builds
- Add checks for dependencies (Packer, Terraform)
Deprecations
- `--config`/`-c` no longer required when running `ghpc create ...`
What's Changed
- Faster builds by @cboneti in #53
- Add dashboard monitoring resources by @heyealex in #54
- Merging faster builds from main to develop by @heyealex in #57
- Simplify/speedup simple "make" builds for user by @heyealex in #58
- Add packer-readme precommit by @heyealex in #59
- Adding unmanaged nfs and cloudsql integration to slurm cluster by @ziwang492 in #55
- Improved Startup scripts by @cboneti in #60
- Update slurm resources to 4.1.3 by @heyealex in #62
- Integration tests with cloud build by @heyealex in #61
- Update DDN to most recent commit (af5a5b3) by @heyealex in #64
- Fix minor typos in README by @mittz in #63
- Various fixes to integration tests by @heyealex in #65
- Add version requirements by @heyealex in #66
- Remove the -c flag for create by @mittz in #68
- Adding a script to crean left-over resource groups. by @cboneti in #67
- Merge Develop into Main by @cboneti in #69
- Increased version to v0.2.2-alpha (private preview). by @cboneti in #70
- merge develop-to-main (increased minor version to v0.2.2-alpha (private preview)) by @cboneti in #71
New Contributors
Full Changelog: v0.2.1-alpha...v0.2.2-alpha
v0.2.1-alpha: version and usability improvements + new resources.
Key New Features:
- Descriptive output when using ghpc
- SLURM and DDN third-party resources updated to newest versions
- Removed redundancy in SLURM resource definitions
- Embedded resources - Use standard resources without needing the local resources directory
- Additional validation of global variables
- New Resources
  - new-project: Creates a new GCP project
  - service-enablement: Allows management of multiple API services for a GCP project
What's Changed
- Changes in the test_examples to allow cloud build. by @cboneti in #27
- Update go-critic pre-commit by @heyealex in #30
- Simplify SLURM login and controller resources by @heyealex in #29
- Embed resources in ghpc, update source format by @heyealex in #28
- Adding cloud-build artifacts to accelerate tests. by @cboneti in #31
- Add validation for global vars by @heyealex in #32
- Builder img now uses latest of terraform-docs by @cboneti in #34
- Update SLURM resource defaults by @heyealex in #33
- Update lustre defaults to recommended values by @heyealex in #35
- Fix bug where global labels were not set by @heyealex in #36
- Update versions by @cboneti in #38
- add .DS_store to gitignore by @heyealex in #42
- Print descriptive output when creating blueprints or configs by @heyealex in #41
- Add license to resutils package by @heyealex in #44
- Set cloud builders to fail on any non-zero command by @heyealex in #46
- Add to Makefile make install-deps-dev by @heyealex in #45
- Rename passthrough variables to literal variables by @heyealex in #47
- Project branch by @ziwang492 in #43
- Bug fix: Omnia install script permissions by @heyealex in #49
- Update version to 0.2.1 by @heyealex in #50
- Improved documentation by @cboneti in #52
- Version 0.2.1 by @heyealex in #51
New Contributors
- @ziwang492 made their first contribution in #43
Full Changelog: v0.2.0-alpha...v0.2.1-alpha