Changelog

[Unreleased]

  • Stop pinning a static GKE version and use the RAPID release channel default instead.

0.4.0 - 2024-04-17

  • Enable Autoprovisioning support.
  • Support deletion of subnets when cluster is deleted.
  • Integrate Vertex AI: create a Vertex AI Tensorboard instance and upload logs from the Tensorboard directory to it.
  • Integrate Pathways.
  • Add retry logic to the Kueue, Jobset, and cluster credential steps.
  • Bump Kueue version to 0.6.1.
  • Add XPK inspector.
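
The retry logic mentioned in this release can be pictured with a minimal sketch. This is not xpk's actual code: `run_with_retries`, its parameters, and the convention that a step returns 0 on success are assumptions made for illustration only.

```python
import time
from typing import Callable


def run_with_retries(
    step: Callable[[], int],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> int:
    """Run `step` until it returns 0 or the attempts are used up.

    Hypothetical helper: `step` stands in for an install or credential
    command and is assumed to return a process-style exit code.
    Returns 0 on success, otherwise the exit code of the last attempt.
    """
    return_code = 1
    for attempt in range(1, max_attempts + 1):
        return_code = step()
        if return_code == 0:
            return 0
        # Wait before the next attempt, but not after the final one.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return return_code


# Example: a step that fails twice and then succeeds.
attempt_log = []


def flaky_step() -> int:
    attempt_log.append(1)
    return 0 if len(attempt_log) >= 3 else 1


result = run_with_retries(flaky_step, max_attempts=5)
```

Returning the last exit code rather than raising keeps the caller free to decide whether a failed step should abort the whole command.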

0.3.0 - 2024-02-26

  • Bump Jobset version to 0.3.2.
  • Bump Kueue version to 0.6.0.
  • Add single-GPU support. Multislice and A3 GPU optimizations are in progress.
  • Add CPU single and multislice support to XPK. Tested up to 1500 VMs.
  • Fail workload creation early if the cluster doesn't have that resource type.
  • Enable deleting multiple workloads in parallel, selected by cluster name and filters.
  • Add the --enable-debug-logs flag to workload create to enable debug logs for a workload.
  • Support SIGTERM handling in the xpk workload command, and propagate exit codes from user jobs to the Cloud Composer UI.
  • Add a sidecar container to display stack traces, with details in the README.
  • Add a label for xpk-initiated TPU pods.
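
As a rough illustration of the SIGTERM handling and exit code propagation noted above, the sketch below wraps a child process, forwards SIGTERM to it, and returns the child's exit code. It is an assumption-laden example, not xpk's implementation: `run_and_propagate` is a hypothetical name, and the handler must be installed from the main thread.

```python
import signal
import subprocess
import sys
from typing import Sequence


def run_and_propagate(command: Sequence[str]) -> int:
    """Run `command`, forward SIGTERM to it, and return its exit code.

    Hypothetical helper for illustration; it stands in for whatever
    wrapper launches the user job.
    """
    child = subprocess.Popen(command)

    def forward_sigterm(signum, frame):
        # Pass the termination request on to the child process.
        child.terminate()

    previous_handler = signal.signal(signal.SIGTERM, forward_sigterm)
    try:
        # wait() returns the child's exit code once it finishes.
        return child.wait()
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGTERM, previous_handler)


# Example: the child exits with code 3 and the wrapper reports the same code.
exit_code = run_and_propagate([sys.executable, "-c", "import sys; sys.exit(3)"])
```

A caller would typically finish with `sys.exit(exit_code)` so that the surrounding scheduler or UI sees the user job's status rather than the wrapper's.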

0.2.0 - 2023-12-07

Added

  • Add a reservation-exists check and provide help if the check fails
  • Add error messages and self-help instructions to the README for troubleshooting problems
  • Add v5p support
  • Add xpk cluster create flags for reservation/on-demand/spot

Changed

  • Change GKE version to 1.28.3-gke.1286000
  • Change CPU node pool defaults to be better adapted to demand

Fixed

  • Fix empty results from filter-by-status=QUEUED / FAILED / RUNNING
  • Fix parallel execution of node pool commands (concurrent ops)
  • Fix pip-changelog to the wrong package

0.1.0 - 2023-11-17

Added

  • Initial release of xpk PyPI package