Ray Webhook Support for Single-Host, Multi-Slice TPUs #453

Merged
ryanaoleary merged 2 commits into main from single-host-multi-slice on Mar 26, 2024


ryanaoleary
Collaborator

This PR adds support for single-host, multi-slice TPU worker groups, which require the TPU_NAME and TPU_WORKER_ID environment variables. Specifically, it changes the webhook to inject a replicaIndex label into every Ray worker Pod that requests TPUs (previously, a multiHostReplica label was injected only for multi-host worker groups). Pods that are deleted and recreated are assigned the same TPU_WORKER_ID, TPU_NAME, and replicaIndex as before.

This PR was tested manually by creating single-host single-slice, single-host multi-slice, multi-host single-slice, and multi-host multi-slice RayClusters and confirming that all environment variables were injected correctly, including after repeated Pod deletions.
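The bookkeeping described above can be pictured with a small Go sketch. It is not the webhook's actual code: sliceKey, tracker, newTracker, assign, release, and tpuName are hypothetical names, and the TPU_NAME format (worker group name plus replica index) is an assumption. The idea shown is that each new worker Pod gets the lowest replica with a free host slot and the lowest free TPU_WORKER_ID within it, so when a Pod is deleted its ID is released and the replacement Pod lands back on the same replicaIndex, TPU_WORKER_ID, and TPU_NAME. For single-host slices numOfHosts is 1, so every Pod gets TPU_WORKER_ID 0 and only replicaIndex distinguishes the slices.

```go
package main

import "fmt"

// sliceKey identifies one TPU slice, i.e. one replica of a worker group.
// Hypothetical type for illustration only.
type sliceKey struct {
	clusterName  string
	groupName    string
	replicaIndex int
}

// tracker records which TPU_WORKER_IDs are in use in each slice (hypothetical).
type tracker struct {
	numOfHosts int // hosts per slice; 1 for single-host TPUs
	replicas   int // number of slices in the worker group
	inUse      map[sliceKey]map[int]bool
}

func newTracker(numOfHosts, replicas int) *tracker {
	return &tracker{numOfHosts: numOfHosts, replicas: replicas, inUse: map[sliceKey]map[int]bool{}}
}

// assign picks the lowest replica that still has a free host slot and the
// lowest free worker ID inside it, then marks that ID as used.
func (t *tracker) assign(cluster, group string) (replicaIndex, workerID int, err error) {
	for r := 0; r < t.replicas; r++ {
		key := sliceKey{cluster, group, r}
		ids := t.inUse[key]
		if ids == nil {
			ids = map[int]bool{}
			t.inUse[key] = ids
		}
		for id := 0; id < t.numOfHosts; id++ {
			if !ids[id] {
				ids[id] = true
				return r, id, nil
			}
		}
	}
	return -1, -1, fmt.Errorf("all %d replicas of %s/%s are full", t.replicas, cluster, group)
}

// release frees a worker ID when its Pod is deleted, so the replacement
// Pod is assigned the same replicaIndex and TPU_WORKER_ID.
func (t *tracker) release(cluster, group string, replicaIndex, workerID int) {
	delete(t.inUse[sliceKey{cluster, group, replicaIndex}], workerID)
}

// tpuName builds a per-slice TPU_NAME; this naming format is an assumption.
func tpuName(group string, replicaIndex int) string {
	return fmt.Sprintf("%s-%d", group, replicaIndex)
}

func main() {
	// Single-host, multi-slice: 1 host per slice, 3 slices.
	t := newTracker(1, 3)
	for i := 0; i < 3; i++ {
		r, id, _ := t.assign("raycluster", "tpu-group")
		fmt.Printf("replicaIndex=%d TPU_WORKER_ID=%d TPU_NAME=%s\n", r, id, tpuName("tpu-group", r))
	}
	// Delete the Pod in slice 1; its replacement is placed back in slice 1.
	t.release("raycluster", "tpu-group", 1, 0)
	r, id, _ := t.assign("raycluster", "tpu-group")
	fmt.Printf("after restart: replicaIndex=%d TPU_WORKER_ID=%d TPU_NAME=%s\n", r, id, tpuName("tpu-group", r))
}
```

Running it prints replicaIndex 0, 1, and 2 (each with TPU_WORKER_ID=0), then "after restart: replicaIndex=1 TPU_WORKER_ID=0 TPU_NAME=tpu-group-1". Filling the lowest free slot first is one simple way to make restarts deterministic; a multi-host group uses the same logic with numOfHosts greater than 1.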

@ryanaoleary ryanaoleary requested a review from richardsliu March 26, 2024 21:30
@ryanaoleary ryanaoleary self-assigned this Mar 26, 2024
@ryanaoleary
Collaborator Author

/gcbrun

@ryanaoleary ryanaoleary merged commit bc25097 into main Mar 26, 2024
8 checks passed
@ryanaoleary ryanaoleary deleted the single-host-multi-slice branch March 27, 2024 00:03
umeshkumhar added a commit that referenced this pull request Mar 27, 2024
* add rag kuberay and jupyterhub image (#440)

* Rollback to previous image (#454)

* Ray Webhook Support for Single-Host, Multi-Slice TPUs (#453)

* Fix incorrect replicaIndex for single-host, multi replica

* Fix single-host, multi-slice deletion logic

* Update README & simplify workloads.tfvars for RAG (#445)

* RAG marketplace updates (#456)

* fix RAG marketplace changes

---------

Co-authored-by: Chia-Yi Liang <chiayiliang327@gmail.com>
Co-authored-by: zlq <zlq@google.com>
Co-authored-by: ryanaoleary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: imreddy13 <132504814+imreddy13@users.noreply.github.com>