Feat: add support for distributed serving type #1187
Purpose of this PR
This PR adds a new serving type, distributed, to Arena's serving module. The primary motivation is to enable the deployment of large-scale models across multiple nodes within a Kubernetes (K8s) cluster.
Proposed changes:
- Add the distributed serving type to Arena's serving module, which can deploy a model across multiple nodes.
- distributed serving type.
Which issue(s) this PR fixes:
Fixes #1186
Change Category
Rationale
The distributed serving type addresses the increasing demand for multi-host inference driven by the advancement of large language models (LLMs) such as Meta's Llama-3.1-405B. Currently, Arena cannot deploy a model distributed across multiple nodes, and this PR aims to fill that gap.
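
To make the multi-host argument concrete, here is a rough back-of-the-envelope sketch of how many nodes are needed just to hold the weights of a 405B-parameter model. The hardware figures (8 GPUs per node, 80 GiB per GPU) and the 2-bytes-per-parameter precision are illustrative assumptions, not values taken from this PR, and the helper `min_nodes_for_weights` is hypothetical. The estimate counts weights only and ignores KV cache, activations, and runtime overhead, so real deployments need more headroom.

```python
import math


def min_nodes_for_weights(num_params, bytes_per_param, gpu_mem_gib, gpus_per_node):
    """Lower bound on node count needed to hold the model weights alone.

    Hypothetical helper for illustration; not part of Arena.
    """
    weight_bytes = num_params * bytes_per_param
    node_bytes = gpu_mem_gib * 1024**3 * gpus_per_node
    return math.ceil(weight_bytes / node_bytes)


# Assumed setup: 405e9 parameters in 16-bit precision (2 bytes each),
# on nodes with 8 GPUs of 80 GiB each.
nodes = min_nodes_for_weights(405_000_000_000, 2, 80, 8)
print(nodes)
```

Under these assumptions the weights alone (about 810 GB) exceed the combined GPU memory of one node, which is why a single-node serving type is not enough for models of this size.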