Feat: add support for distributed serving type #1187

Open: wants to merge 3 commits into base: master

linnlh commented Nov 1, 2024

Purpose of this PR

This PR introduces a new serving type called distributed to Arena's serving module. The primary motivation behind these changes is to enable the deployment of large-scale models across multiple nodes within a Kubernetes (K8s) cluster.

Proposed changes:

  • Introduce a new serving type called distributed to Arena's serving module, which can deploy a model across multiple nodes.
  • Update the relevant documentation to provide guidance on using the distributed serving type.

Which issue(s) this PR fixes:
Fixes #1186

Change Category

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

The distributed serving type addresses the increasing demand for multi-host inference driven by the advancement of large language models (LLMs) such as Meta's Llama-3.1-405B. Currently, Arena cannot deploy a model distributed across multiple nodes, and this PR aims to fill that gap.
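
As a rough illustration of why a model of this size calls for multi-host inference, the sketch below estimates the memory needed for the weights alone. The precision (2 bytes per parameter, as in BF16), the 80 GB GPU size, and the 8 GPUs per node are assumptions chosen for illustration and do not come from this PR. The helper `min_nodes_for_weights` is hypothetical and ignores KV cache, activations, and runtime overhead, so real deployments need more headroom than this lower bound suggests.

```python
import math


def min_nodes_for_weights(num_params, bytes_per_param, gpu_mem_gb, gpus_per_node):
    """Return (weights size in GB, minimum node count to hold the weights).

    Hypothetical helper: counts only the model weights, not KV cache,
    activations, or framework overhead.
    """
    weights_gb = num_params * bytes_per_param / 1e9
    node_mem_gb = gpu_mem_gb * gpus_per_node
    return weights_gb, math.ceil(weights_gb / node_mem_gb)


# Assumed setup: 405B parameters, 2 bytes each, nodes with 8 x 80 GB GPUs.
weights_gb, nodes = min_nodes_for_weights(405e9, 2, 80, 8)
print(f"weights: {weights_gb:.0f} GB, minimum nodes: {nodes}")
```

Under these assumptions the weights exceed the combined GPU memory of a single node, which is the situation the distributed serving type is meant to handle.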

林联辉 added 2 commits October 31, 2024 19:12
Signed-off-by: 林联辉 <linlianhui.llh@alibaba-inc.com>
Signed-off-by: 林联辉 <linlianhui.llh@alibaba-inc.com>

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign syulin7 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

linnlh changed the title from "feat: add support for distributed serving type" to "Feat: add support for distributed serving type" on Nov 1, 2024
Signed-off-by: 林联辉 <linlianhui.llh@alibaba-inc.com>
linnlh commented Nov 1, 2024

@ChenYi015 @cheyang @Syulin7

Successfully merging this pull request may close these issues.

Add support for deploying large-scale model across multiple nodes