
Add support for deploying large-scale model across multiple nodes #1186

Open
linnlh opened this issue Nov 1, 2024 · 0 comments · May be fixed by #1187
linnlh commented Nov 1, 2024

What you would like to be added?

Hi, I'm currently working on running inference for large-scale models in a Kubernetes cluster, which in some cases means running a model across multiple nodes. I have noticed that it is hard to deploy models in a distributed fashion using Arena. Therefore, I suggest introducing a new serving type called distributed to Arena's serving module.

The basic usage would look like:

arena serve distributed \
  --name=distributed-sample \
  --image=xxx \
  --restful-port=5000 \
  --masters=1 \
  --master-cpu=1 \
  --master-memory=2Gi \
  --master-gpus=0 \
  --master-command="python serving.py" \
  --workers=2 \
  --worker-cpu=4 \
  --worker-memory=8Gi \
  --worker-gpus=4 \
  --worker-command="sleep 30d"

The distributed serving type manages replicas of a "group". Each group consists of multiple pods of two types, master and worker, and the user can specify the resources required for each type of pod independently.
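To make the master/worker split concrete, one group replica of the sample command above might expand into pods roughly like the following. This is only an illustrative sketch of the flags' intent, not the actual resources Arena would generate; the pod names and label layout are hypothetical:

```yaml
# Hypothetical pods for one "group" replica of distributed-sample.
# Arena's real generated manifests may differ; this only maps the
# CLI flags above onto the master/worker pod types.
apiVersion: v1
kind: Pod
metadata:
  name: distributed-sample-master-0      # --masters=1
spec:
  containers:
    - name: master
      image: xxx                          # --image (elided in the example)
      command: ["python", "serving.py"]   # --master-command
      ports:
        - containerPort: 5000             # --restful-port
      resources:
        limits:
          cpu: "1"                        # --master-cpu
          memory: 2Gi                     # --master-memory
---
apiVersion: v1
kind: Pod
metadata:
  name: distributed-sample-worker-0      # one of --workers=2
spec:
  containers:
    - name: worker
      image: xxx
      command: ["sleep", "30d"]           # --worker-command
      resources:
        limits:
          cpu: "4"                        # --worker-cpu
          memory: 8Gi                     # --worker-memory
          nvidia.com/gpu: "4"             # --worker-gpus
```

The point of the group abstraction is that scaling acts on whole groups: scaling to two replicas would create a second master pod plus two more worker pods, rather than scaling masters and workers independently.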

Why is this needed?

Recent models are becoming increasingly sophisticated and larger in size. Especially after Meta released models such as Llama-3.1-405B, it is hard to deploy such a massive model on a single node, so users tend to deploy this type of model distributed across multiple nodes.

Currently, Arena does not support distributed model deployment. This limitation affects users who wish to deploy large-scale models such as Llama-3.1-405B using Arena. Therefore, I think a distributed serving type is needed to meet these users' needs.

Love this feature?

Give it a 👍 We prioritize the features with the most 👍

linnlh linked a pull request Nov 1, 2024 that will close this issue