Adding a server component for running multiple models on a single model worker. #1866

fozziethebeat · 2023-07-06T02:05:35Z

Why are these changes needed?

This adds a new server component that lets clients run multiple models on the same worker instance. With the new PeftModelAdapter and an eventual fix for huggingface/peft#430, this server component lets clients run multiple adapters that share the same base model weights, loading those base weights only once.

As of right now, this is not fully optimized: it still loads the base model weights once per configured model. Removing that duplicate loading is blocked on the Peft issue.
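To illustrate the difference between the current behavior and the intended one, here is a minimal toy sketch. It is not the code in this PR: `SubWorker`, `MultiModelWorker`, `CountingLoader`, and the `share_base` flag are hypothetical names made up for illustration, and the "weights" are just a list of floats standing in for a real base model.

```python
from typing import Callable, Dict, List


class SubWorker:
    """Hypothetical stand-in for one per-model worker inside the process."""

    def __init__(self, model_name: str, base_weights: List[float]) -> None:
        self.model_name = model_name
        self.base_weights = base_weights

    def generate(self, prompt: str) -> str:
        # A real worker would run inference; this just tags the prompt.
        return f"[{self.model_name}] {prompt}"


class MultiModelWorker:
    """Hypothetical dispatcher: one process serving several named models."""

    def __init__(self, load_base: Callable[[str], List[float]],
                 share_base: bool = False) -> None:
        self.load_base = load_base
        self.share_base = share_base
        self.workers: Dict[str, SubWorker] = {}
        self._base_cache: Dict[str, List[float]] = {}

    def add_model(self, model_name: str, base_path: str) -> None:
        if self.share_base:
            # Intended end state: load each base model once and reuse it.
            if base_path not in self._base_cache:
                self._base_cache[base_path] = self.load_base(base_path)
            weights = self._base_cache[base_path]
        else:
            # Current state described above: one load per configured model.
            weights = self.load_base(base_path)
        self.workers[model_name] = SubWorker(model_name, weights)

    def generate(self, model_name: str, prompt: str) -> str:
        # Route the request to the sub-worker registered under this name.
        return self.workers[model_name].generate(prompt)


class CountingLoader:
    """Fake loader that records how many times the base weights are loaded."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, base_path: str) -> List[float]:
        self.calls += 1
        return [0.0, 1.0]
```

With `share_base=False`, registering two adapters on the same base path calls the loader twice; with `share_base=True`, it is called once and both sub-workers hold the same weights object. Clients pick a model by name in either case.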

(replaces #1838)

Related issue number (if applicable)

Implements #1805 (maybe fixes?)

Checks

I've run format.sh to lint the changes in this PR.
I've included any doc changes needed.
I've made sure the relevant tests are passing (if applicable).

Adding a server component for running multiple workers

df5b4c4

fozziethebeat marked this pull request as ready for review July 6, 2023 02:05

fozziethebeat mentioned this pull request Jul 6, 2023

Adding a server component for running multiple workers #1838

Closed

3 tasks

Ying1123 approved these changes Jul 6, 2023

Ying1123 changed the title ~~Adding a server component for running multiple workers~~ Adding a server component for running multiple models on a single worker with multiple sub-works one for each. Jul 6, 2023

Ying1123 changed the title ~~Adding a server component for running multiple models on a single worker with multiple sub-works one for each.~~ Adding a server component for running multiple models on a single model worker. Jul 6, 2023

Ying1123 merged commit 5a003ab into lm-sys:main Jul 6, 2023

Ying1123 mentioned this pull request Jul 16, 2023

[Feature Request] Fork fastchat/serve/model_worker.py to support multiple LoRA models #1805

Closed
