[Feature Request] Fork fastchat/serve/model_worker.py to support multiple LoRA models #1805

fozziethebeat · 2023-06-29T00:31:41Z

Right now, fastchat/serve/model_worker.py supports only one model. With LoRA (or other PEFT) adapters, we could in theory load one base model and multiple adapters per worker, which would reduce the number of times we need to load the base model. Adding this to fastchat/serve/model_worker.py directly is probably a bad idea because it would complicate that script, but forking it into something like fastchat/serve/multi_model_worker.py looks fairly easy.

The new options would need to accept a list of model_path:model_name pairs. A safe (if strict) assumption would be that all listed models share the same base model, so the base only needs to be loaded once. The worker can then register each LoRA model with the controller.
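A minimal sketch of how the proposed option could be parsed, assuming each entry is written as `model_path:model_name` and split on the last colon. `AdapterSpec`, `parse_model_pairs`, and `build_registration` are hypothetical names, and the registration dict is only an illustrative shape, not FastChat's actual controller schema.

```python
# Hypothetical sketch: parse "model_path:model_name" pairs for a multi-model worker.
# Names and the registration payload shape are illustrative, not FastChat's API.
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AdapterSpec:
    model_path: str  # where the LoRA adapter weights live
    model_name: str  # name the worker would register with the controller


def parse_model_pairs(pairs: List[str]) -> List[AdapterSpec]:
    specs: List[AdapterSpec] = []
    seen = set()
    for pair in pairs:
        # Split on the LAST colon so paths that contain ':' keep their prefix.
        path, sep, name = pair.rpartition(":")
        if not sep or not path or not name:
            raise ValueError(f"expected model_path:model_name, got {pair!r}")
        if name in seen:
            raise ValueError(f"duplicate model_name {name!r}")
        seen.add(name)
        specs.append(AdapterSpec(path, name))
    return specs


def build_registration(worker_addr: str, specs: List[AdapterSpec]) -> Dict[str, object]:
    # One worker address advertised under every adapter's model name.
    return {"worker_addr": worker_addr, "model_names": [s.model_name for s in specs]}
```

For example, `parse_model_pairs(["lora/sql:vicuna-sql", "lora/chat:vicuna-chat"])` yields two specs, and `build_registration` advertises both names from a single worker address. Splitting on the last colon means model names themselves cannot contain ':'.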

Would a setup like this break anything within the controller? I'm assuming not, but I haven't checked this directly.

If this seems reasonable, I'm happy to draft a PR.


fozziethebeat · 2023-06-30T05:57:53Z

Note: a major PEFT bug seems to block this from working right now: huggingface/peft#430

merrymercy · 2023-07-01T13:52:05Z

Yes, this is definitely a feature we want.
cc @ZYHowell

fozziethebeat · 2023-07-02T04:53:58Z

Cool, then I'll put it together; it's a high-priority item for me. Right now I don't think it will be memory efficient, since the base model can't be shared by two different PeftModels, but I'll set up the basic server runner and find out what breaks.
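A minimal sketch of the base-sharing idea, under stated assumptions: the base model is loaded once, adapters are stored by name, and each request selects its adapter under a lock because switching the active adapter mutates state shared by every request. `SharedBaseWorker` and its methods are hypothetical and use plain Python objects as stand-ins for weights; with the PEFT library, the analogous operations on a single `PeftModel` are `load_adapter(model_id, adapter_name=...)` and `set_adapter(adapter_name)`. This is not necessarily how #1905 implements it.

```python
# Hypothetical sketch: one shared base model with named adapters.
# Plain objects stand in for model weights; this is not PEFT or FastChat code.
import threading
from typing import Callable, Dict, Optional


class SharedBaseWorker:
    def __init__(self, load_base: Callable[[], object]) -> None:
        self.base = load_base()  # the expensive base-model load runs exactly once
        self.adapters: Dict[str, object] = {}
        self.active: Optional[str] = None
        # Switching the active adapter mutates shared state, so requests are serialized.
        self._lock = threading.Lock()

    def add_adapter(self, name: str, weights: object) -> None:
        if name in self.adapters:
            raise ValueError(f"adapter {name!r} is already loaded")
        self.adapters[name] = weights

    def generate(
        self,
        name: str,
        prompt: str,
        run: Callable[[object, object, str], str],
    ) -> str:
        if name not in self.adapters:
            raise KeyError(f"unknown adapter {name!r}")
        with self._lock:
            self.active = name  # analogous to selecting the adapter before a forward pass
            return run(self.base, self.adapters[name], prompt)
```

The lock trades throughput for correctness: requests for different adapters cannot run concurrently on the shared base in this sketch.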

Ying1123 · 2023-07-16T07:31:06Z

Closing this issue because it was completed by #1866 and #1905.

merrymercy added the "enhancement" (New feature or request) label Jul 1, 2023

This was referenced Jul 3, 2023:

Adding a server component for running multiple workers #1838 · Closed
Adding a server component for running multiple models on a single model worker. #1866 · Merged
Allow Peft models to share their base model #1905 · Merged

Ying1123 closed this as completed Jul 16, 2023
