Right now fastchat/serve/model_worker.py supports one model. With trained LoRA (or other PEFT) adapters, we could in theory load one base model and multiple adapters per worker, reducing the number of times the base model has to be loaded. fastchat/serve/model_worker.py is probably a bad place to do this since it would complicate things, but forking the script into something like fastchat/serve/multi_model_worker.py looks pretty easy.
New options would need to accept a list of model_path:model_name pairs. A safe hard assumption is that all listed models share the same base model, which is loaded only once. The worker can then register each LoRA model with the controller.
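A rough sketch of how the option parsing could look. The flag names `--base-model-path` and `--lora` and the helper names are assumptions made up for illustration, not existing FastChat options, and nothing here loads a model or talks to the controller.

```python
import argparse
from typing import List, Tuple


def parse_model_pair(spec: str) -> Tuple[str, str]:
    """Split a 'model_path:model_name' string on its last colon."""
    path, sep, name = spec.rpartition(":")
    if not sep or not path or not name:
        raise argparse.ArgumentTypeError(
            f"expected model_path:model_name, got {spec!r}"
        )
    return path, name


def check_unique_names(pairs: List[Tuple[str, str]]) -> None:
    """Reject duplicate model names, since each name is registered separately."""
    names = [name for _, name in pairs]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate model names: {duplicates}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags: one shared base model, any number of adapters.
    parser = argparse.ArgumentParser()
    parser.add_argument("--base-model-path", required=True)
    parser.add_argument(
        "--lora",
        dest="loras",
        type=parse_model_pair,
        action="append",
        default=[],
        help="adapter given as model_path:model_name (repeatable)",
    )
    return parser
```

Splitting on the last colon keeps paths that themselves contain a colon intact, at the cost of not allowing colons in model names.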
Would a setup like this break anything within the controller? I'm assuming not, but I haven't checked this directly.
If this seems reasonable, I'm happy to draft a PR.
Cool, then I can put it together; it's a high-priority item for me. I think right now it won't be memory efficient, since the base model can't be used by two different PeftModels, but I'll set up the basic server runner and find out what breaks.