Replies: 4 comments 8 replies
-
Yes, tagging @XenonMolecule @arnavsinghvi11 (mainly Arnav, but Michael has set up SGLang, which might be a lot faster)
-
Yes, DSPy is compatible with vLLM! You can find the vLLM client at line 119 of dsp/modules/hf_client.py in afdf353: https://github.com/stanfordnlp/dspy/blob/afdf3539794b3f4b1f3d85dc74fec8254e4b0e1c/dsp/modules/hf_client.py#L119
I have used this same class with SGLang before and it has been quite efficient.
Example usage: `llama = dspy.HFClientVLLM(model="meta-llama/Llama-2-13b-chat-hf", port=None, url=["http://URL:7000", "http://URL:7001"], max_tokens=150)`
I'm unsure of any tutorials/colab notebooks. It is mostly a drop-in replacement for HF TGI! It could still be nice to have a tutorial or some docs about this; I'll defer to @arnavsinghvi11 to point out if any such documentation exists!
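A minimal end-to-end sketch building on that example (the `URL:port` values are placeholders, and the `dspy.settings.configure` / `dspy.Predict` lines are illustrative additions, not something confirmed in this thread):
```python
import dspy

# Point DSPy at one or more already-running vLLM servers (URLs are placeholders).
llama = dspy.HFClientVLLM(
    model="meta-llama/Llama-2-13b-chat-hf",
    port=None,
    url=["http://URL:7000", "http://URL:7001"],
    max_tokens=150,
)

# Register it as the default LM and run a simple module against it.
dspy.settings.configure(lm=llama)
qa = dspy.Predict("question -> answer")
print(qa(question="What is vLLM?").answer)
```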
-
Nice!
My main question is: I'm only familiar with loading the model in vLLM itself, not via URLs, e.g., as in the quick start tutorial https://docs.vllm.ai/en/latest/getting_started/quickstart.html :
```python
# https://github.com/brando90/snap-cluster-setup/blob/main/src/test_vllm.py
# copy pasted from https://docs.vllm.ai/en/latest/getting_started/quickstart.html
# do export VLLM_USE_MODELSCOPE=True
from vllm import LLM, SamplingParams

def test_vllm():
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    llm = LLM(model="facebook/opt-125m")
    outputs = llm.generate(prompts, sampling_params)
    # Print the outputs.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

if __name__ == "__main__":
    import time
    start_time = time.time()
    test_vllm()
    print(f"Time taken: {time.time() - start_time:.2f} seconds, or {(time.time() - start_time) / 60:.2f} minutes, or {(time.time() - start_time) / 3600:.2f} hours.\a")
```
How do I do it with that?
Or alternatively, can you show me how you're running your vLLM server so that your code works?
I'm also puzzled: some of DSPy claims to fine-tune/change the weights. How would that work if my model is local and served with vLLM? Perhaps that vLLM + weight-changing feature is not supported?
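For reference, a sketch of the missing server-side step, under the assumption that `HFClientVLLM` talks to a standalone vLLM HTTP server rather than to an in-process `vllm.LLM`; the launch command, entrypoint, and localhost URL below are assumptions, not taken from this thread:
```python
# The quickstart above runs the model in-process via vllm.LLM. DSPy's client instead
# expects a vLLM HTTP server that you launch separately, e.g. from a shell
# (entrypoint and port are assumptions; some setups may use the OpenAI-compatible
# server, vllm.entrypoints.openai.api_server, instead):
#
#   python -m vllm.entrypoints.api_server --model facebook/opt-125m --port 7000
#
import dspy

# The url passed to HFClientVLLM then points at that server; no vllm.LLM is built here.
lm = dspy.HFClientVLLM(model="facebook/opt-125m", port=None, url=["http://localhost:7000"], max_tokens=150)
dspy.settings.configure(lm=lm)
```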
-
Finetuning happens only with BootstrapFinetune. The others are prompt optimizers. This isn't done through vLLM but uses HF trainers, as @sutyum says. We have two nice research projects building more finetuning-based optimizers, but they won't be out until NeurIPS.
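For concreteness, a rough sketch of that distinction; the metric, training example, and the exact constructor/`compile` arguments here are placeholders and assumptions about the DSPy API of the time, not taken from this thread:
```python
import dspy
from dspy.teleprompt import BootstrapFewShot, BootstrapFinetune

# Placeholder metric, data, and program, purely for illustration.
def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

trainset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]
program = dspy.Predict("question -> answer")

# Prompt optimization: only prompts/demos change, so a vLLM-served model can be used as-is.
fewshot = BootstrapFewShot(metric=exact_match)
prompted_program = fewshot.compile(program, trainset=trainset)

# Weight finetuning: BootstrapFinetune trains through HF trainers on a local HF checkpoint;
# it does not update the weights inside a running vLLM server.
finetune = BootstrapFinetune(metric=exact_match)
finetuned_program = finetune.compile(program, trainset=trainset)
```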
-
I wanted to use vLLM and DSPy together, and since vLLM is among the fastest inference frameworks for open-source LLMs, I thought to ask: is it possible to use them together? Is there a tutorial/colab for it that I may build upon and contribute to?
Related: