Hi team,
When I set return_log_probs = true and output_log_probs = true to get log probabilities back, generation becomes much slower: roughly 11x in the test below (0m6.186s vs. 1m9.191s for the same request). Below is my testing.
Without log probs:
time curl -X POST localhost:8080/v2/models/ensemble/generate_stream -d '{"text_input": "Explain in detail 5 important events of WW2.", "parameters": {"max_tokens": 2000,"bad_words":[""],"stop_words":[""],"stream": true,"temperature": 0.0}}' | grep 'data: ' | wc -l
Result: 0m6.186s
With "return_log_probs": true:
time curl -X POST localhost:8080/v2/models/ensemble/generate_stream -d '{"text_input": "Explain in detail 5 important events of WW2.", "parameters": {"max_tokens": 2000,"bad_words":[""],"stop_words":[""],"stream": true,"temperature": 0.0, "return_log_probs": true}}' | grep 'data: ' | wc -l
Result: 1m9.191s
I am running Meta-Llama-3.1-8B-Instruct on an 8xH100 machine with TP8 (tensor parallel size 8).
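To confirm that both runs generate the same amount of output (temperature 0.0 should make them deterministic), the comparison can be scripted so it reports the number of streamed events next to the elapsed time. This is a minimal, hypothetical sketch: it assumes bash, curl, GNU date (for `%N`) and the endpoint used above, and the names build_payload, count_events and run_case are illustrative helpers, not part of Triton or TensorRT-LLM.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time the same streaming request with and without
# return_log_probs. Assumes the ensemble endpoint from the report above.
ENDPOINT="${ENDPOINT:-localhost:8080/v2/models/ensemble/generate_stream}"

# Print the request body. $1 = "true" adds "return_log_probs": true;
# anything else omits the field, matching the baseline request.
build_payload() {
  local extra=""
  if [ "$1" = "true" ]; then
    extra=', "return_log_probs": true'
  fi
  printf '{"text_input": "Explain in detail 5 important events of WW2.", "parameters": {"max_tokens": 2000, "bad_words": [""], "stop_words": [""], "stream": true, "temperature": 0.0%s}}\n' "$extra"
}

# Count streamed events (lines containing "data: ") read from stdin,
# the same thing the original `grep 'data: ' | wc -l` counts.
count_events() {
  grep -c 'data: '
}

# Send one request and print event count, elapsed wall-clock time and
# average time per event. Uses GNU date for sub-second timestamps.
run_case() {
  local start end events
  start=$(date +%s.%N)
  events=$(curl -s -X POST "$ENDPOINT" -d "$(build_payload "$1")" | count_events)
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" -v n="$events" -v lp="$1" 'BEGIN {
    per = (n > 0) ? (e - s) * 1000 / n : 0
    printf "return_log_probs=%s events=%d elapsed=%.3fs per_event=%.1fms\n", lp, n, e - s, per
  }'
}
```

Running `run_case false` and then `run_case true` against the server prints one line per case. If the event counts match, the ratio of the elapsed times isolates the cost of returning log probabilities rather than a difference in output length.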
Who can help?
No response
Reproduction
See above
Expected behavior
NA
Actual behavior
NA
Additional notes
NA