return_log_probs slow down generation #2367

Desmond819 · 2024-10-24T00:50:47Z

System Info

Hi team,

When I set return_log_probs to true and output_log_probs = true to return log probabilities, generation speed degrades significantly. My test is below.

time curl -X POST localhost:8080/v2/models/ensemble/generate_stream -d '{"text_input": "Explain in detail 5 important events of WW2.", "parameters": {"max_tokens": 2000,"bad_words":[""],"stop_words":[""],"stream": true,"temperature": 0.0}}' | grep 'data: ' | wc -l

Result: 0m6.186s

time curl -X POST localhost:8080/v2/models/ensemble/generate_stream -d '{"text_input": "Explain in detail 5 important events of WW2.", "parameters": {"max_tokens": 2000,"bad_words":[""],"stop_words":[""],"stream": true,"temperature": 0.0, "return_log_probs": true}}' | grep 'data: ' | wc -l

Result: 1m9.191s

I am running Meta-Llama-3.1-8B-Instruct on 8xH100 with TP8.
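To put the two timings on the same scale, here is a minimal sketch that converts the `time` values reported above into seconds and computes the slowdown factor, which comes to roughly 11x. The helper names `parse_shell_time` and `count_stream_chunks` are hypothetical and exist only for this illustration. The chunk counter mirrors the `grep 'data: ' | wc -l` step, but the chunk counts themselves were not posted, so no per-chunk latency is computed here.

```python
import re


def parse_shell_time(value: str) -> float:
    """Convert a bash `time` value such as '1m9.191s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def count_stream_chunks(stream_text: str) -> int:
    """Count lines containing 'data: ', like `grep 'data: ' | wc -l`."""
    return sum(1 for line in stream_text.splitlines() if "data: " in line)


# Wall-clock results copied from the two curl runs in this report.
baseline = parse_shell_time("0m6.186s")
with_log_probs = parse_shell_time("1m9.191s")
slowdown = with_log_probs / baseline

print(f"baseline: {baseline:.3f}s")
print(f"with return_log_probs: {with_log_probs:.3f}s")
print(f"slowdown: {slowdown:.1f}x")
```

If the `wc -l` counts from both runs are available, dividing each wall-clock time by its chunk count would show whether the extra time is spread evenly across the stream or concentrated in a few responses.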

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

See above

Expected behavior

NA

actual behavior

NA

additional notes

NA

Superjomn · 2024-10-24T09:14:10Z

Thanks for sharing the issue.
Which API are you using: the triton-backend, openai_server.py, or something else? @Desmond819

Desmond819 · 2024-10-24T09:52:54Z

I am using the triton-backend.

Desmond819 added the bug label Oct 24, 2024

Superjomn added the performance issue and Investigating labels Oct 24, 2024
