More metrics on LPG script #832
Conversation
benchmarks/benchmark/tools/profile-generator/container/benchmark_serving.py
# If a value is specified for a given key, it will be populated on the outputs `summary_stats.stats` field as 'value':'stats' as well.
if backend == "vllm":
    return {
        "vllm:gpu_cache_usage_perc": None,
How does vLLM get TTFT, TPOT, and request latency if JetStream does it like below?
Currently we don't formally report summary stats for these metrics, although we do collect and report the raw values. Ideally we could collect these stats from Prometheus; this will be addressed in a follow-up.
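If the follow-up does pull these stats from Prometheus, one piece needed is estimating a quantile from cumulative histogram buckets (the form in which Prometheus exposes TTFT/TPOT-style latency metrics). The sketch below is a standalone assumption, not code from this PR; it mirrors the linear interpolation PromQL's `histogram_quantile()` uses.

```python
# Sketch (assumed, not from the PR): approximate a quantile from
# cumulative Prometheus histogram buckets.
def histogram_quantile(q, buckets):
    """buckets: sorted list of (upper_bound, cumulative_count); the last
    bound may be float("inf"). Interpolates linearly inside the bucket
    containing the target rank, like PromQL's histogram_quantile()."""
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # Cannot interpolate into the +Inf bucket; fall back to
                # the highest finite bound seen.
                return prev_bound
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * frac
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with buckets `[(0.1, 10), (0.5, 50), (1.0, 90), (inf, 100)]`, the estimated median is 0.5 and the p90 is 1.0.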
So, where in the report do we have the data the benchmark produced? We report request latency, average time per output token, etc.
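The kind of per-request aggregation described above can be sketched as follows; the function name and output keys are assumptions for illustration, not the script's actual report schema.

```python
import statistics

# Illustrative sketch (names assumed): summarize per-request latency and
# average time per output token from the benchmark's raw measurements.
def summarize(latencies, output_tokens):
    """latencies: per-request end-to-end latency in seconds.
    output_tokens: per-request output token counts (same order)."""
    # Time per output token for each request; skip zero-token requests.
    tpot = [lat / n for lat, n in zip(latencies, output_tokens) if n > 0]
    return {
        "avg_request_latency": statistics.mean(latencies),
        "avg_per_output_token_latency": statistics.mean(tpot),
    }
```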
Example LPG JSON results, a few notes:
- `inf` to buckets