server: metrics endpoint #5850
Comments
I have one idea to improve If user only want to call The application of this such case is where
It blocks only during prompt/image processing, in llama_decode. It works as expected while slots are waiting for the next token. So, as @ggerganov explained earlier in #5851, we are fine with it. I will just fix the bucket window so that it resets on /metrics rather than on /health, add a few more total metrics, and fix the JSON value of the KV cache use ratio being cast to int instead of float.
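The reset behaviour described in this comment can be sketched as follows. This is a minimal illustration, not the server's actual C++ code: the class `ServerMetrics`, its field names, and the two handler functions are hypothetical. It only assumes what the comment states, namely that bucket values should be cleared when /metrics is scraped and left untouched by /health, while totals keep growing.

```python
from dataclasses import dataclass


@dataclass
class ServerMetrics:
    """Hypothetical metrics holder; names are illustrative only."""
    # Monotonic totals: never reset, so they behave like counters.
    prompt_tokens_total: int = 0
    prompt_seconds_total: float = 0.0
    # Bucket values: accumulated since the last /metrics scrape.
    prompt_tokens_bucket: int = 0
    prompt_seconds_bucket: float = 0.0

    def on_prompt_processed(self, n_tokens: int, seconds: float) -> None:
        # Every processed prompt feeds both the totals and the bucket.
        self.prompt_tokens_total += n_tokens
        self.prompt_seconds_total += seconds
        self.prompt_tokens_bucket += n_tokens
        self.prompt_seconds_bucket += seconds

    def snapshot(self, reset_bucket: bool) -> dict:
        # Copy the current values before optionally clearing the bucket.
        data = {
            "prompt_tokens_total": self.prompt_tokens_total,
            "prompt_seconds_total": self.prompt_seconds_total,
            "prompt_tokens_bucket": self.prompt_tokens_bucket,
            "prompt_seconds_bucket": self.prompt_seconds_bucket,
        }
        if reset_bucket:
            self.prompt_tokens_bucket = 0
            self.prompt_seconds_bucket = 0.0
        return data


def handle_health(metrics: ServerMetrics) -> dict:
    # /health reads the metrics but must not disturb the bucket window.
    return metrics.snapshot(reset_bucket=False)


def handle_metrics(metrics: ServerMetrics) -> dict:
    # /metrics is the only caller that starts a new bucket window.
    return metrics.snapshot(reset_bucket=True)
```

With this split, any number of /health probes between two /metrics scrapes leaves the bucket values intact, and the totals stay valid regardless of who reads them.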
…s_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. Closes #5850
…s_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. (ggerganov#5937) Closes ggerganov#5850
Issues
- The /metrics endpoint shares the same task event as /health: TASK_TYPE_METRICS. This means metrics are reset on both calls.
- The Process-Start-Time-Unix HTTP response header is not set.
- llamacpp:prompt_tokens_seconds and llamacpp:predicted_tokens_seconds are per slot, while the server actually processes llamacpp:prompt_tokens_seconds * n_slots.

Proposal
- llamacpp:prompt_tokens_seconds_total and llamacpp:predicted_tokens_seconds_total
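
The gap between a per-slot rate and what the whole server processes can be illustrated with a little arithmetic. The sketch below is not llama.cpp code: the function names and the example numbers are hypothetical, and it assumes that all slots work in parallel over the same wall-clock interval.

```python
def per_slot_rate(tokens_per_slot: list[int], seconds_per_slot: list[float]) -> float:
    """Mean tokens/second of a single slot (hypothetical helper)."""
    rates = [n / t for n, t in zip(tokens_per_slot, seconds_per_slot) if t > 0]
    return sum(rates) / len(rates) if rates else 0.0


def server_rate(tokens_per_slot: list[int], wall_seconds: float) -> float:
    """Tokens/second across all slots over one shared wall-clock interval."""
    if wall_seconds <= 0:
        return 0.0
    return sum(tokens_per_slot) / wall_seconds


# Assumed example: 4 slots, each processing 100 tokens in the same 2 seconds.
n_slots = 4
tokens = [100] * n_slots
seconds = [2.0] * n_slots

slot_rate = per_slot_rate(tokens, seconds)  # 50.0 tokens/s for one slot
total_rate = server_rate(tokens, 2.0)       # 200.0 tokens/s for the server
```

Under that assumption the server-wide rate equals the per-slot rate multiplied by `n_slots`, which is the discrepancy the third issue points out.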