
server: metrics endpoint #5850

Closed
phymbert opened this issue Mar 3, 2024 · 2 comments · Fixed by #5937
Assignees: phymbert
Labels: bug (Something isn't working), server/webui

Comments

phymbert (Collaborator) commented Mar 3, 2024

Issues

  • The server /metrics endpoint shares the same task type as /health: TASK_TYPE_METRICS. This means the metrics are reset on both calls.
  • The Process-Start-Time-Unix HTTP response header is not set.
  • The metrics llamacpp:prompt_tokens_seconds and llamacpp:predicted_tokens_seconds are per slot, while the server actually processes llamacpp:prompt_tokens_seconds * n_slots.

Proposal

  • Add a data param to TASK_TYPE_METRICS so that the metrics bucket is reset only in /metrics
  • Add llamacpp:prompt_tokens_seconds_total and llamacpp:predicted_tokens_seconds_total (a minimal sketch follows this list)
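
A minimal C++ sketch of what the proposal could look like, assuming hypothetical struct and function names (this is not the actual llama.cpp server code): the bucket values feed the per-window rate gauges and are reset only by the /metrics task, while the *_total counters are monotonic and never reset.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

struct server_metrics {
    // bucket: reset on each /metrics scrape
    std::uint64_t n_prompt_tokens_processed = 0;
    std::uint64_t n_tokens_predicted        = 0;
    // totals: monotonic, aggregated across all slots
    std::uint64_t n_prompt_tokens_processed_total = 0;
    std::uint64_t n_tokens_predicted_total        = 0;

    void on_prompt_eval(std::uint64_t n_tokens) {
        n_prompt_tokens_processed       += n_tokens;
        n_prompt_tokens_processed_total += n_tokens;
    }

    void on_prediction(std::uint64_t n_tokens) {
        n_tokens_predicted       += n_tokens;
        n_tokens_predicted_total += n_tokens;
    }

    // called only by the /metrics task, never by /health
    void reset_bucket() {
        n_prompt_tokens_processed = 0;
        n_tokens_predicted        = 0;
    }
};

// Prometheus text exposition of the totals, using the metric names
// from the proposal above.
std::string render_totals(const server_metrics & m) {
    std::ostringstream out;
    out << "# TYPE llamacpp:prompt_tokens_seconds_total counter\n"
        << "llamacpp:prompt_tokens_seconds_total " << m.n_prompt_tokens_processed_total << "\n"
        << "# TYPE llamacpp:predicted_tokens_seconds_total counter\n"
        << "llamacpp:predicted_tokens_seconds_total " << m.n_tokens_predicted_total << "\n";
    return out.str();
}
```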
@phymbert phymbert added bug Something isn't working server/webui labels Mar 3, 2024
@phymbert phymbert self-assigned this Mar 3, 2024
ngxson (Collaborator) commented Mar 3, 2024

I have an idea to improve /health, but I'm not sure whether it's currently possible:

If the user only wants to call /health to check whether the server is busy doing something, then this call should not block. Why? Because if the call is blocked, then by the time it finishes the server is free again, meaning health would never report "busy".

This applies when include_slots is not set, i.e. the user just wants to see how many slots are idle / processing. I think this check can be done easily with a mutex (see the sketch below). Only when include_slots is set do we push the request to the queue.
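
A minimal sketch of that idea, using atomic counters rather than a mutex for simplicity; slot_state and handle_health are hypothetical names, not the actual server code.

```cpp
#include <atomic>
#include <string>

struct slot_state {
    std::atomic<int> n_idle{0};
    std::atomic<int> n_processing{0};
};

// Answers directly from the counters without going through the task queue,
// so the call returns immediately even while llama_decode() is running.
// Only when include_slots is requested would the handler fall back to
// posting a task to the queue for the full per-slot details.
void handle_health(const slot_state & state, int & status, std::string & body) {
    const int idle       = state.n_idle.load();
    const int processing = state.n_processing.load();
    // illustrative policy: 503 when no slot is available
    status = idle > 0 ? 200 : 503;
    body   = "{\"slots_idle\": "        + std::to_string(idle) +
             ", \"slots_processing\": " + std::to_string(processing) + "}";
}
```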

phymbert (Collaborator, Author) commented Mar 3, 2024

It blocks only during prompt/image processing (llama_decode); it works as expected while slots are waiting for the next token. So, as @ggerganov explained earlier in #5851, we are fine with it. I will just fix the metrics bucket being reset on /health instead of /metrics, add a few more *_total metrics, and fix the KV cache use ratio being cast to int instead of float in the JSON output (illustrated below).
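
An illustrative sketch of that last cast fix, assuming the nlohmann::json library; the function and field names here are hypothetical.

```cpp
#include <nlohmann/json.hpp>

nlohmann::json kv_cache_metrics(int n_used_cells, int n_total_cells) {
    // Bug: integer division truncates the use ratio to 0 or 1.
    // const int ratio = n_used_cells / n_total_cells;
    // Fix: compute the ratio in floating point so JSON serializes a float.
    const float ratio = float(n_used_cells) / float(n_total_cells);
    return nlohmann::json {
        {"kv_cache_used_cells",  n_used_cells},
        {"kv_cache_usage_ratio", ratio},
    };
}
```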

phymbert added a commit that referenced this issue Mar 8, 2024
…s_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. (#5937)

Closes #5850