
Strange drops in total requests #50

Closed
sterres opened this issue Jul 7, 2021 · 9 comments · Fixed by #217


sterres commented Jul 7, 2021

Hi,

I'm getting strange drops in the http_requests_total metric for the "/metrics" endpoint. I was expecting a monotonic increase, since with each scrape the counter for "/metrics" should increase by one.

But it looks like this:
[screenshot: graph of http_requests_total showing repeated drops]

Any idea what I'm doing wrong?

Thanks and BR
Simon


sterres commented Jul 8, 2021

It seems to be related to multiprocess workers of gunicorn server (I used the Docker image: https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker).

It works fine when setting the environment variable MAX_WORKERS="1" for the FastAPI container.

Some instructions on how to solve it can be found here: https://github.com/prometheus/client_python#multiprocess-mode-eg-gunicorn

But I don't know how this solution can be implemented in this tool. If someone has managed to get it working, I would be happy about any help :)

@saschnet

Even though @sterres basically mentions all resources to solve this issue, it took me quite some time to do so myself and I want to share how I managed to get the fastapi instrumentator running on the gunicorn server with the Docker image from https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker.

To get reasonable data from gunicorn with more than one worker and an individual metrics port, do the following:

  1. Provision the multiprocess registry in gunicorn.conf (add the following to the default):

from prometheus_client import start_http_server, multiprocess, CollectorRegistry

METRICS_PORT = 9090  # pick a free port for the metrics endpoint

def when_ready(server):
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    start_http_server(METRICS_PORT, registry=registry)

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)

If you do not need a separate port for the metrics, remove the start_http_server call and modify the code so that the instrumentator publishes the data of the multiprocess collector in the main app.

  2. Set an environment variable, e.g. PROMETHEUS_MULTIPROC_DIR=/tmp_multiproc
  3. Make sure to create an empty directory for the temporary files. With this specific container, use the prestart.sh script:

#! /usr/bin/env bash
if [ -d /tmp_multiproc ]; then rm -Rf /tmp_multiproc; fi
mkdir /tmp_multiproc

This script removes the directory if it is already in place and recreates it. Deleting is necessary, as container restarts fail otherwise.

@neilferreira

@saschnet FWIW it looks like this project supports multiprocess collection by simply setting the "prometheus_multiproc_dir" environment variable.

https://github.com/trallnag/prometheus-fastapi-instrumentator/blame/master/prometheus_fastapi_instrumentator/instrumentation.py#L257-L267


nazzour commented Dec 27, 2021

> [quotes @saschnet's multiprocess setup instructions above]

Hello, I am having the same issue. I am running my Python app using gunicorn and the metrics are really strange. I have followed your solution (except that I commented out the start_http_server line) but it did not work. Any idea, please? Thanks

@neilferreira

> [quotes @nazzour's reply above]

If you visit your /metrics page, does it look like this?

# HELP foo_http_requests_total Multiprocess metric
# TYPE foo_http_requests_total counter

Importantly, indicating that it is using the Multiprocess metric?

If not, can you confirm that you're setting the prometheus_multiproc_dir environment variable and that the directory exists on your server/computer? If you have the means to do so, you can drop some debug statements into this chunk of code to determine what is going on: https://github.com/trallnag/prometheus-fastapi-instrumentator/blame/master/prometheus_fastapi_instrumentator/instrumentation.py#L257
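To make that debugging concrete, here is a small hypothetical helper (not part of the instrumentator) that checks the two preconditions the multiprocess code path relies on:

```python
# Hypothetical diagnostic helper (not part of the instrumentator): verifies
# the preconditions for multiprocess mode before blaming the library.
import os

def check_multiproc_setup() -> list:
    problems = []
    # Newer prometheus_client versions read PROMETHEUS_MULTIPROC_DIR,
    # older ones the lowercase prometheus_multiproc_dir.
    multiproc_dir = os.environ.get("PROMETHEUS_MULTIPROC_DIR") or os.environ.get(
        "prometheus_multiproc_dir"
    )
    if multiproc_dir is None:
        problems.append("multiprocess env var is not set")
    elif not os.path.isdir(multiproc_dir):
        problems.append("%s does not exist or is not a directory" % multiproc_dir)
    return problems
```

Run it inside a worker process (not just on the host), since the env var must be visible to gunicorn's children.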

@IWillPull

modify the code so that the instrumentator publishes the data of the multiprocess collector in the main app.

@saschnet could you elaborate more on this?

@saschnet

modify the code so that the instrumentator publishes the data of the multiprocess collector in the main app.

@saschnet could you elaborate more on this?

I only published the endpoint on a different port, as explained above. But I think simply exposing the endpoint as described in the documentation should be sufficient: https://github.com/trallnag/prometheus-fastapi-instrumentator#exposing-endpoint

Have you tried that yet?


Pazzeo commented Jan 16, 2023

> [quotes @saschnet's multiprocess setup instructions and @nazzour's reply above]

Hello, sorry to come back to this issue, but I'm trying to follow your instructions to set up a different port for the metrics and it does not seem to work. Could you please help me? I'm using the same Docker image.

Thanks
Paz

@trallnag (Owner)

Fixed in #42 / #217

andreaskoepf added a commit to LAION-AI/Open-Assistant that referenced this issue Apr 28, 2023
We recently upgraded to Gunicorn for inference (multi-proc), which broke our prometheus stats and we got strange drops (see trallnag/prometheus-fastapi-instrumentator#50). For the multiprocess mode, setting `PROMETHEUS_MULTIPROC_DIR` is required, see [here](https://github.com/prometheus/client_python/blob/master/README.md#multiprocess-mode-eg-gunicorn).