
Strange drops in total requests #50

Closed
sterres opened this issue Jul 7, 2021 · 9 comments · Fixed by #217


sterres commented Jul 7, 2021

Hi,

I'm getting strange drops in the http_requests_total metric for the "/metrics" endpoint. I was expecting a monotonic increase, since with each scrape the counter for "/metrics" should increase by one.

But it looks like this:
[screenshot: graph of http_requests_total showing repeated drops]

Any idea what I'm doing wrong?

Thanks and BR
Simon


sterres commented Jul 8, 2021

It seems to be related to multiprocess workers of gunicorn server (I used the Docker image: https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker).

It works fine when setting the environment variable MAX_WORKERS="1" for the FastAPI container.

Some instructions on how to solve it can be found here: https://github.com/prometheus/client_python#multiprocess-mode-eg-gunicorn

But I don't know how this solution can be implemented in this tool. If someone has managed to get it working, I would be happy about any help :)

@saschnet

Even though @sterres basically mentions all resources to solve this issue, it took me quite some time to do so myself and I want to share how I managed to get the fastapi instrumentator running on the gunicorn server with the Docker image from https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker.

To get reasonable data from gunicorn with more than one worker and an individual metrics port, do the following:

  1. Provision the multiprocess registry in gunicorn.conf (add the following to the default):

from prometheus_client import start_http_server, multiprocess, CollectorRegistry

METRICS_PORT = 9090  # pick a free port for the metrics endpoint

def when_ready(server):
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    start_http_server(METRICS_PORT, registry=registry)

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)

If you do not need a separate port for the metrics, remove the start_http_server call and modify the code so that the instrumentator publishes the data of the multiprocess collector in the main app.

  2. Set an environment variable, e.g. PROMETHEUS_MULTIPROC_DIR=/tmp_multiproc
  3. Make sure to create an empty directory for the temporary files. With this specific container, use the prestart.sh script:

#! /usr/bin/env bash
if [ -d /tmp_multiproc ]; then rm -Rf /tmp_multiproc; fi
mkdir /tmp_multiproc

This script removes the directory if it is already in place and recreates it. Deleting is necessary, as container restarts fail otherwise.

@neilferreira

@saschnet FWIW it looks like this project supports multiprocess collection by simply setting the "prometheus_multiproc_dir" environment variable.

https://github.com/trallnag/prometheus-fastapi-instrumentator/blame/master/prometheus_fastapi_instrumentator/instrumentation.py#L257-L267


nazzour commented Dec 27, 2021

> [quotes @saschnet's multiprocess setup instructions above]

Hello, I am having the same issue. I am running my Python app using gunicorn and the metrics are really strange. I have followed your solution (except that I commented out the start_http_server line) but it did not work. Any idea, please? Thanks

@neilferreira

> [quotes @nazzour's reply above]

If you visit your /metrics page, does it look like this?

# HELP foo_http_requests_total Multiprocess metric
# TYPE foo_http_requests_total counter

Importantly, indicating that it is using the Multiprocess metric?

If not, can you confirm that you're setting the prometheus_multiproc_dir environment variable and that the directory exists on your server/computer? If you have the means to do so, you can drop some debug statements into this chunk of code to determine what is going on: https://github.com/trallnag/prometheus-fastapi-instrumentator/blame/master/prometheus_fastapi_instrumentator/instrumentation.py#L257
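To make that debugging concrete, here is a small hypothetical helper (not part of the instrumentator) that checks the two preconditions the multiprocess code path relies on:

```python
# Hypothetical diagnostic helper (not part of the instrumentator): verifies
# the preconditions for multiprocess mode before blaming the library.
import os

def check_multiproc_setup() -> list:
    problems = []
    # Newer prometheus_client versions read PROMETHEUS_MULTIPROC_DIR,
    # older ones the lowercase prometheus_multiproc_dir.
    multiproc_dir = os.environ.get("PROMETHEUS_MULTIPROC_DIR") or os.environ.get(
        "prometheus_multiproc_dir"
    )
    if multiproc_dir is None:
        problems.append("multiprocess env var is not set")
    elif not os.path.isdir(multiproc_dir):
        problems.append("%s does not exist or is not a directory" % multiproc_dir)
    return problems
```

Run it inside a worker process (not just on the host), since the env var must be visible to gunicorn's children.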

@IWillPull

modify the code so that the instrumentator publishes the data of the multiprocess collector in the main app.

@saschnet could you elaborate more on this?

@saschnet

modify the code so that the instrumentator publishes the data of the multiprocess collector in the main app.

@saschnet could you elaborate more on this?

I only published the endpoint on a different port, as explained above. But I think simply exposing the endpoint as described in the documentation should be sufficient: https://github.com/trallnag/prometheus-fastapi-instrumentator#exposing-endpoint

Have you tried that yet?


Pazzeo commented Jan 16, 2023

> [quotes @saschnet's multiprocess setup instructions and @nazzour's reply above]

Hello, sorry to come back to this issue, but I'm trying to follow your instructions to set up a different port for the metrics and it does not seem to work. Could you please help me? I'm using the same Docker image.

Thanks
Paz

@trallnag (Owner)

Fixed in #42 / #217

andreaskoepf added a commit to LAION-AI/Open-Assistant that referenced this issue Apr 28, 2023
We recently upgraded to Gunicorn for inference (multi-proc), which broke our prometheus stats and we got strange drops (see trallnag/prometheus-fastapi-instrumentator#50). For the multiprocess mode, setting `PROMETHEUS_MULTIPROC_DIR` is required, see [here](https://github.com/prometheus/client_python/blob/master/README.md#multiprocess-mode-eg-gunicorn).