
PushController.tick keeps the GIL for a time depending on the number of labels and metrics #1142

Closed
haarakdy opened this issue Jun 21, 2022 · 4 comments
Labels: bug

Comments

@haarakdy

I am using PrometheusRemoteWriteMetricsExporter and Logz.io
In my tests OpenTelemetry's tick() holds the GIL for up to 10 ms when the number of metrics is 20 and the average number of labels is 6.
I measure GIL availability using a loop like the one in the class below. The py-spy record ... confirms the result.
My application has a constraint on response latency. The constraint is not very tight: I have a budget of a few hundred milliseconds, and OpenTelemetry's tick is not the only loop in the system. But if I use more than a few metrics I start to see the impact.

from threading import Thread
from time import sleep, time
from typing import List


class Ping:
    def __init__(self, sleep_time) -> None:
        # A sorted list of a few samples of the worst-case
        # thread scheduler latency
        self.max_latencies: List[float] = 10 * [0.0]
        # A sum of all latencies for mean calculations
        self.latency_accumulator = 0.0
        # Number of tests
        self.count = 0
        self.completed = False
        self.sleep_time = sleep_time

        # Start test
        self.task_ping = Thread(target=self._ping)
        self.task_ping.daemon = True
        self.task_ping.start()

    def complete(self):
        self.completed = True
        self.task_ping.join()

    @property
    def max_latency(self) -> float:
        return self.max_latencies[-1]

    @property
    def mean_latency(self) -> float:
        return self.latency_accumulator / self.count

    def _ping(self) -> None:
        """
        Repeatedly call sleep(), measure the time the thread actually
        spends sleeping, and collect the latency (the time slippage).
        """
        before = time()
        while not self.completed:
            sleep(self.sleep_time)

            # Under ideal preemptive scheduling the time spent sleeping
            # equals the argument. In a real system the thread scheduler
            # adds latency, and in Python the latency grows further when
            # other threads hold the GIL.
            after = time()
            latency = after - before - self.sleep_time
            # Keep a few worst case latencies
            if latency > self.max_latencies[0]:
                # replace the lowest value
                self.max_latencies[0] = latency
                self.max_latencies.sort()
            self.latency_accumulator += latency
            self.count += 1
            before = after
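
For reference, a minimal usage sketch of the class above (the 1 ms interval and the 30 s run time are arbitrary examples, not values from my tests):

# Hypothetical usage: watch scheduler latency while the application
# and the exporter are running.
ping = Ping(sleep_time=0.001)
sleep(30)  # let the workload and the exporter's tick() run for a while
ping.complete()
print(f"max latency:  {ping.max_latency * 1000:.1f} ms")
print(f"mean latency: {ping.mean_latency * 1000:.3f} ms")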

This patch works around the problem for me by capping the number of export records converted per tick (records beyond the first 10 are dropped):

diff --git a/exporter/prometheus_remote_write/__init__.py b/exporter/prometheus_remote_write/__init__.py
index 1a75ea8..7bc47cd 100644
--- a/exporter/prometheus_remote_write/__init__.py
+++ b/exporter/prometheus_remote_write/__init__.py
@@ -14,6 +14,7 @@
 
 import logging
 import re
+from itertools import islice
 from typing import Dict, Sequence
 
 import requests
@@ -181,7 +182,7 @@ class PrometheusRemoteWriteMetricsExporter(MetricsExporter):
         self, export_records: Sequence[ExportRecord]
     ) -> Sequence[TimeSeries]:
         timeseries = []
-        for export_record in export_records:
+        for export_record in islice(export_records, 10):
             aggregator_type = type(export_record.aggregator)
             converter = self.converter_map.get(aggregator_type)
             if converter:

haarakdy commented Jun 21, 2022

I can quickly reproduce the problem by generating random label values in the metric. Why does the exporter's performance depend on the number of distinct values of a label?
The memory in use also grows as the set of possible label values expands.
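
A minimal sketch of how I generate the random label values (the counter instrument itself is assumed to be created with the old metrics API that this exporter uses; only the add(value, labels) call matters here, and generate_load is a made-up helper name):

import random
from time import sleep

# Pool of possible label values; the larger the pool, the more distinct
# label sets (and aggregators) the SDK ends up tracking.
LABEL_VALUES = [f"value-{i}" for i in range(1000)]

def generate_load(counter, iterations: int = 10_000) -> None:
    # Each add() uses a randomly chosen label value, so the export
    # batch keeps growing with the number of distinct label sets seen.
    for _ in range(iterations):
        labels = {"host": random.choice(LABEL_VALUES)}
        counter.add(1, labels)
        sleep(0.001)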


ocelotl commented Jun 21, 2022

@haarakdy you are using the old Prometheus remote write exporter; we have to bring it back on top of the new metrics API/SDK, and this is being tracked in #933.


haarakdy commented Jul 5, 2022

Is there a simple way to run OpenTelemetry in a dedicated Python instance, a dedicated process? Ultimately I want the UpDownCounter.add() API to copy the data to shared memory, send an event, and return. The OpenTelemetry process would pick the data up from the shared memory and perform its magic.
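
A rough sketch of the pattern I have in mind, using multiprocessing as a stand-in for the shared memory and event (the MetricProxy class and the (name, value, labels) tuple are made up for illustration; this is not an existing OpenTelemetry API):

import multiprocessing as mp


class MetricProxy:
    """Hypothetical thin wrapper: add() only enqueues the data and returns."""

    def __init__(self, queue, name: str) -> None:
        self._queue = queue
        self._name = name

    def add(self, value, labels=None) -> None:
        # Cheap in the application process: no aggregation, no export.
        self._queue.put((self._name, value, labels or {}))


def telemetry_worker(queue) -> None:
    # Runs in a dedicated process; this is where the real OpenTelemetry
    # SDK would aggregate and export, without touching the app's GIL.
    while True:
        item = queue.get()
        if item is None:  # shutdown sentinel
            break
        name, value, labels = item
        # e.g. real_updowncounter.add(value, labels) would go here


if __name__ == "__main__":
    queue = mp.Queue()
    worker = mp.Process(target=telemetry_worker, args=(queue,), daemon=True)
    worker.start()

    requests_counter = MetricProxy(queue, "requests")
    requests_counter.add(1, {"path": "/health"})

    queue.put(None)  # stop the worker
    worker.join()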

@srikanthccv

@haarakdy you are using the old exporter, which is no longer supported or maintained. Please track the progress in #933.

> Is there a simple way to run OpenTelemetry in a dedicated Python instance, a dedicated process? Ultimately I want the UpDownCounter.add() API to copy the data to shared memory, send an event, and return. The OpenTelemetry process would pick the data up from the shared memory and perform its magic.

Not sure what you are referring to here. Please create another feature-request issue to start the discussion.

srikanthccv closed this as not planned on Sep 9, 2022