Duplicate metric events in Telemetry endpoint #536
Comments
Thanks for reporting this @eliasbrange. The behavior you're describing isn't what I'd expect from a Prometheus-formatted endpoint: a single metric counter should increase monotonically, not emit multiple lines or counters with a single event value. Could you paste a snippet of everything else being logged around when this starts (I don't need all the lines, obviously)? Is it just a single metric named …? A test configuration/setup is always helpful and helps me debug quicker, but I'm quite sure I can reproduce this. Sounds like I only need to push metrics into ContainerPilot.
I guess the config doesn't actually matter, and I don't think defining a metric in the config matters either. Anyway, here's a little config.

Then I run the following commands, with the following output:

As you can see, each call to `putmetric` …
Thank you, your configs definitely helped, because I wasn't seeing the issue until I tried your setup. So yes, you've found a bug, and possibly a race condition: hammering the endpoint in a for loop causes multiple events to be created, while sending them more slowly increments an existing event. Notice below that …
Looks like everything is working fine except logging. We're logging each unique counter event by value. This is especially prevalent when counting in varying increments, which I hadn't noticed before.
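The bug described above can be sketched as a small simulation (this is illustrative Python, not ContainerPilot's actual Go internals; the function and variable names are made up):

```python
# If each counter event is keyed by name *and* value, every distinct
# increment creates a new entry instead of bumping one counter.
from collections import defaultdict

def record_buggy(registry, name, value):
    # Bug: the value is part of the key, so "my_counter=1" and
    # "my_counter=2" become separate series.
    registry[f"{name}={value}"] += 1

def record_fixed(registry, name, value):
    # Fix: key by metric name only and accumulate the value.
    registry[name] += value

buggy, fixed = defaultdict(int), defaultdict(int)
for v in (1, 2, 1, 3):
    record_buggy(buggy, "my_counter", v)
    record_fixed(fixed, "my_counter", v)

print(len(buggy))            # 3 entries for one logical counter
print(fixed["my_counter"])   # 7: a single monotonically increasing value
```

With varying increments, the buggy keying produces one entry per distinct value, which matches the duplicate lines seen on the endpoint.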
We're reporting events through our event bus's Prometheus collector. It looks like I can remove the value portion of the event name when collecting …. Another option would be to stop recording Metric events altogether, since end users consume their Prometheus counter values themselves. Why do we need to duplicate that information through the endpoint? Do we really need to know about each Metric event if the counter or gauge is what we're ultimately deriving value from? Any thoughts? Going to think about this.
After adding a single if statement around Metric collection, I'm feeling good about option 2 and removing Metric event collection altogether. In hindsight it feels a bit overzealous. Also, I don't believe we had many consumers of the telemetry endpoint in that regard, so it's probably safe to remove those events.
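A hedged sketch of what "a single if statement around Metric collection" could look like (the `EventCode` names are hypothetical, for illustration only; the real fix lives in ContainerPilot's Go event bus):

```python
# Option 2: skip recording Metric events as their own series, since the
# user-defined Prometheus counter/gauge already exposes the value.
from enum import Enum, auto

class EventCode(Enum):
    METRIC = auto()
    STARTUP = auto()

def collect(event_code, counts):
    # The single guard: Metric events are not collected.
    if event_code is EventCode.METRIC:
        return
    counts[event_code] = counts.get(event_code, 0) + 1

counts = {}
collect(EventCode.METRIC, counts)
collect(EventCode.STARTUP, counts)
print(EventCode.METRIC in counts)  # False: no duplicate Metric series
```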
Thanks for the quick iterations on this one @cheapRoc. Much appreciated.
Just released in version 3.6.1, thanks again for the report!
We use `containerpilot -putmetric` to gather certain fluctuating metrics such as memory usage and free disk space. We noticed that after running a container in production for some time, the `/metrics` endpoint generates over 100k lines, with lines of the form …

Since the metrics fluctuate, this means we get a few new events every 15 seconds or so. My guess is that this will continue indefinitely until the container is restarted. For counters, which can only ever go up, this also means a new event line every time you do `putmetric`.

My question is: is this expected behaviour? Filtering in e.g. InfluxDB on `source` is now impossible, and the endpoint is getting slower and slower due to the massive number of events returned.

Edit: I have experimented with versions `3.4.2`, `3.5.0`, and `3.6.0`, and they all seem to behave the same.
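The unbounded growth described above can be simulated in a few lines (illustrative Python, not ContainerPilot code; the `source` label format is a guess based on this report):

```python
# If each distinct (metric, value) pair becomes its own permanent event
# line, a fluctuating gauge keeps adding lines for as long as it runs.
import random

random.seed(0)
lines = set()
for _ in range(1000):
    mem = random.randint(100, 999)  # pretend memory-usage samples
    lines.add(f'containerpilot_events{{source="metric: mem={mem}"}}')

# Hundreds of distinct exposition lines for a single logical metric,
# instead of one gauge line that simply changes value.
print(len(lines))
```

This is why the endpoint keeps growing and filtering on `source` becomes impractical: each new sample value mints a new series rather than updating an existing one.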