
Rate of increase for monotonic counter #60619

Closed
wylieconlon opened this issue Aug 3, 2020 · 14 comments
Assignees
Labels
:Analytics/Aggregations Aggregations >feature stalled Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@wylieconlon

Elasticsearch should provide a new metric aggregation for use only in date histograms, which is able to calculate the increase in a monotonic counter. Because the value of a counter is always increasing, it occasionally resets from the maximum value to 0. These resets should be handled automatically by the aggregation. This aggregation requires documents to be sorted in increasing time order.

This aggregation should throw an error if values aren't monotonically increasing. The most common reason for this will be multiple sources of documents, such as multiple servers with separate counters. The error message should suggest adding another bucket aggregation, such as a terms aggregation on host.name.

The aggregation should also allow scaling to a time unit like the derivative pipeline aggregation.

Use cases for this already exist in most beats modules. For example, system.network.in.bytes is a counter-type field that will generally be converted into a "rate per second."
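A minimal sketch of the reset handling being requested, in plain Python (the function name and the convention of counting the post-reset value as the increase since the reset are illustrative assumptions, not the eventual implementation):

```python
def counter_increase(samples):
    """Total increase of a monotonic counter over time-ordered samples.
    A decrease is treated as a counter reset, and the post-reset value
    is counted as the increase since the reset."""
    total = 0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:  # reset: counter dropped back toward 0
            total += curr
    return total

# 600 bytes of increase over the window, despite the reset from 1100 to 100
print(counter_increase([1000, 1100, 100, 200, 300, 400, 500]))  # 600
```

Scaling to a time unit would then just divide this total by the bucket's time span, as the derivative pipeline aggregation does.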

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@wylieconlon
Author

I've created a mockup to show that the rate should be calculated by looking sequentially at individual documents. If the counter resets to 0 in the middle of a bucket, the rate shouldn't be affected.

[mockup image: positive rate]

@dgieselaar
Member

Adding our use case: in APM we use a combination of max, derivative, and bucket_script aggregations to display rates from a monotonically increasing counter in our garbage collection charts.

[screenshot: garbage collection chart]

{
  aggs: {
    over_time: {
      date_histogram: getMetricsDateHistogramParams(start, end),
      aggs: {
        // get the max value
        max: {
          max: {
            field: fieldName,
          },
        },
        // get the derivative, which is the delta y
        derivative: {
          derivative: {
            buckets_path: 'max',
          },
        },
        // if a gc counter is reset, the delta will be < 0 and
        // needs to be excluded
        value: {
          bucket_script: {
            buckets_path: { value: 'derivative' },
            script: 'params.value > 0.0 ? params.value : 0.0',
          },
        },
      },
    },
  }
}

@wylieconlon
Author

wylieconlon commented Aug 12, 2020

@dgieselaar The main reason to ask for a new metric aggregation in Elasticsearch is to avoid the edge cases with the approach you've described:

  • In the diagram I drew above, the Max of field is highest right around the transition from Day 1 -> Day 2. Therefore the rate would actually be wrong when calculated by using max (the rate could ignore 23 hours of the day)
  • In the case of a metric that is tracked from multiple sources, each counter would have different values. So by using Max of field you'd only see the value of one of the counters, not all of the counters. Imagine that you have 10 servers and calculate the rate based on Max of network.in.bytes: this number would be off by 10x.
  • Finally, because you're using Max as the metric, the bucket after the counter reset will always be 0. This is because the counter could reset on Day 2, and then the Max of Day 2 is higher than the Max of Day 3.
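The third bullet can be reproduced with a few lines of Python simulating the max/derivative/bucket_script chain (the numbers are illustrative):

```python
# Daily max of a counter that resets during Day 2: Day 2's max still
# includes the pre-reset peak, so Day 3's max is lower than Day 2's.
daily_max = [1000, 1500, 600]

# derivative: delta of the max between consecutive buckets
deltas = [b - a for a, b in zip(daily_max, daily_max[1:])]

# bucket_script: clamp negative deltas to 0
rates = [d if d > 0 else 0 for d in deltas]
print(rates)  # [500, 0] -> Day 3 reports 0 even though the counter grew
```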

Here's an example of three separate counters that might have different resets:

[diagram: three counters with different resets, and their positive rates]

Based on this example, I would expect that:

  • If the rate of increase were requested for all three counters together, I would expect an error because the values aren't always increasing
  • If the rate of increase were requested for all three counters separately, it should return the results on the right instead of zeroes

@imotov imotov self-assigned this Aug 31, 2020
@not-napoleon
Member

Hey, I just want to check in on the requirements here. I spoke a bit with @wylieconlon, and he suggested I tag @exekias for input as well. Here are a couple of scenarios I'm looking at, and I would like your feedback on them. In all examples, I'm showing data as pairs of numbers, with the first representing a time and the second representing the counter value (I'm assuming it's in bytes of network data, just to have some unit to talk about). For ease of typing, I'm writing time in seconds from some nominal T=0, which will be the start of our observations. Obviously in a real application these would be milliseconds-since-epoch timestamps.

Simple case: (0, 1000), (10, 1100), (20, 1200), (30, 1300), (40, 1400), (50, 1500), (60, 1600)

In this case, we have a total of 600 bytes over 60 seconds, for a rate of 10 bytes / second, assuming a 1 minute bucket.

Spike case: (0, 1000), (10, 1000), (20, 1000), (30, 1000), (40, 1600), (50, 1600), (60, 1600)
In this case, there's a lull in traffic followed by a spike at 40 seconds, but for the whole minute bucket, we still transferred 600 bytes in 60 seconds, for a rate of 10 bytes / second.

Reset case 1: (0, 1000), (10, 1100), (20, 100), (30, 200), (40, 300), (50, 400), (60, 500)
This gets a little trickier, but I still think it's describing a 10 bytes / second rate. From 0 to 10, we observe 10 bytes / second. We don't know what happened between 10 and 20, because there's a reset discontinuity, so we ignore that block. Then from 20 seconds to 60 seconds, we observe 400 bytes in 40 seconds, still a 10 bytes / second rate.

Reset case 2: (0, 2^32 - 1000200), (10, 2^32 - 1000100), (20, 100), (30, 200), (40, 300), (50, 400), (60, 500)
I'm including this case because I've heard from a couple of folks "2^32 - 1000000 followed by 100, the rate is 1000100", but to me this is the same as the above example. You can interpret that sequence of data as "there was a spike where we shipped a megabyte in 10 seconds and rolled over the counter" or you can interpret it as "the monitoring agent got reset in that window". We don't know which happened, and there isn't anything in the data to tell us. To my mind, the only not wrong thing we can do is ignore that interval.
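This "ignore the reset interval" reading can be sketched in plain Python (function name hypothetical; not the eventual implementation):

```python
def rate_ignoring_resets(samples):
    """samples: (time, value) pairs in time order. Sums the increase over
    intervals where the counter did not decrease, divided by the time
    actually observed; reset intervals contribute neither bytes nor time."""
    total_bytes = 0
    total_time = 0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 >= v0:  # no reset detected in this interval
            total_bytes += v1 - v0
            total_time += t1 - t0
    return total_bytes / total_time

# Reset case 1: 100 bytes in the first 10s, then 400 bytes over 40s
samples = [(0, 1000), (10, 1100), (20, 100), (30, 200),
           (40, 300), (50, 400), (60, 500)]
print(rate_ignoring_resets(samples))  # 10.0 bytes / second
```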

Thanks in advance for your input.

@exekias

exekias commented Sep 18, 2020

I see the point about not assuming the counter reached MAX_INT when we have a reset, but we still have some information after the reset:

For case 1: (0, 1000), (10, 1100), (20, 100), (30, 200), (40, 300), (50, 400), (60, 500)

At time 20 you detect the counter reset; ignoring it would mean reporting a rate of 0 here (?). Still, we know that it increased by at least 100 since the previous sample. So you could "interpret" the data as follows, making no assumptions about the max number that was reached before the reset:

(0, 1000), (10, 1100), (20, 1100+100), (30, 1100+200), (40, 1100+300), (50, 1100+400), (60, 1100+500)

For case 2: (0, 2^32 - 1000200), (10, 2^32 - 1000100), (20, 100), (30, 200), (40, 300), (50, 400), (60, 500)
This would be:

(0, 2^32 - 1000200), (10, 2^32 - 1000100), (20, 2^32 - 1000100 + 100), (30, 2^32 - 1000100 + 200), (40, 2^32 - 1000100 + 300), (50, 2^32 - 1000100 + 400), (60, 2^32 - 1000100 + 500)

Compared to taking only the positive values, this at least accounts for the data we have just after a counter reset, which may of course be incomplete, but it is better than filling in a 0.
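The interpretation above amounts to re-basing the series at each reset. A small sketch (helper name hypothetical):

```python
def rebase_resets(values):
    """Turn a counter series with resets into a monotonic series by adding
    the last pre-reset value to everything observed after each reset."""
    offset = 0
    result = []
    prev = None
    for v in values:
        if prev is not None and v < prev:  # reset detected
            offset += prev
        result.append(v + offset)
        prev = v
    return result

print(rebase_resets([1000, 1100, 100, 200, 300, 400, 500]))
# [1000, 1100, 1200, 1300, 1400, 1500, 1600]
```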

It would also be good to think about these scenarios when samples are split across several buckets; for instance, a 10s bucket size would leave you with one value per bucket. I understand this aggregation would take the value from the previous bucket into account when calculating the rate for the next one?

@wylieconlon
Author

@exekias I think you are describing the following algorithm:

  1. When the counter decreases, ignore the possibility of an overflow and just use the new value:

rate = 0
for previous, current in zip(values, values[1:]):
    if current >= previous:
        rate += current - previous
    else:
        rate += current

This is different from the other algorithms that we could use, which are:

  2. Try to determine the overflow amount when the counter decreases, using `rate = rate + (sys.maxint - previous) + current`

  3. Treat any reset as if it adds 0, keeping the rate the same.

If we had perfect information, I think algorithm 2 would be correct most of the time. But I see your point that without perfect information, algorithm 1 might be the closest.

There is an edge case that happens pretty often in Metricbeat data which I want to add here. If the user requests the positive rate of a field that comes from multiple counters, none of these algorithms will detect it, and all of them produce wildly wrong results:

| Time | Source | Value | Rate (algorithm 1) | Rate (algorithm 2) | Rate (algorithm 3) |
|------|--------|-------|--------------------|--------------------|--------------------|
| 0    | A      | 9000  | 9000               | 9000               | 9000               |
| 1    | B      | 100   | 9100               | 2^32 - 8900        | 9000               |
| 2    | A      | 9100  | 18200              | 200                | 18000              |
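The problem can be seen by feeding the interleaved A/B stream to algorithm 1: every switch between sources looks like either a reset or a huge jump (numbers from the table above; function name hypothetical):

```python
def increase_algo1(values):
    """Algorithm 1: treat any decrease as a reset, count the new value."""
    total = 0
    for prev, curr in zip(values, values[1:]):
        total += curr - prev if curr >= prev else curr
    return total

interleaved = [9000, 100, 9100]      # samples from A, B, A interleaved
print(increase_algo1(interleaved))   # 9100: B's sample looks like a reset
print(increase_algo1([9000, 9100]))  # 100: A's true increase on its own
```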

The question I have for all of you is: are we okay with the potential for user errors here? I think it could go both ways.

@matschaffer
Contributor

Any new year updates on this? Just had it pop up again in a troubleshooting session regarding a graph of node_stats.os.cgroup.cpuacct.usage_nanos in .monitoring-es-* from stack monitoring.

Given our own components have a lot of monotonic counters, it'd be great to have better support for them in ES.

For the time being we can get by with TSVB's "Positive Rate" and a 1k "top" value at least.

@imotov
Contributor

imotov commented Feb 22, 2021

@matschaffer It turned out to be much trickier than we originally thought. The main challenge here is scalability. In the current framework, the data we get is not sorted and is distributed across multiple shards. So in general the issue is not really solvable unless we ship all the data to the coordinating node and sort it there, change the way we store the data, or come up with some heuristic approach.

@matschaffer
Contributor

That’s unfortunate, but understandable.

Should we pursue something at another layer maybe? (Roll up counters to gauges for example)

The stack itself produces many counters today, and we'll probably get more over time (thinking especially of cases like the APM agent collecting Prometheus counters).

@jasontedor or @tbragin any thoughts on how we should proceed?

@imotov
Contributor

imotov commented Feb 25, 2021

@matschaffer we understand the importance of this feature and it is still high on our priority list. One of the ideas we wanted to prototype is timestamp-sorted indices routed by counter id. That would allow us to resolve some of the issues mentioned above. There is still an issue with index rollover, but a somewhat smaller one if we can ensure that the earliest data point in the later index is always after the latest data point in the earlier index, which would also require some sort of routing mechanism that we don't have at the moment. Another approach we discussed was some sort of streaming API producing sorted data to clients; again, tricky and not ideal, since each client would have to do its own implementation and we would not be able to wrap it in other aggregations.

@matschaffer
Contributor

Yeah, streaming API to clients sounds like it could be tricky to build visualizations and alerts on (which would be the end usage for many of these counters). Thanks for confirming the priority. Hopefully we can find some path forward.

We'll keep our top:N graphs and alerts tuned high (at least 1k) in the mean time to help avoid fresh counters getting overlooked.

@wchaparro
Member

#74660

@martijnvg
Member

I think the rate aggregation on counter fields implements what is being asked here.
The rate aggregation on counter fields for TSDB will be in tech preview in 8.7.0.
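For reference, a request along these lines (expressed as a Python dict for readability; the field name is illustrative, and the exact parameters should be checked against the current docs). It assumes the field is mapped with `time_series_metric: counter` in a time-series data stream:

```python
# Sketch of a rate aggregation over a TSDB counter field. The time_series
# bucket aggregation groups documents by time series (i.e. per counter),
# which addresses the multiple-sources problem discussed above.
body = {
    "size": 0,
    "aggs": {
        "per_series": {
            "time_series": {},  # one bucket per time series
            "aggs": {
                "bytes_per_second": {
                    "rate": {
                        "field": "network.in.bytes",
                        "unit": "second",
                    }
                }
            }
        }
    }
}
```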
