Rate aggregation doesn't always use the right bucket size #63703
Comments
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
I have been investigating this for a few weeks and would like to post an update and propose some ways forward.

The problem: As the comment in the original issue shows, to calculate the rate correctly we sometimes need to know the actual start value of the bucket, from which we can deduce the bucket size. When I implemented the rate aggregation I thought I had a solution, but the method I found didn't really work. When there are intermediate aggregations between the date_histogram and the rate aggregation, it's not possible to determine the size of the bucket that we are calculating the rate for. In other words, all numbers marked as QT in the comment linked above are bogus or can cause an exception. To clarify, this is not an issue if the date_histogram is a direct parent of the rate aggregation.

The solution: I don't have any good solutions at the moment.

Workarounds: Since the real solution might take a while, we would like to figure out the best workaround for variable bucket sizes for grandparents. There are 2 possible ways to approach it:
|
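To illustrate why the rate depends on knowing which bucket is being processed, here is a minimal sketch (not Elasticsearch's implementation). It assumes a calendar interval of one month, where buckets span 28 to 31 days, so the same sum yields a different per-day rate depending on the bucket's start. The function names `month_bucket_size_days` and `rate_per_day` are hypothetical.

```python
from datetime import datetime, timezone


def month_bucket_size_days(bucket_start: datetime) -> int:
    """Length in days of the calendar-month bucket starting at bucket_start.

    Assumes bucket_start is the first day of a month.
    """
    year, month = bucket_start.year, bucket_start.month
    # Start of the following month (December rolls over to January).
    if month == 12:
        next_start = bucket_start.replace(year=year + 1, month=1)
    else:
        next_start = bucket_start.replace(month=month + 1)
    return (next_start - bucket_start).days


def rate_per_day(bucket_sum: float, bucket_start: datetime) -> float:
    """Per-day rate for a monthly bucket: the sum divided by that bucket's length."""
    return bucket_sum / month_bucket_size_days(bucket_start)


feb = datetime(2021, 2, 1, tzinfo=timezone.utc)
mar = datetime(2021, 3, 1, tzinfo=timezone.utc)

# The same sum gives different rates because the buckets differ in length.
print(month_bucket_size_days(feb), rate_per_day(2800.0, feb))
print(month_bucket_size_days(mar), rate_per_day(2800.0, mar))
```

Without the bucket's start value, the divisor cannot be determined, which is the situation described above when another aggregation sits between the date_histogram and the rate aggregation.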
I'd prefer an exception in this case. FWIW, we have not yet started to use the rate aggregation due to missing histogram support, but when we do, it will likely use fixed_interval and be a direct child of the date_histogram aggregation. |
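A sketch of the request shape this comment describes might look like the following, built as a plain Python dict. The aggregation names (`by_day`, `hourly_rate`) and field names (`timestamp`, `price`) are hypothetical, and the interval and unit values are illustrative assumptions rather than settings taken from this thread.

```python
import json

# Hypothetical search body: the rate aggregation is a direct child of a
# date_histogram that uses fixed_interval, so every bucket has the same size.
search_body = {
    "size": 0,
    "aggs": {
        "by_day": {  # hypothetical aggregation name
            "date_histogram": {
                "field": "timestamp",  # hypothetical field name
                "fixed_interval": "1d",
            },
            "aggs": {
                "hourly_rate": {  # hypothetical aggregation name
                    "rate": {
                        "field": "price",  # hypothetical field name
                        "unit": "hour",
                    }
                }
            },
        }
    },
}

print(json.dumps(search_body, indent=2))
```

Because no other aggregation sits between `by_day` and `hourly_rate`, this layout avoids the grandparent case discussed in this issue.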
Some user input from internal analytics: they are more inclined to agree with option #1, in which an exception is thrown and the onus is put on the user to define the fixed interval. |
I was hacking on something vaguely related and ran into problems with the range aggregation calling `bucketSize` with a bucket that has never been collected. The "keyword sandwich" test (yum) does this. It'll pass `bucket=2` when the rate only contains 2 buckets. I think it is picking up the bucket ordinal from its parent agg - a terms agg.