-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove aggregation's postCollect phase #64016
Remove aggregation's postCollect phase #64016
Conversation
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
After elastic#63811 it became clear to me that `postCollect` is kind of dangerous and not all that useful. So this removes it. The trouble with `postCollect` is that it all happened right after we finished calling `collect` on the `LeafBucketCollectors` but before we built the aggregation results. But in elastic#63811 we found out that we can't call `postCollect` on the children of `parent` or `child` aggregators until we know which *which* aggregation results we're building. So this removes `postCollect` and moves all of the things we did at post-collect phase into `buildAggregations` or into hooks called in those methods.
|
||
@Override | ||
protected void beforeBuildingBuckets(long[] ordsToCollect) throws IOException { | ||
protected void prepareSubAggs(long[] bucketOrdsToCollect) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed this method to make it a bit more clear what it is for.
* collect from this aggregation | ||
* @return the results for each ordinal, in the same order as the array | ||
* of ordinals | ||
*/ | ||
public abstract InternalAggregation[] buildAggregations(long[] owningBucketOrds) throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a more clear variable name. I made this change in a dead end when I was working on this change locally but I kind of like it.
if (finished == false) { | ||
throw new IllegalStateException("Cannot replay yet, collection is not finished: postCollect() has not been called"); | ||
} | ||
finishLeaf(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty happy about how this turned out! No more finished
to check on. It's just all built in now.
@@ -124,17 +124,6 @@ public void preCollection() throws IOException { | |||
profiler.pollLastElement(); | |||
} | |||
|
|||
@Override | |||
public void postCollection() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of an unfortunate loss here. Now we'll end up having post_collection
in the build_aggregation
debug phase. I'm wondering if I can have aggregations that do "interesting" things opt in to make their own timers or something.
d588dda
to
93db739
Compare
@cjcenizal is this something that you need to look at as well? It removes one of the entries in the aggs profile list because that line was "scary". |
@nik9000 Thanks for the ping! Which API does this affect, and what is the change? Is it in the request or the response? Could you provide an example before and after? |
Just the response. Have a look at the asciidoc file this PR changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like ES UI depends on the post_collection
or post_collection_count
fields which are removed from the profile
response in this PR.
@cjcenizal would the UI be ok if I added fields to that list? Renamed them? I've been thinking about having a little more flexibility there. |
run elasticsearch-ci/bwc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
After elastic#63811 it became clear to me that `postCollect` is kind of dangerous and not all that useful. So this removes it. The trouble with `postCollect` is that it all happened right after we finished calling `collect` on the `LeafBucketCollectors` but before we built the aggregation results. But in elastic#63811 we found out that we can't call `postCollect` on the children of `parent` or `child` aggregators until we know which *which* aggregation results we're building. So this removes `postCollect` and moves all of the things we did at post-collect phase into `buildAggregations` or into hooks called in those methods.
After #63811 it became clear to me that `postCollect` is kind of dangerous and not all that useful. So this removes it. The trouble with `postCollect` is that it all happened right after we finished calling `collect` on the `LeafBucketCollectors` but before we built the aggregation results. But in #63811 we found out that we can't call `postCollect` on the children of `parent` or `child` aggregators until we know which *which* aggregation results we're building. So this removes `postCollect` and moves all of the things we did at post-collect phase into `buildAggregations` or into hooks called in those methods.
In #64016 I caused a bug in rare_terms where we would rewrite the current leaf and remove all hits from it, causing us to trip an assertion. This didn't happen before because previously we never rewrote the current leaf. The fix involves cleaning up the state that the deferring collector uses if we end up removing all hits. This requires some super intimate knowledge of how `BestBucketsDeferringCollector` works so I decided to move the implementation of that mergin from `MergingBucketsDeferringCollector` into `BestBucketsDeferringCollector`. This means that `MergingBucketsDeferringCollector` is pretty much empty now. So I've deprecated it and will remove it entirely in an subsequent change. Closes #64356
From what I can see, the code builds the UI recursively and dynamically, following the structure of the Profile API response but agnostic to its contents. So you should be able to add, remove, and rename fields freely and I expect the UI to faithfully surface them. |
In elastic#64016 I caused a bug in rare_terms where we would rewrite the current leaf and remove all hits from it, causing us to trip an assertion. This didn't happen before because previously we never rewrote the current leaf. The fix involves cleaning up the state that the deferring collector uses if we end up removing all hits. This requires some super intimate knowledge of how `BestBucketsDeferringCollector` works so I decided to move the implementation of that mergin from `MergingBucketsDeferringCollector` into `BestBucketsDeferringCollector`. This means that `MergingBucketsDeferringCollector` is pretty much empty now. So I've deprecated it and will remove it entirely in an subsequent change. Closes elastic#64356
🤘 That means if an aggs has an "unusual" phase it can add it rather than tucking it into the other phases. That seems useful! |
In #64016 I caused a bug in rare_terms where we would rewrite the current leaf and remove all hits from it, causing us to trip an assertion. This didn't happen before because previously we never rewrote the current leaf. The fix involves cleaning up the state that the deferring collector uses if we end up removing all hits. This requires some super intimate knowledge of how `BestBucketsDeferringCollector` works so I decided to move the implementation of that mergin from `MergingBucketsDeferringCollector` into `BestBucketsDeferringCollector`. This means that `MergingBucketsDeferringCollector` is pretty much empty now. So I've deprecated it and will remove it entirely in an subsequent change. Closes #64356
The cardinality agg delays calculating stuff until just before it is needed. Before elastic#64016 it used the `postCollect` phase to do this work which was perfect for the `terms` agg but we decided that `postCollect` was dangerous because some aggs, notably the `parent` and `child` aggs need to know which children to build and they *can't* during `postCollect`. After elastic#64016 we built the cardinality agg results when we built the buckets. But we if you sort on the cardinality agg then you need to do the `postCollect` stuff in order to know which buckets to build! So you have a chicken and egg problem. Sort of. This change splits the difference by running the delayed cardinality agg stuff as soon as you *either* try to build the buckets *or* read the cardinality for use with sorting. This works, but is a little janky and feels wrong. It feels like we could make a structural fix to the way we read metric values from aggs before building the buckets that would make this sort of bug much more difficult to cause. But any sort of solution to this is a larger structural change. So this fixes the bug in the quick and janky way and we hope to do a more structural fix to the way we read metrics soon. Closes elastic#67782
The cardinality agg delays calculating stuff until just before it is needed. Before #64016 it used the `postCollect` phase to do this work which was perfect for the `terms` agg but we decided that `postCollect` was dangerous because some aggs, notably the `parent` and `child` aggs need to know which children to build and they *can't* during `postCollect`. After #64016 we built the cardinality agg results when we built the buckets. But we if you sort on the cardinality agg then you need to do the `postCollect` stuff in order to know which buckets to build! So you have a chicken and egg problem. Sort of. This change splits the difference by running the delayed cardinality agg stuff as soon as you *either* try to build the buckets *or* read the cardinality for use with sorting. This works, but is a little janky and feels wrong. It feels like we could make a structural fix to the way we read metric values from aggs before building the buckets that would make this sort of bug much more difficult to cause. But any sort of solution to this is a larger structural change. So this fixes the bug in the quick and janky way and we hope to do a more structural fix to the way we read metrics soon. Closes #67782
The cardinality agg delays calculating stuff until just before it is needed. Before elastic#64016 it used the `postCollect` phase to do this work which was perfect for the `terms` agg but we decided that `postCollect` was dangerous because some aggs, notably the `parent` and `child` aggs need to know which children to build and they *can't* during `postCollect`. After elastic#64016 we built the cardinality agg results when we built the buckets. But we if you sort on the cardinality agg then you need to do the `postCollect` stuff in order to know which buckets to build! So you have a chicken and egg problem. Sort of. This change splits the difference by running the delayed cardinality agg stuff as soon as you *either* try to build the buckets *or* read the cardinality for use with sorting. This works, but is a little janky and feels wrong. It feels like we could make a structural fix to the way we read metric values from aggs before building the buckets that would make this sort of bug much more difficult to cause. But any sort of solution to this is a larger structural change. So this fixes the bug in the quick and janky way and we hope to do a more structural fix to the way we read metrics soon. Closes elastic#67782
The cardinality agg delays calculating stuff until just before it is needed. Before elastic#64016 it used the `postCollect` phase to do this work which was perfect for the `terms` agg but we decided that `postCollect` was dangerous because some aggs, notably the `parent` and `child` aggs need to know which children to build and they *can't* during `postCollect`. After elastic#64016 we built the cardinality agg results when we built the buckets. But we if you sort on the cardinality agg then you need to do the `postCollect` stuff in order to know which buckets to build! So you have a chicken and egg problem. Sort of. This change splits the difference by running the delayed cardinality agg stuff as soon as you *either* try to build the buckets *or* read the cardinality for use with sorting. This works, but is a little janky and feels wrong. It feels like we could make a structural fix to the way we read metric values from aggs before building the buckets that would make this sort of bug much more difficult to cause. But any sort of solution to this is a larger structural change. So this fixes the bug in the quick and janky way and we hope to do a more structural fix to the way we read metrics soon. Closes elastic#67782
The cardinality agg delays calculating stuff until just before it is needed. Before #64016 it used the `postCollect` phase to do this work which was perfect for the `terms` agg but we decided that `postCollect` was dangerous because some aggs, notably the `parent` and `child` aggs need to know which children to build and they *can't* during `postCollect`. After #64016 we built the cardinality agg results when we built the buckets. But we if you sort on the cardinality agg then you need to do the `postCollect` stuff in order to know which buckets to build! So you have a chicken and egg problem. Sort of. This change splits the difference by running the delayed cardinality agg stuff as soon as you *either* try to build the buckets *or* read the cardinality for use with sorting. This works, but is a little janky and feels wrong. It feels like we could make a structural fix to the way we read metric values from aggs before building the buckets that would make this sort of bug much more difficult to cause. But any sort of solution to this is a larger structural change. So this fixes the bug in the quick and janky way and we hope to do a more structural fix to the way we read metrics soon. Closes #67782
The cardinality agg delays calculating stuff until just before it is needed. Before #64016 it used the `postCollect` phase to do this work which was perfect for the `terms` agg but we decided that `postCollect` was dangerous because some aggs, notably the `parent` and `child` aggs need to know which children to build and they *can't* during `postCollect`. After #64016 we built the cardinality agg results when we built the buckets. But we if you sort on the cardinality agg then you need to do the `postCollect` stuff in order to know which buckets to build! So you have a chicken and egg problem. Sort of. This change splits the difference by running the delayed cardinality agg stuff as soon as you *either* try to build the buckets *or* read the cardinality for use with sorting. This works, but is a little janky and feels wrong. It feels like we could make a structural fix to the way we read metric values from aggs before building the buckets that would make this sort of bug much more difficult to cause. But any sort of solution to this is a larger structural change. So this fixes the bug in the quick and janky way and we hope to do a more structural fix to the way we read metrics soon. Closes #67782
This partially reverts elastic#64016 and and adds elastic#67839 and adds additional tests that would have caught issues with the changes in elastic#64016. It's mostly Nik's code, I am just cleaning things up a bit. Co-authored-by: Nik Everett nik9000@gmail.com
After #63811 it became clear to me that
postCollect
is kind ofdangerous and not all that useful. So this removes it.
The trouble with
postCollect
is that it all happened right after wefinished calling
collect
on theLeafBucketCollectors
but before webuilt the aggregation results. But in #63811 we found out that we can't
call
postCollect
on the children ofparent
orchild
aggregatorsuntil we know which which aggregation results we're building.
So this removes
postCollect
and moves all of the things we did atpost-collect phase into
buildAggregations
or into hooks called inthose methods.