-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't load global ordinals with the map
execution_hint
#37833
Conversation
Pinging @elastic/es-analytics-geo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a question about maxOrd = -1
, but otherwise LGTM!
if (executionMode == ExecutionMode.MAP) { | ||
// global ordinals are not requested so we don't load them | ||
// and return the biggest cardinality per segment instead. | ||
long maxOrd = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed below there's an assert for -1 down below, and in production with assertions off that'd fall through to a ratio
that is under the threshold and triggers a LowCardinality
strategy (potentially).
Is this just a sanity assertion, or is there an actual chance of having "no cardinality" in each segment and actually returning -1? What I'm basically asking is if we should handle -1 explicitly below or not :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion is only for the global_ordinals execution and in such case we always call ValuesSource.Bytes.WithOrdinals#globalMaxOrd
that returns values greater or equals than 0 so I think we're good here.
* master: (36 commits) Ensure joda compatibility in custom date formats (elastic#38171) Do not compute cardinality if the `terms` execution mode does not use `global_ordinals` (elastic#38169) Do not set timeout for IndexRequests in GatewayIndexStateIT (elastic#38147) Zen2ify testMasterFailoverDuringIndexingWithMappingChanges (elastic#38178) SQL: [Docs] Add limitation for aggregate functions on scalars (elastic#38186) Add elasticsearch-node detach-cluster command (elastic#37979) Add tests for fractional epoch parsing (elastic#38162) Enable bw tests for elastic#37871 and elastic#38032. (elastic#38167) Clear send behavior rule in CloseWhileRelocatingShardsIT (elastic#38159) Fix testCorruptedIndex (elastic#38161) Add finalReduce flag to SearchRequest (elastic#38104) Forbid negative field boosts in analyzed queries (elastic#37930) Remove AtomiFieldData#getLegacyFieldValues (elastic#38087) Universal cluster bootstrap method for tests with autoMinMasterNodes=false (elastic#38038) Fix FullClusterRestartIT.testHistoryUUIDIsAdded (elastic#38098) Replace joda time in ingest-common module (elastic#38088) Fix eclipse config for ssl-config (elastic#38096) Don't load global ordinals with the `map` execution_hint (elastic#37833) Relax fault detector in some disruption tests (elastic#38101) Fix java time epoch date formatters (elastic#37829) ...
The
terms
aggregator loads the global ordinals to retrieve the cardinality of the field to aggregate on. This information is then used to select the strategy to use for the aggregation (breadth_first
ordepth_first
). However this should be avoided if theexecution_hint
is explicitly set tomap
since this mode doesn't really need the global ordinals. Since we still need the cardinality of the field this change picks the maximum cardinality in the segments as an estimation of the total cardinality to select the strategy to use (breadth_first
ordepth_first
). This estimation is only used if the execution hint is set tomap
, otherwise the global ordinals are still used to retrieve the accurate cardinality.Closes #37705