Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove buckets_path from Bucket Selector Aggregation and Expose Buckets to Script? #32189

Closed
original-brownbear opened this issue Jul 19, 2018 · 4 comments
Labels
:Analytics/Aggregations Aggregations stalled Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) team-discuss

Comments

@original-brownbear
Copy link
Member

This is coming from #32068 (comment)

Quote @rjernst :


Putting the bucket accessors inside params is how we currently do this, but I don't think it is how we should do this long term. Params should be solely for user provided params (they should be read only, since they are parsed once and should be the same across all documents). Can you please open an issue to discuss with the search/aggs team how these buckets should be accessible in the future? With contexts we could potentially not use buckets_path at all, and have a buckets variable available in the script. eg:

Current:

"bucket_selector": {
    "buckets_path": {
        "foobar": "foo>bar"
      },
      "script": "params.foobar > 200"
}

Potential new script use:

"bucket_selector": {
    "script": "buckets['foo']['bar'].value > 200"
}

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@polyfractal
Copy link
Contributor

We chatted about this a bit in the S&A meeting, forgot to update the ticket. Sorry about that!

@original-brownbear @rjernst We liked the syntax. It does introduce a bit of inconsistency with the other pipeline aggs since they will still need a buckets_path (e.g. a derivative), but we decided we're ok with that because the syntax for scripting is so much cleaner/intuitive than before.

@colings86 did think of a potential, technical issue though. In addition to providing values to the script, we use buckets_path to resolve the order that pipeline aggregations should be executed in. One pipeline agg can refer to another (in it's bucket_path) so we have to make sure they are executed in the right order.

From our conversations, it sounds like painless would want to resolve these values lazily (e.g. when the script is actually executing) which would complicate the execution order issue.

@rjernst
Copy link
Member

rjernst commented Aug 10, 2018

Yes, painless would resolve these lazily. However, it is possible this could be handled in the future by the multi pass compiler, which (at least as planned right now at a high level), would allow introspecting the compiled script. For example, we want this so we can find at compile time which fields are going to be accessed for doc values, ahead of trying to run the script. Something similar could be done for bucket_path. Since these ordering constraints look critical to correctness, I will mark this as stalled and wait on the painless changes. /cc @jdconrad

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@polyfractal polyfractal removed their assignment Jun 26, 2020
@nik9000
Copy link
Member

nik9000 commented Jun 29, 2022

We do have the infrastructure to do this sort of thing now. Extra compiler passes and stuff. But since no one has poked at this in years and years I don't think this is worth doing. It's better, for sure. But it's a pain to make backwards breaking changes and probably not worth the time now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations stalled Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) team-discuss
Projects
None yet
Development

No branches or pull requests

9 participants