Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCRIPTING: Move Aggregation Scripts to their own context #32068

Conversation

original-brownbear
Copy link
Member

Same as #32003 in motivation, handling another step of the todo.

Also, handled another todo in BucketSelectorPipelineAggregator by moving params from being a field on the compiled script to being an input so that the new type BucketAggregationScript is stateless and can be reused in loops.

@original-brownbear original-brownbear added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Jul 15, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@original-brownbear
Copy link
Member Author

@rjernst can you take a look here when you have some time?

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about taking this approach. Part of the goal of moving to script contexts is to solidify the api of what is accessible and returned by each script invocation.


public static final ScriptContext<Factory> CONTEXT = new ScriptContext<>("aggs_executable", Factory.class);

public abstract Object execute(Map<String, Object> params);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention of adding script contexts was to avoid having generic object returns like this that must be cast. It is especially relevant in numeric processing, as boxing/unboxing is not free. I think we actually want aggregation script classes for each different aggregation use case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst makes sense :) I was wondering whether or not that (compiling for a specific return type) would be possible with Painless (especially given this kind of thing #32134).
Thanks for answering :) => on it splitting this up some more.


public static final String[] PARAMETERS = { "params" };

public static final ScriptContext<Factory> CONTEXT = new ScriptContext<>("aggs_executable", Factory.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was called "executable" before only because of the legacy ExecutableScript that it used. We should have agg specific names for these scripts.

@original-brownbear
Copy link
Member Author

@rjernst alrighty split the context into 2 contexts for the double and the boolean return type. Let me know if this is what you were looking for :)

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start! I have some comments.

executableScript.setNextVar("_subset_size", subsetSizeHolder);
executableScript.setNextVar("_superset_freq", supersetDfHolder);
executableScript.setNextVar("_superset_size", supersetSizeHolder);
params.putAll(script.getParams());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be its own context. Putting these into params would be a breaking change, and also not utilize the intent of having contexts (different variables for different uses).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst but currently these are documented as params aren't they? See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#_scripted

Also under the hood org.elasticsearch.painless.ScriptImpl simply puts these under params, these aren't available as top level params are they?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I was confused because of what I saw in ScriptImpl for painless (naming implying it was in variables for the script, but that is actually the params). I still think this needs to be its own context. We can eventually move these to direct arguments of the execute method (again, so params can be read-only in the future).

Copy link
Member Author

@original-brownbear original-brownbear Jul 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst Made this a separate context now in a1bd389

Didn't add any further logic for making these direct parameters yet though or did we want to add that here already?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can be added as direct arguments in a separate PR. There should also be deprecation messages along with that so we can remove inserting them into params.

/**
* A script used in bucket aggregations that returns a {@code boolean} value.
*/
public abstract class BucketAggregateToBooleanScript {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This naming is a little different than we use for other script classes. I suggestion BucketAggregationSelectorScript.

/**
* A script used in bucket aggregations that returns a {@code double} value.
*/
public abstract class BucketAggregateToDoubleScript {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest BucketAggregationScript for the name, to more closely match the style of other script class names.

Map<String, Object> vars = new HashMap<>();
if (script.getParams() != null) {
vars.putAll(script.getParams());
if ("expression".equals(script.getLang())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should have any logic like this. Script uses should be agnostic to the script language implementation. Expressions need to handle returning a boolean value to satisfy this script's api.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst but that's a breaking change for existing expression scripts isn't it? (forgive me if I'm misunderstanding the way those work and that was a stupid question :))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expressions only know how to return doubles. I'm talking about the ExpressionScriptEngine, which is what implements each concrete class for a ScriptContext.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst ah got it :) thanks => On it adjusting this :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in
0f13633, this is what we want right?

PS: Sorry for the split commits on this, the removal of the conditional by language is here:
bc27466

String varName = entry.getKey();
String bucketsPath = entry.getValue();
Double value = resolveBucketValue(originalAgg, bucket, bucketsPath, gapPolicy);
vars.put(varName, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the bucket accessors inside params is how we currently do this, but I don't think it is how we should do this long term. Params should be solely for user provided params (they should be read only, since they are parsed once and should be the same across all documents). Can you please open an issue to discuss with the search/aggs team how these buckets should be accessible in the future? With contexts we could potentially not use buckets_path at all, and have a buckets variable available in the script. eg:

Current:

"bucket_selector": {
    "buckets_path": {
        "foobar": "foo>bar"
      },
      "script": "params.foobar > 200"
}

Potential new script use:

"bucket_selector": {
    "script": "buckets['foo']['bar'].value > 200"
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done #32189 :)

@original-brownbear
Copy link
Member Author

@rjernst thanks! All points addressed I think. Did the suggested renamings and commented on the other things.

@original-brownbear
Copy link
Member Author

@rjernst can you take another look here at the 2 open points? Thanks!

@original-brownbear
Copy link
Member Author

@rjernst this should be ready for another round of reviewing, can you take another look? :)

@original-brownbear
Copy link
Member Author

@rjernst ping if you have a sec :)

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last important thing about expressions.

executableScript.setNextVar("_subset_size", subsetSizeHolder);
executableScript.setNextVar("_superset_freq", supersetDfHolder);
executableScript.setNextVar("_superset_size", supersetSizeHolder);
params.putAll(script.getParams());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can be added as direct arguments in a separate PR. There should also be deprecation messages along with that so we can remove inserting them into params.

return new BucketAggregationScript() {
@Override
public double execute(Map<String, Object> params) {
params.forEach((name, value) -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In expressions, we want these types of decisions made up front, not at execution time. Take a look at how params are passed to the factory in other script examples, and exposed as getParams() on the script class. We want to avoid costly things like hash lookups per document for something that won't change (eg the params that exist).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst but these params aren't really params like in the other scripts. They aren't constant but change in a loop here https://github.com/elastic/elasticsearch/pull/32068/files#diff-15179b62bf7bf2f829791af279ea380aR90.

Previously those were part of a params field passed to the factory which made the script stateful and I guess led to this todo https://github.com/elastic/elasticsearch/pull/32068/files#diff-15179b62bf7bf2f829791af279ea380aL98 <= that I resolved here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that isn't really resolving that comment. While only one script object is created with your change, the performance is the same because we are still doing the binding at execution time for each document. I would rather keep that original code with that comment, deprecate passing things in via params (by adding the parameters directly as arguments), and then fix the TODO in master.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst ah now I see. It still looks to me like the change here get's us to a better spot for existing scripts that still use the params here since:

  • We're saving the whole setting up of the array and hash map for function values in org.elasticsearch.script.expression.ExpressionExecutableScript#ExpressionExecutableScript
  • We get rid of the state in the script and the instantiating the script multiple times

Wouldn't it make more sense to keep this to improve the backwards compatible case and add a todo to avoid the redundant binding for the unchanged vars (and creating the map over and over) instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or put differently :)
If we don't move to passing the params as the execute argument, then we'll be forced to recreate the script over so long as we support passing variables to scripts via params won't we? => better move to more efficiently updating this map (and avoiding looping over it in expressions ideally) than adding the params field back and being forced to recreate the script?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then again, the correct solution will have params in this still -> will revert to using params as on the script tomorrow :)

@original-brownbear
Copy link
Member Author

@rjernst this should be good for another review :) Sorry for the stupid back and forth here last night! Restored the behaviour of the params here and kept the todo as requested now.

@original-brownbear
Copy link
Member Author

@rjernst ping :)

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One last suggestion.

/**
* A script used in script heuristics.
*/
public abstract class ScriptHeuristicScript {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is confusing having "script" twice. Can we call it HeuristicScoreScript? Or better yet, something specific to the agg it is used in (significant terms).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SignificantTermsHeuristicScoreScript or too long? :)

Copy link
Member

@rjernst rjernst Aug 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's long, but probably better in this case too not have any possible confusion with regular scoring scripts.

@original-brownbear
Copy link
Member Author

@rjernst thanks! Renamed -> will merge once green :)

@original-brownbear original-brownbear merged commit 6fa7016 into elastic:master Aug 4, 2018
@original-brownbear original-brownbear deleted the replace-agg-script-context branch August 4, 2018 08:37
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Aug 4, 2018
* SCRIPTING: Move Aggregation Scripts to their own context
original-brownbear added a commit that referenced this pull request Aug 4, 2018
…2629)

* SCRIPTING: Move Aggregation Scripts to their own context
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Aug 6, 2018
…pe-detection-with-leading-whitespace

* elastic/master: (34 commits)
  Cross-cluster search: preserve cluster alias in shard failures (elastic#32608)
  Handle AlreadyClosedException when bumping primary term
  [TEST] Allow to run in FIPS JVM (elastic#32607)
  [Test] Add ckb to the list of unsupported languages (elastic#32611)
  SCRIPTING: Move Aggregation Scripts to their own context (elastic#32068)
  Painless: Use LocalMethod Map For Lookup at Runtime (elastic#32599)
  [TEST] Enhance failure message when bulk updates have failures
  [ML] Add ML result classes to protocol library (elastic#32587)
  Suppress LicensingDocumentationIT.testPutLicense in release builds (elastic#32613)
  [Rollup] Update wire version check after backport
  Suppress Wildfly test in FIPS JVMs (elastic#32543)
  [Rollup] Improve ID scheme for rollup documents (elastic#32558)
  ingest: doc: move Dot Expander Processor doc to correct position (elastic#31743)
  [ML] Add some ML config classes to protocol library (elastic#32502)
  [TEST]Split transport verification mode none tests (elastic#32488)
  Core: Move helper date formatters over to java time (elastic#32504)
  [Rollup] Remove builders from DateHistogramGroupConfig (elastic#32555)
  [TEST} unmutes SearchAsyncActionTests and adds debugging info
  [ML] Add Detector config classes to protocol library (elastic#32495)
  [Rollup] Remove builders from MetricConfig (elastic#32536)
  ...
dnhatn added a commit that referenced this pull request Aug 6, 2018
* 6.x:
  [Kerberos] Use canonical host name (#32588)
  Cross-cluster search: preserve cluster alias in shard failures (#32608)
  [TEST] Allow to run in FIPS JVM (#32607)
  Handle AlreadyClosedException when bumping primary term
  [Test] Add ckb to the list of unsupported languages (#32611)
  SCRIPTING: Move Aggregation Scripts to their own context (#32068) (#32629)
  [TEST] Enhance failure message when bulk updates have failures
  [ML] Add ML result classes to protocol library (#32587)
  Suppress LicensingDocumentationIT.testPutLicense in release builds (#32613)
  [Rollup] Improve ID scheme for rollup documents (#32558)
  Mutes failing SQL string function tests due to #32589
  Suppress Wildfly test in FIPS JVMs (#32543)
  Add cluster UUID to Cluster Stats API response (#32206)
  [ML] Add some ML config classes to protocol library (#32502)
  [TEST]Split transport verification mode none tests (#32488)
  [Rollup] Remove builders from DateHistogramGroupConfig (#32555)
  [ML] Add Detector config classes to protocol library (#32495)
  [Rollup] Remove builders from MetricConfig (#32536)
  Fix race between replica reset and primary promotion (#32442)
  HLRC: Move commercial clients from XPackClient (#32596)
  Security: move User to protocol project (#32367)
  Minor fix for javadoc (applicable for java 11). (#32573)
  Painless: Move Some Lookup Logic to PainlessLookup (#32565)
  Core: Minor size reduction for AbstractComponent (#32509)
  INGEST: Enable default pipelines (#32286) (#32591)
  TEST: Avoid merges in testSeqNoAndCheckpoints
  [Rollup] Remove builders from HistoGroupConfig (#32533)
  fixed elements in array of produced terms (#32519)
  Mutes ReindexFailureTests.searchFailure dues to #28053
  Mutes LicensingDocumentationIT due to #32580
  Remove the SATA controller from OpenSUSE box
  [ML] Rename JobProvider to JobResultsProvider (#32551)
dnhatn added a commit that referenced this pull request Aug 6, 2018
* master:
  Cross-cluster search: preserve cluster alias in shard failures (#32608)
  Handle AlreadyClosedException when bumping primary term
  [TEST] Allow to run in FIPS JVM (#32607)
  [Test] Add ckb to the list of unsupported languages (#32611)
  SCRIPTING: Move Aggregation Scripts to their own context (#32068)
  Painless: Use LocalMethod Map For Lookup at Runtime (#32599)
  [TEST] Enhance failure message when bulk updates have failures
  [ML] Add ML result classes to protocol library (#32587)
  Suppress LicensingDocumentationIT.testPutLicense in release builds (#32613)
  [Rollup] Update wire version check after backport
  Suppress Wildfly test in FIPS JVMs (#32543)
  [Rollup] Improve ID scheme for rollup documents (#32558)
  ingest: doc: move Dot Expander Processor doc to correct position (#31743)
  [ML] Add some ML config classes to protocol library (#32502)
  [TEST]Split transport verification mode none tests (#32488)
  Core: Move helper date formatters over to java time (#32504)
  [Rollup] Remove builders from DateHistogramGroupConfig (#32555)
  [TEST} unmutes SearchAsyncActionTests and adds debugging info
  [ML] Add Detector config classes to protocol library (#32495)
  [Rollup] Remove builders from MetricConfig (#32536)
  Tests: Add rolling upgrade tests for watcher (#32428)
  Fix race between replica reset and primary promotion (#32442)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >refactoring v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants