-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SCRIPTING: Move Aggregation Scripts to their own context #32068
SCRIPTING: Move Aggregation Scripts to their own context #32068
Conversation
Pinging @elastic/es-core-infra |
@rjernst can you take a look here when you have some time? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some concerns about taking this approach. Part of the goal of moving to script contexts is to solidify the api of what is accessible and returned by each script invocation.
|
||
public static final ScriptContext<Factory> CONTEXT = new ScriptContext<>("aggs_executable", Factory.class); | ||
|
||
public abstract Object execute(Map<String, Object> params); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention of adding script contexts was to avoid having generic object returns like this that must be cast. It is especially relevant in numeric processing, as boxing/unboxing is not free. I think we actually want aggregation script classes for each different aggregation use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
public static final String[] PARAMETERS = { "params" }; | ||
|
||
public static final ScriptContext<Factory> CONTEXT = new ScriptContext<>("aggs_executable", Factory.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was called "executable" before only because of the legacy ExecutableScript that it used. We should have agg specific names for these scripts.
@rjernst alrighty split the context into 2 contexts for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good start! I have some comments.
executableScript.setNextVar("_subset_size", subsetSizeHolder); | ||
executableScript.setNextVar("_superset_freq", supersetDfHolder); | ||
executableScript.setNextVar("_superset_size", supersetSizeHolder); | ||
params.putAll(script.getParams()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be its own context. Putting these into params would be a breaking change, and also not utilize the intent of having contexts (different variables for different uses).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst but currently these are documented as params
aren't they? See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#_scripted
Also under the hood org.elasticsearch.painless.ScriptImpl
simply puts these under params
, these aren't available as top level params are they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I was confused because of what I saw in ScriptImpl for painless (naming implying it was in variables for the script, but that is actually the params). I still think this needs to be its own context. We can eventually move these to direct arguments of the execute method (again, so params can be read-only in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can be added as direct arguments in a separate PR. There should also be deprecation messages along with that so we can remove inserting them into params.
/** | ||
* A script used in bucket aggregations that returns a {@code boolean} value. | ||
*/ | ||
public abstract class BucketAggregateToBooleanScript { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This naming is a little different than we use for other script classes. I suggestion BucketAggregationSelectorScript.
/** | ||
* A script used in bucket aggregations that returns a {@code double} value. | ||
*/ | ||
public abstract class BucketAggregateToDoubleScript { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest BucketAggregationScript for the name, to more closely match the style of other script class names.
Map<String, Object> vars = new HashMap<>(); | ||
if (script.getParams() != null) { | ||
vars.putAll(script.getParams()); | ||
if ("expression".equals(script.getLang())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should have any logic like this. Script uses should be agnostic to the script language implementation. Expressions need to handle returning a boolean value to satisfy this script's api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst but that's a breaking change for existing expression scripts isn't it? (forgive me if I'm misunderstanding the way those work and that was a stupid question :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expressions only know how to return doubles. I'm talking about the ExpressionScriptEngine, which is what implements each concrete class for a ScriptContext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst ah got it :) thanks => On it adjusting this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String varName = entry.getKey(); | ||
String bucketsPath = entry.getValue(); | ||
Double value = resolveBucketValue(originalAgg, bucket, bucketsPath, gapPolicy); | ||
vars.put(varName, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting the bucket accessors inside params
is how we currently do this, but I don't think it is how we should do this long term. Params should be solely for user provided params (they should be read only, since they are parsed once and should be the same across all documents). Can you please open an issue to discuss with the search/aggs team how these buckets should be accessible in the future? With contexts we could potentially not use buckets_path
at all, and have a buckets
variable available in the script. eg:
Current:
"bucket_selector": {
"buckets_path": {
"foobar": "foo>bar"
},
"script": "params.foobar > 200"
}
Potential new script use:
"bucket_selector": {
"script": "buckets['foo']['bar'].value > 200"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done #32189 :)
@rjernst thanks! All points addressed I think. Did the suggested renamings and commented on the other things. |
@rjernst can you take another look here at the 2 open points? Thanks! |
@rjernst this should be ready for another round of reviewing, can you take another look? :) |
@rjernst ping if you have a sec :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last important thing about expressions.
executableScript.setNextVar("_subset_size", subsetSizeHolder); | ||
executableScript.setNextVar("_superset_freq", supersetDfHolder); | ||
executableScript.setNextVar("_superset_size", supersetSizeHolder); | ||
params.putAll(script.getParams()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can be added as direct arguments in a separate PR. There should also be deprecation messages along with that so we can remove inserting them into params.
return new BucketAggregationScript() { | ||
@Override | ||
public double execute(Map<String, Object> params) { | ||
params.forEach((name, value) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In expressions, we want these types of decisions made up front, not at execution time. Take a look at how params are passed to the factory in other script examples, and exposed as getParams()
on the script class. We want to avoid costly things like hash lookups per document for something that won't change (eg the params that exist).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst but these params
aren't really params
like in the other scripts. They aren't constant but change in a loop here https://github.com/elastic/elasticsearch/pull/32068/files#diff-15179b62bf7bf2f829791af279ea380aR90.
Previously those were part of a params field passed to the factory which made the script stateful and I guess led to this todo https://github.com/elastic/elasticsearch/pull/32068/files#diff-15179b62bf7bf2f829791af279ea380aL98 <= that I resolved here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that isn't really resolving that comment. While only one script object is created with your change, the performance is the same because we are still doing the binding at execution time for each document. I would rather keep that original code with that comment, deprecate passing things in via params (by adding the parameters directly as arguments), and then fix the TODO in master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst ah now I see. It still looks to me like the change here get's us to a better spot for existing scripts that still use the params
here since:
- We're saving the whole setting up of the array and hash map for function values in
org.elasticsearch.script.expression.ExpressionExecutableScript#ExpressionExecutableScript
- We get rid of the state in the script and the instantiating the script multiple times
Wouldn't it make more sense to keep this to improve the backwards compatible case and add a todo to avoid the redundant binding for the unchanged vars (and creating the map over and over) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or put differently :)
If we don't move to passing the params
as the execute
argument, then we'll be forced to recreate the script over so long as we support passing variables to scripts via params
won't we? => better move to more efficiently updating this map (and avoiding looping over it in expressions ideally) than adding the params
field back and being forced to recreate the script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then again, the correct solution will have params
in this still -> will revert to using params
as on the script tomorrow :)
@rjernst this should be good for another review :) Sorry for the stupid back and forth here last night! Restored the behaviour of the params here and kept the todo as requested now. |
@rjernst ping :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One last suggestion.
/** | ||
* A script used in script heuristics. | ||
*/ | ||
public abstract class ScriptHeuristicScript { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is confusing having "script" twice. Can we call it HeuristicScoreScript
? Or better yet, something specific to the agg it is used in (significant terms).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SignificantTermsHeuristicScoreScript
or too long? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's long, but probably better in this case too not have any possible confusion with regular scoring scripts.
@rjernst thanks! Renamed -> will merge once green :) |
* SCRIPTING: Move Aggregation Scripts to their own context
…pe-detection-with-leading-whitespace * elastic/master: (34 commits) Cross-cluster search: preserve cluster alias in shard failures (elastic#32608) Handle AlreadyClosedException when bumping primary term [TEST] Allow to run in FIPS JVM (elastic#32607) [Test] Add ckb to the list of unsupported languages (elastic#32611) SCRIPTING: Move Aggregation Scripts to their own context (elastic#32068) Painless: Use LocalMethod Map For Lookup at Runtime (elastic#32599) [TEST] Enhance failure message when bulk updates have failures [ML] Add ML result classes to protocol library (elastic#32587) Suppress LicensingDocumentationIT.testPutLicense in release builds (elastic#32613) [Rollup] Update wire version check after backport Suppress Wildfly test in FIPS JVMs (elastic#32543) [Rollup] Improve ID scheme for rollup documents (elastic#32558) ingest: doc: move Dot Expander Processor doc to correct position (elastic#31743) [ML] Add some ML config classes to protocol library (elastic#32502) [TEST]Split transport verification mode none tests (elastic#32488) Core: Move helper date formatters over to java time (elastic#32504) [Rollup] Remove builders from DateHistogramGroupConfig (elastic#32555) [TEST} unmutes SearchAsyncActionTests and adds debugging info [ML] Add Detector config classes to protocol library (elastic#32495) [Rollup] Remove builders from MetricConfig (elastic#32536) ...
* 6.x: [Kerberos] Use canonical host name (#32588) Cross-cluster search: preserve cluster alias in shard failures (#32608) [TEST] Allow to run in FIPS JVM (#32607) Handle AlreadyClosedException when bumping primary term [Test] Add ckb to the list of unsupported languages (#32611) SCRIPTING: Move Aggregation Scripts to their own context (#32068) (#32629) [TEST] Enhance failure message when bulk updates have failures [ML] Add ML result classes to protocol library (#32587) Suppress LicensingDocumentationIT.testPutLicense in release builds (#32613) [Rollup] Improve ID scheme for rollup documents (#32558) Mutes failing SQL string function tests due to #32589 Suppress Wildfly test in FIPS JVMs (#32543) Add cluster UUID to Cluster Stats API response (#32206) [ML] Add some ML config classes to protocol library (#32502) [TEST]Split transport verification mode none tests (#32488) [Rollup] Remove builders from DateHistogramGroupConfig (#32555) [ML] Add Detector config classes to protocol library (#32495) [Rollup] Remove builders from MetricConfig (#32536) Fix race between replica reset and primary promotion (#32442) HLRC: Move commercial clients from XPackClient (#32596) Security: move User to protocol project (#32367) Minor fix for javadoc (applicable for java 11). (#32573) Painless: Move Some Lookup Logic to PainlessLookup (#32565) Core: Minor size reduction for AbstractComponent (#32509) INGEST: Enable default pipelines (#32286) (#32591) TEST: Avoid merges in testSeqNoAndCheckpoints [Rollup] Remove builders from HistoGroupConfig (#32533) fixed elements in array of produced terms (#32519) Mutes ReindexFailureTests.searchFailure dues to #28053 Mutes LicensingDocumentationIT due to #32580 Remove the SATA controller from OpenSUSE box [ML] Rename JobProvider to JobResultsProvider (#32551)
* master: Cross-cluster search: preserve cluster alias in shard failures (#32608) Handle AlreadyClosedException when bumping primary term [TEST] Allow to run in FIPS JVM (#32607) [Test] Add ckb to the list of unsupported languages (#32611) SCRIPTING: Move Aggregation Scripts to their own context (#32068) Painless: Use LocalMethod Map For Lookup at Runtime (#32599) [TEST] Enhance failure message when bulk updates have failures [ML] Add ML result classes to protocol library (#32587) Suppress LicensingDocumentationIT.testPutLicense in release builds (#32613) [Rollup] Update wire version check after backport Suppress Wildfly test in FIPS JVMs (#32543) [Rollup] Improve ID scheme for rollup documents (#32558) ingest: doc: move Dot Expander Processor doc to correct position (#31743) [ML] Add some ML config classes to protocol library (#32502) [TEST]Split transport verification mode none tests (#32488) Core: Move helper date formatters over to java time (#32504) [Rollup] Remove builders from DateHistogramGroupConfig (#32555) [TEST} unmutes SearchAsyncActionTests and adds debugging info [ML] Add Detector config classes to protocol library (#32495) [Rollup] Remove builders from MetricConfig (#32536) Tests: Add rolling upgrade tests for watcher (#32428) Fix race between replica reset and primary promotion (#32442)
Same as #32003 in motivation, handling another step of the todo.
Also, handled another todo in
BucketSelectorPipelineAggregator
by movingparams
from being a field on the compiled script to being an input so that the new typeBucketAggregationScript
is stateless and can be reused in loops.