Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom Call Signature Per Script Context #20426

Closed
jdconrad opened this issue Sep 12, 2016 · 21 comments
Closed

Custom Call Signature Per Script Context #20426

jdconrad opened this issue Sep 12, 2016 · 21 comments
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement

Comments

@jdconrad
Copy link
Contributor

jdconrad commented Sep 12, 2016

We should make each context execute scripts based on exactly what gets passed in and returned. This eliminates a huge number of potential boxes for something like scoring that returns a double value out of a script. This may also eliminate a large number of hash look ups as certain parameters will be passed directly into the method instead of accessed from a map. Over millions of documents these performance gains really add up. Another benefit will be prevention of accessing null variables that may not belong in a context. For this to work the context will have to be passed in at compile-time.

I need to run through all the script cases and figure out what the contexts should be as I'm not sure if the current ones cover all the situations.

For file based scripts, I would propose for Painless and Expressions that the script be named something like script.score.painless now to embed the context in the name. For something like Groovy and other languages we can have an "unknown" context so these remain backwards compatible. The "unknown" context would revert to the old call signatures. Once the older languages are fully removed then we can remove the "unknown" context.

@nik9000
Copy link
Member

nik9000 commented Sep 12, 2016

I agree with the idea of having the context at compile time. Are you thinking you'd have to pass in a Class and we'd would analyze the publicly visible getters? Something more explicit and less magic? Like a Map<String, ContextField> where ContextField has Method references for getting and (optionally) setting?

@jdconrad
Copy link
Contributor Author

@nik9000 I can't say this is 100% thought-through yet as I need to go through the code and figure out what the contexts should be (I don't think the existing ones completely fit all the cases), but as I think you've seen in the past from me I prefer explicit over magic typically :) . I was thinking more along the lines of perhaps an interface per context with the signature that must be implemented by the script object that is returned from compile. CompiledScripts that implement the specific interface are what get's executed rather than a call back to the ScriptService. I envision a CompiledScript containing more than one interface possibly so we hopefully wouldn't end up with an excessive number of new classes.

@nik9000
Copy link
Member

nik9000 commented Sep 12, 2016

That makes sense. I think it'd be pretty cool to say "here is some text that a user sent me, please compile it and return something that implements this interface". We wouldn't have to pass in there ctx objects any more at all which'd be cool.

I like the flexibility of letting the user of the script engine specify the interface that it needs. It lines up pretty well with "what can this script engine do?" because implementations like Mustache and Expressions can just choke when the caller asks them for an interface they don't understand but groovy and painless can actually implement the interface. Groovy through dynamic stuff and painless with static stuff.

@jdconrad
Copy link
Contributor Author

@nik9000 Exactly what I was thinking! As a bonus, I really think this will help clean up the ScriptService quite a bit too.

@nik9000
Copy link
Member

nik9000 commented Sep 12, 2016

++

@jdconrad
Copy link
Contributor Author

jdconrad commented Sep 12, 2016

So I began to modify the code to allow context at compilation for script engines that would like it, and ran into an issue with stored scripts.

In RestPutStoredScriptAction we add handlers like this:

    protected RestPutStoredScriptAction(Settings settings, RestController controller, boolean registerDefaultHandlers) {
        super(settings);
        if (registerDefaultHandlers) {
            controller.registerHandler(POST, "/_scripts/{lang}/{id}", this);
            controller.registerHandler(PUT, "/_scripts/{lang}/{id}", this);
        }
    }

I would like to optionally have a context as part of this API, so I would propose that we change the handlers to be /_scripts/{id} and take in JSON as we do for other API's. This would allow us to add an optional context parameter.

Definitely looking for some feedback on this issue.

@rjernst
Copy link
Member

rjernst commented Sep 12, 2016

Changing to just key on id in the stored scripts api sounds right to me. The cache is based on script id alone right?

@jdconrad
Copy link
Contributor Author

The cache is based on both id and lang. And with context added, it would be based on all three.

@s1monw
Copy link
Contributor

s1monw commented Sep 13, 2016

I would like to optionally have a context as part of this API, so I would propose that we change the handlers to be /_scripts/{id} and take in JSON as we do for other API's. This would allow us to add an optional context parameter.

you can still add an optional parameter here even with keeping this URL?

 controller.registerHandler(POST, "/_scripts/{lang}/{id}", this);
 controller.registerHandler(PUT, "/_scripts/{lang}/{id}", this);
 controller.registerHandler(POST, "/_scripts/{lang}/{id}/{context}", this);
 controller.registerHandler(PUT, "/_scripts/{lang}/{id}/{context}", this);

I am not sure if that is sufficient but I thought I'd bring it up.

++ on this issue in order to get good performance from scripts we have to require this and I really wonder if we should make it non-optional and require it for all scripts. Existing ones can still use Unknown or All? We have the infra to do this added when we required the lang to be present on all scripts. WDYT?

@clintongormley
Copy link
Contributor

I love the idea but i would prefer it if it were implicit (ie not user specified). Also, I think that specifying eg aggs is insufficient as some aggs scripts return doubles and some return strings/objects.

How about this:

We have a number of contexts defined internally, eg returns_double, whatever. When we execute (eg) a file script, we pass in the context plus the script file name to the script service. The service tries to retrieve a compiled script that is keyed off the file name PLUS the context. If it doesn't exist, it retrieves the script from the file, compiles it for the context, and caches it with the context.

So if the same script is used with different contexts, then it will be cached multiple times. That said, most scripts will only be cached once or maybe twice.

Our contexts can be as specific as we need and this is totally transparent to the user.

@nik9000
Copy link
Member

nik9000 commented Sep 13, 2016

I suppose the nice thing about knowing the context up front is that you can
compile the script in the context right away and get useful error messages.

I, too, had envisioned this be done when the script is first used and not
at storage time, but I'd jack likes it at storage time that is fine with me
so long as we can work it out. I wonder if we can do this at first usage
time in one PR so we and then look at the separate storage in a second PR?

On Sep 13, 2016 7:01 AM, "Clinton Gormley" notifications@github.com wrote:

I love the idea but i would prefer it if it were implicit (ie not user
specified). Also, I think that specifying eg aggs is insufficient as some
aggs scripts return doubles and some return strings/objects.

How about this:

We have a number of contexts defined internally, eg returns_double,
whatever. When we execute (eg) a file script, we pass in the context plus
the script file name to the script service. The service tries to retrieve a
compiled script that is keyed off the file name PLUS the context. If it
doesn't exist, it retrieves the script from the file, compiles it for the
context, and caches it with the context.

So if the same script is used with different contexts, then it will be
cached multiple times. That said, most scripts will only be cached once or
maybe twice.

Our contexts can be as specific as we need and this is totally transparent
to the user.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#20426 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AANLohtYrmaoHlQl5EnFiiijXQZUoJHOks5qpoJygaJpZM4J6wjq
.

@s1monw
Copy link
Contributor

s1monw commented Sep 13, 2016

I mean that magic applied here is really nice until it doesn't work. Then all kinds of ugly stuff happens. What if a script can't compile for a certain context? Do we reject it when it's added? I think we should? Do we require the contexts that it's valid for in such a case?
I really wonder if we are trying to be too user friendly on the wrong end here?

@rjernst
Copy link
Member

rjernst commented Sep 13, 2016

I love the idea but i would prefer it if it were implicit (ie not user specified).

Implicit here is not actually user friendly: it means the user can write a broken script that uses something not available, but not find out until runtime. For file scripts, this is inevitable, since they are loaded completely asynchronously. However, for stored scripts, I think this is important feedback to have as soon as possible, rather than when later (possibly a different) user hits an error at runtime for using something not available.

Also, I think that specifying eg aggs is insufficient as some aggs scripts return doubles and some return strings/objects.

This is why we need more contexts than what we have now. There are actually many more uses of scripts than the existing contexts lead you to think. Scripted metrics is even more complicated, it should have multiple contexts, one for each phase.

@jdconrad
Copy link
Contributor Author

The contexts that are there would only be the starting point, but once we make the changes to handle custom signatures with the existing contexts, it should be easier to add more. I think that's the time to go through all the uses of compile and then figure out which contexts make sense and add them. I envision us having an interface for each context's signature, and having a subclass of CompiledScript implement the interface(s) that will then be called directly by the CompiledScript requester. There is no need to have these calling back to the ScriptService when they already have a CompiledScript.

It would be great to enforce that all scripts be required to have context, but this would kill backcompat for Groovy, Expressions and Javascript right now. I think the thing to do about this is to make Painless require it for 5.0, then remove Groovy, Javascript, and have Expressions require it for 6.0.

I also don't think implicit scripts is a good idea (we do need some of this for backcompat unfortunately), but moving forward I would absolutely require the context be provided with all scripts. It will eliminate much frustration to the issues that Simon pointed out, and most scripts won't be able to be useful in multiple contexts anyway.

On the storage issue, we are not very well set up to compile with context on first use, since we only cache the already compiled scripts. Painless will be able to make good use of the context at compile, so I would prefer to have it with storage the way it is now, otherwise a lot more needs to suddenly be refactored.

@jdconrad
Copy link
Contributor Author

@s1monw I actually did consider adding more handlers, and I do believe that's the easiest way to do that, but @rjernst was thinking it might be better to only have the API cache on an ID (just the name) for a stored script and possibly file scripts too, and that would require a JSON blob to take in other pieces like lang and context.

@rjernst
Copy link
Member

rjernst commented Sep 13, 2016

Adding more handlers isn't good for a couple reasons:

  • It is confusing because really the id should be globally unique. Right now we have IMO a difficult to use api to reference stored/file scripts in that you must also specify the language. We should be able to lookup a stored script simply by its id. Changing the handlers to /_scripts/{id} will make it clear all scripts share the same namespace, regardless of lang or context.
  • Accepting the same json that the eg search api accepts normally allows for changes there to be reflected automatically in stored scripts. For example, if we add an options map with compile time options, there would be a place for that in stored scripts. Right now, there is no way to add such options.

@apatrida
Copy link
Contributor

apatrida commented Mar 28, 2017

This will benefit the new secure Kotlin script engine for Elasticsearch as well. It can execute both text based scripts where we can compile the script to a different base class that handles the context interface, and also can ship client-side lambdas as binary inline scripts that are extensions to a receiver class which represents the context interface or the Lambda as a SAM conversion depending on how a context's interface will be represented. Then type inference, code completion and other IDE features can kick in to help in script development.

Currently everything compiles against one class that unifies the contexts and helps a bit with extensions trying to make things a bit more uniform, but it is still confusing and has lower performance than if each call/context was uniquely suited to the script's real job. And as the comments about Painless above mention, we can provide more useful errors at compilation since we know what can/should be accessed.

Is there any branch with changes underway for this? I'd like to track it if so.

@nik9000
Copy link
Member

nik9000 commented Mar 28, 2017

Is there any branch with changes underway for this? I'd like to track it if so.

Given how we work long running branches are a huge pain to maintain. No one is actively working on this at the moment. I was a few days ago, but that turned out to be a dead end: #23744

@jdconrad
Copy link
Contributor Author

jdconrad commented May 10, 2017

Been a bit, but this is progressing again. Some notes on the current discussions:

  • Allow users to create custom contexts that specify the name of the context, the signature for the method to call against a generated script, and a whitelist of classes, methods, and fields that are to be allowed be used in the script (may not matter for some languages).
  • Allow users to create custom languages that will have to decide how and which contexts to support. Custom languages do not necessarily need to be able to support all contexts. Expressions will only support search, scripted fields, and possibly aggs.
  • Possibly remove stored templates and only use the stored scripting API. (Remove Stored Template API in favor of Stored Scripting API #24596). With custom contexts, a template context can be made that accepts a Map<String, Object> and returns a String. With this in place, there is no reason to only allow Mustache for templates. Any language that supports this context should be allowed.
  • Settings will be allowed to turn enable and disable any context.

@rjernst
Copy link
Member

rjernst commented May 30, 2017

This work has been slowly progressing in #24868, #24818, #24817, #24873, #24877, #24883, #24896 and #24897.

rjernst added a commit to rjernst/elasticsearch that referenced this issue May 30, 2017
…n script contexts

ScriptContexts currently understand a FactoryType that can produce
instances of the script InstanceType. However, for search scripts, this
does not work as we have the concept of LeafSearchScript that is created
per lucene segment. This commit effectively renames the existing
SearchScript class into SearchScript.LeafFactory, which is a new,
optional, class that can be defined within a ScriptContext.
LeafSearchScript is effectively renamed back into SearchScript. This
change allows the model of stateless factory -> stateful factory ->
script instance to continue, but in a generic way that any script
context may take advantage of.

relates elastic#20426
rjernst added a commit that referenced this issue May 30, 2017
…n script contexts (#24974)

ScriptContexts currently understand a FactoryType that can produce
instances of the script InstanceType. However, for search scripts, this
does not work as we have the concept of LeafSearchScript that is created
per lucene segment. This commit effectively renames the existing
SearchScript class into SearchScript.LeafFactory, which is a new,
optional, class that can be defined within a ScriptContext.
LeafSearchScript is effectively renamed back into SearchScript. This
change allows the model of stateless factory -> stateful factory ->
script instance to continue, but in a generic way that any script
context may take advantage of.

relates #20426
rjernst added a commit to rjernst/elasticsearch that referenced this issue Jun 2, 2017
This commit creates TemplateScript and associated classes so that
templates no longer need a special ScriptService.compileTemplate method.
The execute() method is equivalent to the old run() method.

relates elastic#20426
rjernst added a commit that referenced this issue Jun 2, 2017
This commit creates TemplateScript and associated classes so that
templates no longer need a special ScriptService.compileTemplate method.
The execute() method is equivalent to the old run() method.

relates #20426
@jdconrad
Copy link
Contributor Author

This is done. Closing.

@clintongormley clintongormley added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache and removed :Plugin Lang Painless labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement
Projects
None yet
Development

No branches or pull requests

7 participants