-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script engine for Lucene expressions #6819
Conversation
(MapperService causing errors).
…iven field name).
scorer = new CannedScorer(); | ||
} | ||
|
||
double evaluate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this expensive per-document stuff? It sorta totally defeats the point of having expressions!
Can these hacks be moved to setScorer or setNextReader? why isn't setScorer being called, can you expand on the comment? that seems broken...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a goof. I did this just to get it working, and then forgot to fix it! Of course tests pass, but I'm still working on benchmarking so I haven't seen the slowness yet.
I'll expand the comment in setScorer, and will come up with a hack to allow setNextVar to work through a single FunctionValues. It will still be a little slower than normal expressions for the aggregations case because it will do a hash of the variable name.
Hi ryan, I left an initial comment. Is there anything we can do to remove the expensive per-document stuff? I realize these might be limitations of the scripting API, but maybe we can fix that? Otherwise, if we are doing things like hashing strings etc per-document to resolve variable names, its going to be very slow. |
@@ -150,6 +150,12 @@ | |||
<scope>compile</scope> | |||
</dependency> | |||
<dependency> | |||
<groupId>org.apache.lucene</groupId> | |||
<artifactId>lucene-expressions</artifactId> | |||
<version>${lucene.version}</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we should mark it optional due to all the dependencies it has maybe @kimchy knows what to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only has two: antlr-runtime and asm. I don't think its that many :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @rmuir. If we make it optional, we would have to do some magic to make the expression script engine load only if expressions are available (which I guess isn't that hard to do, but the other built in script engines don't assume they are optional).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be optional, but we should make sure we package it as part of our distro. In our script service we already support optional mvel and groovy, so thats easy, similar logic.
Its less about the size, its about the fact that those are 2 very popular libraries, and almost certainly will cause clashes when someone wants to use a NodeClient / TransportClient
In order to include it in our distro, make sure to add it in the deb/rpm logic in the pom.xml (with all deps), and in src/main/assemblies/common-bin, then run the package and check tar.gz includes all the deps, as well as the deb and rpm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I made the dep optional. I didn't need to change anything else since the lib starts with "lucene" and lucene* is already included. I checked the tar.gz and it includes all 3 deps.
I left some initial comments.. I will remove the review label for now feel free to put it back once you have new stuff to review |
|
||
@Override | ||
public void setNextScore(float score) { scorer.score = score; } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add another TODO here? This seems massively inefficient:
- expressions know if they refer to score, true or false. in lucene, Scorer.score() will only be called, if the expression actually needs it.
- here it seems, something calls score() always, and passes it to the expression even if its maybe not needed? so it might be called uselessly on the scorer, even if the ranking function doesnt use it.
- passing score in this hacky way im sure prevents inlining of actual scoring function into the expression and hurts performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok added it.
…ess documentation).
…ve and groovy scripts
Ok, I have some benchmark numbers. The following tests were done with 1 million random docs, and 1k iterations of a
|
awesome numbers! expressions will be a very welcome addition to scripting |
@@ -873,7 +880,7 @@ | |||
</data> | |||
<data> | |||
<src>${project.build.directory}/lib</src> | |||
<includes>lucene*, log4j*, jna*, spatial4j*, jts*, groovy*</includes> | |||
<includes>lucene*, log4j*, jna*, spatial4j*, jts*, groovy*, </includes> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this extra ,
needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how that got there. Removing.
I left some cosmetic comments. LGTM |
Ok, I think I addressed all your issues. I will fix setScorer in another PR. |
I implemented a partial fix for setScorer in #6864. |
LGTM I think we should get this into |
Maybe open another issue about this: |
@uschindler from 1.3, sandboxed languages (ie groovy and expressions) are enabled by default: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_enabling_dynamic_scripting |
Thanks @clintongormley ! |
Adds support for new "expression" script engine.
This is backed by lucene expressions. These are javascript expressions, which can only access numeric fielddata, parameters, and _score. They can only be used for searches (not document updates).
closes #6818