Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script engine for Lucene expressions #6819

Closed
wants to merge 19 commits into from

Conversation

rjernst
Copy link
Member

@rjernst rjernst commented Jul 10, 2014

Adds support for new "expression" script engine.

This is backed by lucene expressions. These are javascript expressions, which can only access numeric fielddata, parameters, and _score. They can only be used for searches (not document updates).

closes #6818

scorer = new CannedScorer();
}

double evaluate() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this expensive per-document stuff? It sorta totally defeats the point of having expressions!

Can these hacks be moved to setScorer or setNextReader? why isn't setScorer being called, can you expand on the comment? that seems broken...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a goof. I did this just to get it working, and then forgot to fix it! Of course tests pass, but I'm still working on benchmarking so I haven't seen the slowness yet.

I'll expand the comment in setScorer, and will come up with a hack to allow setNextVar to work through a single FunctionValues. It will still be a little slower than normal expressions for the aggregations case because it will do a hash of the variable name.

@rmuir
Copy link
Contributor

rmuir commented Jul 10, 2014

Hi ryan, I left an initial comment. Is there anything we can do to remove the expensive per-document stuff? I realize these might be limitations of the scripting API, but maybe we can fix that? Otherwise, if we are doing things like hashing strings etc per-document to resolve variable names, its going to be very slow.

@@ -150,6 +150,12 @@
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-expressions</artifactId>
<version>${lucene.version}</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we should mark it optional due to all the dependencies it has maybe @kimchy knows what to do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only has two: antlr-runtime and asm. I don't think its that many :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @rmuir. If we make it optional, we would have to do some magic to make the expression script engine load only if expressions are available (which I guess isn't that hard to do, but the other built in script engines don't assume they are optional).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be optional, but we should make sure we package it as part of our distro. In our script service we already support optional mvel and groovy, so thats easy, similar logic.

Its less about the size, its about the fact that those are 2 very popular libraries, and almost certainly will cause clashes when someone wants to use a NodeClient / TransportClient

In order to include it in our distro, make sure to add it in the deb/rpm logic in the pom.xml (with all deps), and in src/main/assemblies/common-bin, then run the package and check tar.gz includes all the deps, as well as the deb and rpm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I made the dep optional. I didn't need to change anything else since the lib starts with "lucene" and lucene* is already included. I checked the tar.gz and it includes all 3 deps.

@s1monw
Copy link
Contributor

s1monw commented Jul 10, 2014

I left some initial comments.. I will remove the review label for now feel free to put it back once you have new stuff to review

@s1monw s1monw removed the review label Jul 10, 2014
@rjernst
Copy link
Member Author

rjernst commented Jul 10, 2014

@rmuir @s1monw I think I addressed all of your comments.

@rjernst rjernst added the review label Jul 10, 2014

@Override
public void setNextScore(float score) { scorer.score = score; }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add another TODO here? This seems massively inefficient:

  1. expressions know if they refer to score, true or false. in lucene, Scorer.score() will only be called, if the expression actually needs it.
  2. here it seems, something calls score() always, and passes it to the expression even if its maybe not needed? so it might be called uselessly on the scorer, even if the ranking function doesnt use it.
  3. passing score in this hacky way im sure prevents inlining of actual scoring function into the expression and hurts performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok added it.

@rjernst
Copy link
Member Author

rjernst commented Jul 12, 2014

Ok, I have some benchmark numbers. The following tests were done with 1 million random docs, and 1k iterations of a match_all query using a script for sorting. The script is described in each column and each cell is the average time per request.

lang x x + y 1.2 * x / y sqrt(abs(z)) + ln(abs(x * y))
expression 5 ms 14 ms 14 ms 26 ms
native 14 ms 22 ms 23 ms 38 ms
groovy 37 ms 71 ms 74 ms 116 ms

@clintongormley
Copy link
Contributor

awesome numbers! expressions will be a very welcome addition to scripting

@s1monw s1monw added v1.4.0 and removed v1.3.0 labels Jul 14, 2014
@@ -873,7 +880,7 @@
</data>
<data>
<src>${project.build.directory}/lib</src>
<includes>lucene*, log4j*, jna*, spatial4j*, jts*, groovy*</includes>
<includes>lucene*, log4j*, jna*, spatial4j*, jts*, groovy*, </includes>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this extra , needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how that got there. Removing.

@s1monw
Copy link
Contributor

s1monw commented Jul 14, 2014

I left some cosmetic comments. LGTM

@s1monw s1monw removed the review label Jul 14, 2014
@rjernst
Copy link
Member Author

rjernst commented Jul 14, 2014

Ok, I think I addressed all your issues. I will fix setScorer in another PR.

@rjernst
Copy link
Member Author

rjernst commented Jul 14, 2014

I implemented a partial fix for setScorer in #6864.

@s1monw
Copy link
Contributor

s1monw commented Jul 15, 2014

LGTM I think we should get this into 1.3 I will tag it as that so feel free to push it there

@s1monw s1monw added v1.3.0 and removed v1.4.0 labels Jul 15, 2014
@rjernst rjernst closed this Jul 15, 2014
@clintongormley clintongormley changed the title Scripting: Add script engine for lucene expressions Scripting: Add script engine for Lucene expressions Jul 16, 2014
@uschindler
Copy link
Contributor

Maybe open another issue about this:
The Expressions module is safe to use for anybody, because you cannot invoke any random java functions (just what is provided via the bindings and the Map of Double-Methods.
Currently you can only disable scripting completely in ES (if you want to make search APIs public to internet via reverse proxy). It would be cool to make the script engines that are available via REST configureable.
Should I open an issue or is this already solved here (I have not yet looked closely into this patch).

@clintongormley
Copy link
Contributor

@uschindler from 1.3, sandboxed languages (ie groovy and expressions) are enabled by default: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_enabling_dynamic_scripting

@uschindler
Copy link
Contributor

Thanks @clintongormley !

@rjernst rjernst deleted the feature/expressions branch January 21, 2015 23:22
@clintongormley clintongormley added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Jun 6, 2015
@clintongormley clintongormley changed the title Scripting: Add script engine for Lucene expressions Add script engine for Lucene expressions Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scripting: Add script engine for lucene expressions
7 participants