Some queries for runtime fields #58940

nik9000 · 2020-07-02T16:15:59Z

This implements a few queries for runtime fields. It doesn't wire them into the
runtime fields infrastructure yet because, well, we haven't committed that. But
the queries are there and they work. As do the doc values implementations.

You may wonder, Nik, why are you reimplementing all of these queries? Lucene
has tons of queries, some of them even against doc values. You make doc values,
why not just query the doc values you make with the Lucene queries? That'd be
a good question! It turns out that Lucene's doc_values queries are quite tied
up with Lucene's doc values implementations. They compare values using the
binary format that Lucene stores the doc values in. Which is a great idea! But
it means that they only really work on Lucene's doc values implementations.

It also tries to only call each script once, even asserting that in most tests.
This isn't strictly required but I think it is useful and something that
would be difficult to design "after the fact". Still, this behavior shouldn't
block implementing the feature. We'll keep it if we can. But if we can't, well,
we'll drop it and come back to it later.
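
To make the two points above concrete, here is a minimal plain-Java sketch of a script-backed term check that works on the decoded values rather than on Lucene's binary doc values format, and that runs the script once per document. It does not use any Lucene or Elasticsearch types; `ScriptTermMatchSketch`, `CountingScript`, `matchesTerm`, and `countMatches` are hypothetical names made up for illustration and are not the classes in this PR.

```java
import java.util.function.IntFunction;

/** Plain-Java model of a script-backed term check. Every name here is hypothetical. */
final class ScriptTermMatchSketch {
    /** Stand-in for a runtime field script: given a doc id, it produces the field's values. */
    static final class CountingScript {
        private final IntFunction<String[]> body;
        int invocations = 0;

        CountingScript(IntFunction<String[]> body) {
            this.body = body;
        }

        String[] run(int docId) {
            invocations++; // lets a test assert that the script ran once per document
            return body.apply(docId);
        }
    }

    /** Runs the script exactly once for the document and compares the decoded values directly. */
    static boolean matchesTerm(CountingScript script, int docId, String term) {
        String[] values = script.run(docId);
        for (String value : values) {
            if (term.equals(value)) {
                return true;
            }
        }
        return false;
    }

    /** Counts matching docs in [0, maxDoc), visiting every candidate document once. */
    static int countMatches(CountingScript script, int maxDoc, String term) {
        int hits = 0;
        for (int docId = 0; docId < maxDoc; docId++) {
            if (matchesTerm(script, docId, term)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CountingScript script = new CountingScript(
            docId -> new String[] { docId % 2 == 0 ? "even" : "odd", "doc" + docId }
        );
        int hits = countMatches(script, 10, "even");
        System.out.println(hits + " hits, " + script.invocations + " script runs");
    }
}
```

The invocation counter plays the role of the "only call each script once" assertion mentioned above: a test can compare it against the number of documents visited.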

relates to #59332


javanna · 2020-07-02T19:35:23Z

...untime-fields/src/main/java/org/elasticsearch/xpack/runtimefields/AbstractRuntimeValues.java

+                            @Override
+                            public float matchCost() {
+                                // TODO we have no idea what this should be and no real way to get one
+                                return 1000f;


I think the cost can, for now, be approximation.cost(); it is proportional to maxDocs, if I remember correctly.

javanna · 2020-07-06T10:22:20Z

server/src/main/java/org/elasticsearch/index/fielddata/SearchLookupAware.java

@@ -21,6 +21,8 @@

 import org.elasticsearch.search.lookup.SearchLookup;

+import java.io.IOException;
+


unused import

javanna · 2020-07-06T10:31:15Z

server/src/main/java/org/elasticsearch/search/lookup/LeafDocLookup.java

+                    if (ifd instanceof SearchLookupAware) {
+                        ((SearchLookupAware) ifd).setSearchLookup(searchLookup);
+                    }
+                    return ifd.load(reader).getScriptValues();


I did not foresee the need for these changes but I see how they are caused by me not changing fielddataBuilder and rather hacking query shard context. Just to double check: this is to support runtime fields that refer to other runtime fields, otherwise they have no search lookup set?

To keep this contained and have the hack in a single place, would it work to modify QueryShardContext#lookup to do the following instead? It may have weird consequences, but in our branch, with a big TODO to revert it, it may be ok? Also, let's mention in a comment specifically why this is needed?

    public SearchLookup lookup() {
        if (lookup == null) {
            lookup = new SearchLookup(getMapperService(), this::getForField);
        }
        return lookup;
    }

Yeah, that is what my TODO was about - maybe this should go through getForField. I can look into making that change in master. I think it is good regardless. And, if it has consequences that we didn't think of we may as well see them earlier rather than later...

I don't know if this makes sense on master and what difference it would make, but I think it does make sense in our feature branch to isolate the hack around augmenting the fielddata impl.
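
For readers following this thread, here is a small plain-Java model of the wiring being discussed: the lookup is built lazily from a method reference to `getForField`, and `getForField` is the single place that hands the lookup to any field data that wants it. All types here (`Lookup`, `FieldData`, `LookupAware`, `Context`) are hypothetical stand-ins keyed by field name, not the real `SearchLookup`, `IndexFieldData`, `SearchLookupAware`, or `QueryShardContext`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Plain-Java model of routing field data lookups through one method. All names are hypothetical. */
final class LookupWiringSketch {
    interface FieldData {
        String describe();
    }

    /** Stand-in for the "aware" marker interface discussed in the thread. */
    interface LookupAware {
        void setLookup(Lookup lookup);
    }

    static final class Lookup {
        final Function<String, FieldData> fieldDataLookup;

        Lookup(Function<String, FieldData> fieldDataLookup) {
            this.fieldDataLookup = fieldDataLookup;
        }
    }

    static final class PlainFieldData implements FieldData {
        public String describe() {
            return "plain";
        }
    }

    static final class RuntimeFieldData implements FieldData, LookupAware {
        private Lookup lookup;

        public void setLookup(Lookup lookup) {
            this.lookup = lookup;
        }

        public String describe() {
            return lookup == null ? "runtime without lookup" : "runtime with lookup";
        }
    }

    static final class Context {
        private final Map<String, FieldData> fields = new HashMap<>();
        private Lookup lookup;

        void register(String name, FieldData fieldData) {
            fields.put(name, fieldData);
        }

        /** Lazily builds the lookup, pointing it back at getForField. */
        Lookup lookup() {
            if (lookup == null) {
                lookup = new Lookup(this::getForField);
            }
            return lookup;
        }

        /** The single place where "aware" field data receives the lookup. */
        FieldData getForField(String name) {
            FieldData fieldData = fields.get(name);
            if (fieldData instanceof LookupAware) {
                ((LookupAware) fieldData).setLookup(lookup());
            }
            return fieldData;
        }
    }

    public static void main(String[] args) {
        Context context = new Context();
        context.register("plain", new PlainFieldData());
        context.register("runtime", new RuntimeFieldData());
        System.out.println(context.lookup().fieldDataLookup.apply("runtime").describe());
    }
}
```

In this model a runtime field that reads another runtime field goes through the same `fieldDataLookup` function, so the second field also gets its lookup set without any `instanceof` check at the call site.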

javanna · 2020-07-06T10:50:17Z

...untime-fields/src/main/java/org/elasticsearch/xpack/runtimefields/AbstractRuntimeValues.java

+
+                            @Override
+                            public float matchCost() {
+                                // TODO we don't have a good way of estimating the complexity of the script so we just go with 9000


I would remove the constant, what value does it add? I think that the important part is that a script needs to be run for each document, which is the cost approximation. Then one script can be heavier than another, but I wonder if that is negligible at this stage, unless we can calculate the approximate cost of a script.

Looking at the javadoc, I think it should actually just be some constant - matchCost says it is only for a single document.

If I just returned 1 here then I think Lucene'd reason about the query as though it were match_all, which is probably bad
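
As a rough illustration of why a per-document cost estimate matters, the sketch below models a conjunction that confirms each candidate document against its checks in ascending cost order, so an expensive script only runs on documents that cheaper checks have already accepted. This is a plain-Java model of the idea, not Lucene's `TwoPhaseIterator` API; `MatchCostSketch`, `Check`, and `matchesAll` are hypothetical names, and the assumption that the estimate drives the order of the checks is mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/** Plain-Java model of ordering per-document checks by estimated cost. All names are hypothetical. */
final class MatchCostSketch {
    static final class Check {
        final String name;
        final float matchCost; // estimated cost of confirming one document
        private final IntPredicate predicate;
        int calls = 0;

        Check(String name, float matchCost, IntPredicate predicate) {
            this.name = name;
            this.matchCost = matchCost;
            this.predicate = predicate;
        }

        boolean matches(int docId) {
            calls++;
            return predicate.test(docId);
        }
    }

    /** Confirms a document against every check, cheapest first, stopping at the first failure. */
    static boolean matchesAll(List<Check> checks, int docId) {
        List<Check> ordered = new ArrayList<>(checks);
        ordered.sort(Comparator.comparingDouble(check -> check.matchCost));
        for (Check check : ordered) {
            if (!check.matches(docId)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The script is assumed expensive (9000, as in the diff); the other check is assumed cheap.
        Check script = new Check("script", 9000f, docId -> docId % 3 == 0);
        Check cheap = new Check("cheap", 1f, docId -> docId % 2 == 0);
        int hits = 0;
        for (int docId = 0; docId < 10; docId++) {
            if (matchesAll(Arrays.asList(script, cheap), docId)) {
                hits++;
            }
        }
        System.out.println(hits + " hits, cheap calls=" + cheap.calls + ", script calls=" + script.calls);
    }
}
```

If the script reported a cost of 1 in this model, nothing would distinguish it from the cheap check in the ordering, which is the concern raised in the last reply above.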

...rc/main/java/org/elasticsearch/xpack/runtimefields/mapper/RuntimeKeywordMappedFieldType.java

javanna · 2020-07-06T11:17:32Z

.../runtime-fields/src/main/java/org/elasticsearch/xpack/runtimefields/StringRuntimeValues.java

+            return DocValues::new;
+        }
+
+        private class DocValues extends SortedBinaryDocValues {


this is effectively the docvalues implementation that replaces the one that I had written which you deleted?

I was wondering, and not even sure if it matters, queries don't need sorted doc_values right? But we always implement the interface that we need for fielddata to simplify things? I know that I had implemented the lucene one for query but it made things more complex for almost no reason, I think.

> this is effectively the docvalues implementation that replaces the one that I had written which you deleted?

Yeah. I had mine sitting around and it had unit tests and stuff. It's the same deal.

> I was wondering, and not even sure if it matters, queries don't need sorted doc_values right? But we always implement the interface that we need for fielddata to simplify things? I know that I had implemented the lucene one for query but it made things more complex for almost no reason, I think.

In the patch as it stands now the queries don't use this and look at the array of values directly. And they don't sort them unless they build this thing. When I was writing the queries it felt simpler not to target the doc values implementation - partially because most queries don't care about the values being sorted. And, if they did care about the values being sorted, then they'd want a sorted array. Which is what we use to build the doc values implementation anyway.
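
The split described in that reply can be sketched in plain Java as follows: values are collected into a growable array, queries scan the array as collected, and sorting only happens when something asks for the sorted view. `RuntimeValuesSketch` and its methods are hypothetical names for illustration and are not the `AbstractRuntimeValues` class from the patch.

```java
import java.util.Arrays;

/** Plain-Java model of "collect into an array, sort only on demand". All names are hypothetical. */
final class RuntimeValuesSketch {
    private String[] values = new String[1];
    private int count = 0;
    private boolean sorted = false;
    int sortCalls = 0;

    /** Called by the script for each value it produces for the current document. */
    void add(String value) {
        if (count == values.length) {
            values = Arrays.copyOf(values, count * 2);
        }
        values[count++] = value;
        sorted = false;
    }

    /** Clears the collected values before moving to the next document. */
    void reset() {
        count = 0;
        sorted = false;
    }

    /** What a term query needs: a scan over the values, in whatever order they arrived. */
    boolean contains(String term) {
        for (int i = 0; i < count; i++) {
            if (values[i].equals(term)) {
                return true;
            }
        }
        return false;
    }

    /** What a sorted doc values view needs: the same array, sorted the first time it is requested. */
    String[] sortedValues() {
        if (!sorted) {
            Arrays.sort(values, 0, count);
            sorted = true;
            sortCalls++;
        }
        return Arrays.copyOf(values, count);
    }

    public static void main(String[] args) {
        RuntimeValuesSketch collected = new RuntimeValuesSketch();
        collected.add("b");
        collected.add("c");
        collected.add("a");
        System.out.println(collected.contains("a") + " " + Arrays.toString(collected.sortedValues()));
    }
}
```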

javanna · 2020-07-09T09:51:22Z

server/src/main/java/org/elasticsearch/index/query/QueryShardContext.java

@@ -297,8 +297,7 @@ MappedFieldType failIfFieldMappingNotFound(String name, MappedFieldType fieldMap

    public SearchLookup lookup() {
        if (lookup == null) {
-            lookup = new SearchLookup(getMapperService(),
-                    mappedFieldType -> indexFieldDataService.apply(mappedFieldType, fullyQualifiedIndex.getName()));
+            lookup = new SearchLookup(getMapperService(), this::getForField);


add a TODO here, to remind us that we need to figure out if this should be merged upstream or not? once the searchlookupaware hack is removed and fielddatabuilder is adapted, we will not need this change anyways?

javanna · 2020-07-09T09:51:56Z

server/src/main/java/org/elasticsearch/search/lookup/LeafDocLookup.java

@@ -45,8 +45,7 @@

    private int docId = -1;

-    LeafDocLookup(MapperService mapperService, Function<MappedFieldType, IndexFieldData<?>> fieldDataLookup,
-                  LeafReaderContext reader) {
+    LeafDocLookup(MapperService mapperService, Function<MappedFieldType, IndexFieldData<?>> fieldDataLookup, LeafReaderContext reader) {


can you revert please? Trying to trim down changes to stuff that's outside of the runtime fields plugin

javanna · 2020-07-09T09:52:13Z

server/src/test/java/org/elasticsearch/search/lookup/LeafDocLookupTests.java

-        docLookup = new LeafDocLookup(mapperService,
-            ignored -> fieldData,
-            null);
+        docLookup = new LeafDocLookup(mapperService, ignored -> fieldData, null);


same here, let's revert?

javanna · 2020-07-09T09:53:39Z

...time-fields/src/main/java/org/elasticsearch/xpack/runtimefields/ForceNoBulkScoringQuery.java

+ * <p>
+ * Inspired by the ForceNoBulkScoringQuery in Lucene's monitor project.
+ */
+class ForceNoBulkScoringQuery extends Query {


we can remove this right?

javanna · 2020-07-10T12:26:29Z

...rc/main/java/org/elasticsearch/xpack/runtimefields/mapper/RuntimeKeywordMappedFieldType.java

@@ -27,7 +35,7 @@
    private final StringScriptFieldScript.Factory scriptFactory;

    RuntimeKeywordMappedFieldType(String name, Script script, StringScriptFieldScript.Factory scriptFactory, Map<String, String> meta) {
-        super(name, false, false, TextSearchInfo.NONE, meta);
+        super(name, false, false, TextSearchInfo.SIMPLE_MATCH_ONLY, meta);


why SIMPLE_MATCH_ONLY instead of NONE? I think we end up creating a lucene field type here which should not be needed?

javanna · 2020-07-10T12:57:08Z

...ime-fields/src/main/java/org/elasticsearch/xpack/runtimefields/DoubleRuntimeFieldHelper.java

+public final class DoubleRuntimeFieldHelper {
+    @FunctionalInterface
+    public interface NewLeafLoader {
+        IntConsumer leafLoader(LeafReaderContext ctx, DoubleConsumer sync) throws IOException;


Some of my confusion is also around where the consumer is placed, I think, which forces these indirections. Ideally, the values consumer would be provided anew for each document, either as an execute argument or when we setDocument? Is there some technical reason why it needs to be provided with the leaf reader context? Ideally, we would be able to abstract this away and even have the collection of values in the script base class, so that every script allows you to simply retrieve the result by calling getResult(docId)? Do you think that would be possible?
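
One possible shape for the suggestion above, as a plain-Java sketch: the script base class owns the value collection, subclasses only call an emit method from execute, and callers retrieve a document's values with getResult(docId). `ScriptResultSketch`, `CollectingDoubleScript`, `emit`, and the `docIdAndDouble` example are hypothetical; only the `getResult(docId)` name comes from the comment, and this is not how the merged code necessarily ended up.

```java
import java.util.Arrays;

/** Plain-Java model of a script base class that collects its own values. All names are hypothetical. */
final class ScriptResultSketch {
    abstract static class CollectingDoubleScript {
        private double[] values = new double[1];
        private int count = 0;

        /** Subclasses compute the values for one document and call emit for each of them. */
        protected abstract void execute(int docId);

        protected final void emit(double value) {
            if (count == values.length) {
                values = Arrays.copyOf(values, count * 2);
            }
            values[count++] = value;
        }

        /** Runs the script for the document and returns a copy of the values it emitted. */
        public final double[] getResult(int docId) {
            count = 0;
            execute(docId);
            return Arrays.copyOf(values, count);
        }
    }

    /** Example script: emits the doc id and twice the doc id. */
    static CollectingDoubleScript docIdAndDouble() {
        return new CollectingDoubleScript() {
            @Override
            protected void execute(int docId) {
                emit(docId);
                emit(docId * 2.0);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(docIdAndDouble().getResult(3)));
    }
}
```

With this shape, a query or a doc values view would not need its own consumer plumbing; it would ask the script for the current document's result and work with the returned array.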

javanna · 2020-07-10T12:59:24Z

...ime-fields/src/main/java/org/elasticsearch/xpack/runtimefields/DoubleRuntimeFieldHelper.java

+        return new Values().new TermQuery(fieldName, value);
+    }
+
+    public Query termsQuery(String fieldName, double... value) {


I struggle to see value in these helper classes, but maybe this is due to the confusion I mentioned in other comments. I would have expected these methods to be implemented in the mapped field type, creating the queries there directly, with top-level classes like ScriptTermQuery etc. Maybe that is because I am hoping that the collection of values can be abstracted and exposed as part of the script object itself or some wrapper around it.

I wonder if this Values class and these abstractions are a remnant of the previous implementation that was trying to cache values.

javanna · 2020-07-10T13:03:56Z

...ime-fields/src/main/java/org/elasticsearch/xpack/runtimefields/StringRuntimeFieldHelper.java

+
+        @Override
+        protected void sort() {
+            Arrays.sort(values, 0, count);


sorting is always done in the same way regardless of the data type right? I mean calling Arrays.sort

javanna · 2020-07-10T13:05:07Z

...ime-fields/src/main/java/org/elasticsearch/xpack/runtimefields/StringRuntimeFieldHelper.java

+    }
+
+    private class Values extends AbstractRuntimeValues {
+        private String[] values = new String[1];


should we just have the collection of values shared between the different types and use a typed ArrayList? Would that be a tragedy in terms of performance?

javanna · 2020-07-10T13:08:06Z

...ime-fields/src/main/java/org/elasticsearch/xpack/runtimefields/StringRuntimeFieldHelper.java

+            return DocValues::new;
+        }
+
+        private class DocValues extends SortedBinaryDocValues {


Like I said in other comments, I think you have all the bits that are needed; I would only turn the dependencies/flow around a bit. I would keep the existing doc_values impl, which is only used in aggs/sorting (and it is totally fine that it's not used in queries), and I would make it lighter so that all it really does is return the result of the script for the current doc, by calling that script.getResult(docId) method I mentioned above. I think that would clarify who does what. At the moment it is hard to follow who needs doc_values, who needs them sorted, etc.

elasticmachine · 2020-07-14T14:01:57Z

Pinging @elastic/es-search (:Search/Search)

javanna · 2020-07-14T14:02:52Z

Shall we close this, @nik9000? I think that it was a good experiment and we can now proceed with getting in what we need step by step.

nik9000 · 2020-07-15T17:15:08Z

We got this in with #59630, #59527, and #59523.

javanna and others added 24 commits July 1, 2020 17:22

Add runtime fields plugin under x-pack

0cfdf33

Hack together some script contexts

e474582

Put together some example script contexts for scripted fields. This is almost certainly wrong, but it gets us something to think about.

Scripted fields: rework script contexts (elastic#58342)

76d8b94

This reworks the script contexts to support multiple values for each hit. Just like before, these are almost certainly not the final implementation, but they give us something to iterate on.

Spotless and maybe tests?

818403b

Fixup test case after merge

3669ec4

Painless extensions work differently now.

Get runtime fields tests passing

3a390b5

Uses a temporary hack to allow us to use painless in unit tests.

Add queries

59b2a18

Better way for queries

28944a4

Fails

7d43642

Shared iteration

89d6352

Long term query

5b353ce

Common stuffs?

23af466

Double queries

acf64c8

Rename

7ee42c0

Range query

216c02c

Fuzzy

09e6feb

Wildcard

4e66638

Regexp query

6837172

Exists and some sorting

f5106b2

Spotless

2e05521

WIP

bfe15f8

Fixup bool

7c18ddb

Remove some duplication

79d41c8

Merge branch 'feature/runtime_fields' into runtime_field_string_term_…

e3850a7

…query

nik9000 requested a review from javanna July 2, 2020 16:15

nik9000 added 2 commits July 2, 2020 12:17

Javadoc

ab6834e

Javadoc

4e83233

nik9000 marked this pull request as ready for review July 2, 2020 16:26

nik9000 added 2 commits July 2, 2020 14:46

Javadoc

810568b

Merge branch 'feature/runtime_fields' into runtime_field_string_term_…

3b79704

…query

javanna reviewed Jul 2, 2020


nik9000 added 4 commits July 2, 2020 16:25

Wire in doc values and term query

bc80a5e

Integrate!

93a031f

Its over nine thousand!

3409621

In a bool

b5b02ed

javanna reviewed Jul 6, 2020


itr

e17fef8

nik9000 force-pushed the runtime_field_string_term_query branch from be83b55 to e17fef8 Compare July 8, 2020 15:51

nik9000 added 4 commits July 8, 2020 11:52

Drop shared iteration tests

edc571f

Remove caching

f90a0c7

Rename

724aaba

plumb

6645c09

javanna reviewed Jul 10, 2020


javanna added the :Search/Search label ("Search-related issues that do not fall into other categories") Jul 14, 2020

elasticmachine added the Team:Search label ("Meta label for search team") Jul 14, 2020

nik9000 closed this Jul 15, 2020
