Implement matches() on SourceConfirmedTextQuery #100134

romseygeek · 2023-10-02T11:03:01Z

match_only_text does not currently support highlighting via the matches
option of the default highlighter. This commit implements matches on the
backing query for this field, and also fixes a bug where the field type's
value fetcher could hold on to the wrong reference for a source lookup,
causing threading errors.

elasticsearchmachine · 2023-10-02T11:03:25Z

Hi @romseygeek, I've created a changelog YAML for you.

elasticsearchmachine · 2023-10-02T11:03:25Z

Pinging @elastic/es-search (Team:Search)

romseygeek · 2023-10-02T11:04:12Z

> also fixes a bug where the field type's
> value fetcher could hold on to the wrong reference for a source lookup,
> causing threading errors.

I'm not sure if this is a released bug or not - if it is, then I can pull out the fix into a separate PR for backport.

...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/SourceConfirmedTextQuery.java

cbuescher

Thanks for the quick fix, I left a few minor comments and a question for my understanding. LGTM otherwise.

cbuescher · 2023-10-02T11:21:01Z

...nternalClusterTest/java/org/elasticsearch/index/mapper/MatchOnlyTextMapperIntegrationIT.java

+import static org.elasticsearch.xcontent.XContentFactory.jsonBuilder;
+import static org.hamcrest.Matchers.containsString;
+
+public class MatchOnlyTextMapperIntegrationIT extends ESIntegTestCase {


nit: this is on me, since you seem to have gotten this directly from #100066, but "IntegrationIT" is "doppelt gemoppelt", as we would say in German, which translates roughly to "repeated unnecessarily", since IT already means "Integration Test". Sorry for that, I'm adding a suggestion to change.

Suggested change

-public class MatchOnlyTextMapperIntegrationIT extends ESIntegTestCase {
+public class MatchOnlyTextMapperIT extends ESIntegTestCase {

cbuescher · 2023-10-02T11:24:14Z

...nternalClusterTest/java/org/elasticsearch/index/mapper/MatchOnlyTextMapperIntegrationIT.java

+        mappings.endObject();
+        assertAcked(prepareCreate("test").setMapping(mappings));
+        BulkRequestBuilder bulk = client().prepareBulk("test").setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
+        for (int i = 0; i < 2000; i++) {


Maybe it makes sense to add a comment for our future selves explaining why we are indexing this large number of docs here and why this is necessary to catch this bug.

cbuescher · 2023-10-02T11:24:41Z

...nternalClusterTest/java/org/elasticsearch/index/mapper/MatchOnlyTextMapperIntegrationIT.java

+            .get();
+        assertNoFailures(searchResponse);
+        assertThat(
+            searchResponse.getHits().getAt(0).getHighlightFields().get("message").fragments()[0].string(),


Should we loop over all hits and check that each one contains the correct highlighting, even though it would always be the same?
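A loop over all hits, as suggested here, could look roughly like the following. This is a minimal sketch, not the test from the PR: `Hit`, `hitsMissingHighlight` and the sample text are hypothetical stand-ins so that the example runs without Elasticsearch. In the real test the fragments would come from `getHighlightFields().get("message").fragments()` on each hit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class HighlightAllHitsSketch {

    /** Hypothetical stand-in for a search hit: an id plus highlight fragments per field. */
    static final class Hit {
        final String id;
        final Map<String, List<String>> highlights;

        Hit(String id, Map<String, List<String>> highlights) {
            this.id = id;
            this.highlights = highlights;
        }
    }

    /** Returns the ids of all hits whose first fragment for the field does not contain the expected text. */
    static List<String> hitsMissingHighlight(List<Hit> hits, String field, String expected) {
        List<String> failures = new ArrayList<>();
        for (Hit hit : hits) {
            List<String> fragments = hit.highlights.get(field);
            boolean ok = fragments != null && !fragments.isEmpty() && fragments.get(0).contains(expected);
            if (!ok) {
                failures.add(hit.id);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Made-up sample data: the second hit deliberately lacks the highlight tag.
        List<Hit> hits = Arrays.asList(
            new Hit("1", Collections.singletonMap("message", Arrays.asList("a <em>quick</em> fox"))),
            new Hit("2", Collections.singletonMap("message", Arrays.asList("a slow fox")))
        );
        System.out.println(hitsMissingHighlight(hits, "message", "<em>quick</em>"));
    }
}
```

Collecting the failing ids rather than failing on the first mismatch makes it easier to see whether only some hits are affected, which may matter if the problem depends on which segment a document lives in.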

cbuescher · 2023-10-02T11:49:44Z

...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java

            return context -> {
+                ValueFetcher valueFetcher = valueFetcher(searchExecutionContext, null);
+                SourceProvider sourceProvider = searchExecutionContext.lookup();


For my understanding: this should also take care of the concurrency issue from #100074, since the searchExecutionContext.lookup() call is now done per leaf? Is that the correct reading of this?
In any case, can you also see if the StoredFieldLoader from L198 should also be moved into the context-lambda in the case where we have synthetic source enabled? I'm only guessing here so maybe it's not needed, but I also guess we don't have tests covering that code path.

Yes, that's correct. We don't need to do the same with StoredFieldLoader as that is meant to be global and then create separate LeafStoredFieldLoaders per-segment. The issue here is that we have a top-level SourceProvider which is caching segment information.

I'll add a test specifically for the synthetic source path as well.

It's quite tricky to read the difference here, especially as valueFetcher does not take any different argument than before :)
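The difference discussed in this thread can be sketched outside Elasticsearch. The following is a simplified illustration under stated assumptions, not the code from the PR: `CachingLookup` is a hypothetical stand-in for a source provider that caches per-segment state, and `perLeafFetcher` mimics the shape of the `context -> { ... }` lambda by creating the lookup inside the per-segment function, so that no cache is shared between segments.

```java
import java.util.function.Function;
import java.util.function.IntFunction;
import java.util.function.Supplier;

final class PerLeafLookupSketch {

    /** Hypothetical stand-in for a source lookup that caches state for the last segment it saw. */
    static final class CachingLookup {
        private final String[][] segments;
        private int cachedSegment = -1;
        private String[] cachedDocs;

        CachingLookup(String[][] segments) {
            this.segments = segments;
        }

        // Not thread-safe: if one instance were shared, two threads working on
        // different segments could overwrite each other's cached state.
        String source(int segment, int doc) {
            if (segment != cachedSegment) {
                cachedSegment = segment;
                cachedDocs = segments[segment];
            }
            return cachedDocs[doc];
        }
    }

    /** The lookup is created inside the per-segment function, so each segment gets its own cache. */
    static Function<Integer, IntFunction<String>> perLeafFetcher(Supplier<CachingLookup> newLookup) {
        return segment -> {
            CachingLookup lookup = newLookup.get(); // fresh instance for this segment
            return doc -> lookup.source(segment, doc);
        };
    }

    /** Reads every doc of one segment repeatedly and counts reads that returned the wrong value. */
    static int countWrongReads(String[][] segments, int segment, IntFunction<String> fetcher, int rounds) {
        int wrong = 0;
        for (int r = 0; r < rounds; r++) {
            for (int doc = 0; doc < segments[segment].length; doc++) {
                if (!segments[segment][doc].equals(fetcher.apply(doc))) {
                    wrong++;
                }
            }
        }
        return wrong;
    }

    /** Runs one thread per segment, each with its own fetcher, and returns the total number of wrong reads. */
    static int runConcurrently(String[][] segments, int rounds) {
        Function<Integer, IntFunction<String>> factory = perLeafFetcher(() -> new CachingLookup(segments));
        int[] wrong = new int[segments.length];
        Thread[] threads = new Thread[segments.length];
        for (int s = 0; s < segments.length; s++) {
            final int segment = s;
            IntFunction<String> fetcher = factory.apply(segment);
            threads[s] = new Thread(() -> wrong[segment] = countWrongReads(segments, segment, fetcher, rounds));
            threads[s].start();
        }
        int total = 0;
        try {
            for (int s = 0; s < segments.length; s++) {
                threads[s].join(); // join makes the worker's write to wrong[s] visible here
                total += wrong[s];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        String[][] segments = { { "a0", "a1" }, { "b0", "b1", "b2" } };
        System.out.println("wrong reads: " + runConcurrently(segments, 10_000));
    }
}
```

Because each `CachingLookup` instance is only ever used by one thread, its cache cannot be clobbered by work on another segment. Passing a supplier that always returns the same instance would reintroduce the shared cache, which is the kind of arrangement the fix moves away from.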

cbuescher · 2023-10-02T11:55:19Z

> I'm not sure if this is a released bug or not

I believe we first saw this on an 8.10.2 index, so that would mean it's already released on that line.

romseygeek · 2023-10-02T15:04:15Z

@elasticmachine run elasticsearch-ci/part-2

jimczi

The fix LGTM, I left some comments on the testing that we can address in a follow up.

jimczi · 2023-10-03T15:28:42Z

...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/SourceConfirmedTextQuery.java

@@ -293,6 +296,22 @@ public RuntimePhraseScorer scorer(LeafReaderContext context) throws IOException
                return new RuntimePhraseScorer(this, approximation, leafSimScorer, valueFetcher, field, in);
            }

+            @Override
+            public Matches matches(LeafReaderContext context, int doc) throws IOException {
+                FieldInfo fi = context.reader().getFieldInfos().fieldInfo(field);


Maybe add a comment to explain why we're doing this? It's a bit difficult to follow from outside the highlighter code.

jimczi · 2023-10-03T16:30:20Z

...xtras/src/internalClusterTest/java/org/elasticsearch/index/mapper/MatchOnlyTextMapperIT.java

+
+        // We index and retrieve a large number of documents to ensure that we go over multiple
+        // segments, to ensure that the highlighter is using the correct segment lookups to
+        // load the source.


Out of curiosity, why do we need 2k docs to simulate multiple segments? Realistically we should have caught this bug earlier by deactivating the weight matches when SourceConfirmedTextQuery is involved. Maybe we need a specific test in the field mappers that ensures that highlighting is always tested?

I opened #100249 - I'm not sure exactly why we need the 2k docs but it seems to trigger the issue more consistently than a smaller example dataset.

When creating this reproduction I was aiming for a failure in around 2-3 out of ten local runs. It wasn't completely clear to me which scenario was most likely to trigger the bug. Now that we know what caused it, maybe it's possible to reduce this number and, e.g., introduce flushes to increase the likelihood of having more segments, if that was the thing causing it.

`match_only_text` does not currently support highlighting via the matches option of the default highlighter. This commit implements matches on the backing query for this field, and also fixes a bug where the field type's value fetcher could hold on to the wrong reference for a source lookup, causing threading errors.

Implement matches() on SourceConfirmedTextQuery

fa83b97

romseygeek added the >enhancement, :Search Relevance/Highlighting, and v8.11.0 labels Oct 2, 2023

romseygeek requested review from cbuescher and jimczi October 2, 2023 11:03

romseygeek self-assigned this Oct 2, 2023

elasticsearchmachine added the Team:Search label Oct 2, 2023

Update docs/changelog/100134.yaml

1175a96

jimczi reviewed Oct 2, 2023

...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/SourceConfirmedTextQuery.java

cbuescher approved these changes Oct 2, 2023

romseygeek added 2 commits October 2, 2023 15:30

deef

bf4de1c

Merge remote-tracking branch 'romseygeek/highlight/match-only-text-bu…

2a30152

…g' into highlight/match-only-text-bug

romseygeek requested a review from jimczi October 3, 2023 09:38

jimczi approved these changes Oct 3, 2023

romseygeek added 2 commits October 4, 2023 08:56

Merge remote-tracking branch 'origin/main' into highlight/match-only-…

5fda2c1

…text-bug

Add comment

735f4c8

romseygeek merged commit bb5ed98 into elastic:main Oct 4, 2023

romseygeek deleted the highlight/match-only-text-bug branch October 4, 2023 09:03

cbuescher mentioned this pull request Oct 4, 2023

Implement matches() on SourceConfirmedTextQuery #100252

Merged

This was referenced Oct 4, 2023

Add test that exhibits errors on match_only_text highlighting #100066

Closed

Issue with using multiple threads for StoredFieldsReader #100074

Closed

romseygeek mentioned this pull request Oct 5, 2023

index_out_of_bounds_exception when highlighting "match_only_text" fields #100071

Closed

masseyke added the v8.10.3 label Oct 6, 2023
