Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement matches() on SourceConfirmedTextQuery #100134

Merged
merged 6 commits into from
Oct 4, 2023

Conversation

romseygeek
Copy link
Contributor

match_only_text does not currently support highlighting via the matches
option of the default highlighter. This commit implements matches on the
backing query for this field, and also fixes a bug where the field type's
value fetcher could hold on to the wrong reference for a source lookup,
causing threading errors.

@elasticsearchmachine
Copy link
Collaborator

Hi @romseygeek, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Oct 2, 2023
@romseygeek
Copy link
Contributor Author

also fixes a bug where the field type's
value fetcher could hold on to the wrong reference for a source lookup,
causing threading errors.

I'm not sure if this is a released bug or not - if it is, then I can pull out the fix into a separate PR for backport.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix, I left a few minor comments and a question for my understanding. LGTM otherwise.

import static org.elasticsearch.xcontent.XContentFactory.jsonBuilder;
import static org.hamcrest.Matchers.containsString;

public class MatchOnlyTextMapperIntegrationIT extends ESIntegTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is on me since you seem to have gotten this directly from #100066, but "IntegrationIT" is "doppelt gemoppelt" as we would say in German, translates roughly to "repeated unnecessarily" since IT already means "Intergration Test". Sorry for that, I'm adding a suggestion to change.

Suggested change
public class MatchOnlyTextMapperIntegrationIT extends ESIntegTestCase {
public class MatchOnlyTextMapperIT extends ESIntegTestCase {

mappings.endObject();
assertAcked(prepareCreate("test").setMapping(mappings));
BulkRequestBuilder bulk = client().prepareBulk("test").setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
for (int i = 0; i < 2000; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it makes sense to add a comment for our future selves why we are indexing this large number of docs here and why this is necessary to catch this bug.

.get();
assertNoFailures(searchResponse);
assertThat(
searchResponse.getHits().getAt(0).getHighlightFields().get("message").fragments()[0].string(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check all hits in a loop that they contain correct highlighting? Even though it would always be the same?

return context -> {
ValueFetcher valueFetcher = valueFetcher(searchExecutionContext, null);
SourceProvider sourceProvider = searchExecutionContext.lookup();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding: this should also take care of the concurrency issue from #100074 since the the searchExecutionContext.lookup() call is now done per leaf? Is that the correct reading of this?
In any case, can you also see if the StoredFieldLoader from L198 should also be moved into the context-lambda in the case where we have synthetic source enabled? I'm only guessing here so maybe it's not needed, but I also guess we don't have tests covering that code path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. We don't need to do the same with StoredFieldLoader as that is meant to be global and then create separate LeafStoredFieldLoaders per-segment. The issue here is that we have a top-level SourceProvider which is caching segment information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a test specifically for the synthetic source path as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quite tricky to read the difference here, especiall as valueFetcher does not take any different argument than before :)

@cbuescher
Copy link
Member

cbuescher commented Oct 2, 2023

I'm not sure if this is a released bug or not

I believe we first saw this on an 8.10.2 index, so that would mean it's already released on that line.

@romseygeek
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-2

@romseygeek romseygeek requested a review from jimczi October 3, 2023 09:38
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix LGTM, I left some comments on the testing that we can address in a follow up.

@@ -293,6 +296,22 @@ public RuntimePhraseScorer scorer(LeafReaderContext context) throws IOException
return new RuntimePhraseScorer(this, approximation, leafSimScorer, valueFetcher, field, in);
}

@Override
public Matches matches(LeafReaderContext context, int doc) throws IOException {
FieldInfo fi = context.reader().getFieldInfos().fieldInfo(field);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to explain why we're doing this? It's a bit difficult to follow from outside the highlighter code.


// We index and retrieve a large number of documents to ensure that we go over multiple
// segments, to ensure that the highlighter is using the correct segment lookups to
// load the source.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why do we need 2k docs to simulate multi segments? Realistically we should have caught this bug earlier by deactivating the weight matches when SourceConfirmedTextQuery is involved. Maybe we need a specific test in the field mappers that ensure that highlighting is always tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #100249 - I'm not sure exactly why we need the 2k docs but it seems to trigger the issue more consistently than a smaller example dataset.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When creating this reproduction I was aiming at getting a failure around 2-3 out of ten local runs. It wasn't clear to me completely which scenario was most likely to trigger the bug, now that we know what caused it maybe its possible to reduce this number and e.g. introduce flushes etc... to eg. increase likelihood of having more segments if that was the thing causing it.

@romseygeek romseygeek merged commit bb5ed98 into elastic:main Oct 4, 2023
@romseygeek romseygeek deleted the highlight/match-only-text-bug branch October 4, 2023 09:03
romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Oct 4, 2023
`match_only_text` does not currently support highlighting via the matches
option of the default highlighter. This commit implements matches on the
backing query for this field, and also fixes a bug where the field type's
value fetcher could hold on to the wrong reference for a source lookup,
causing threading errors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Relevance/Highlighting How a query matched a document Team:Search Meta label for search team v8.10.3 v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants