-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement matches() on SourceConfirmedTextQuery #100134
Implement matches() on SourceConfirmedTextQuery #100134
Conversation
Hi @romseygeek, I've created a changelog YAML for you. |
Pinging @elastic/es-search (Team:Search) |
I'm not sure if this is a released bug or not - if it is, then I can pull out the fix into a separate PR for backport. |
...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/SourceConfirmedTextQuery.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick fix, I left a few minor comments and a question for my understanding. LGTM otherwise.
import static org.elasticsearch.xcontent.XContentFactory.jsonBuilder; | ||
import static org.hamcrest.Matchers.containsString; | ||
|
||
public class MatchOnlyTextMapperIntegrationIT extends ESIntegTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is on me since you seem to have gotten this directly from #100066, but "IntegrationIT" is "doppelt gemoppelt" as we would say in German, translates roughly to "repeated unnecessarily" since IT already means "Intergration Test". Sorry for that, I'm adding a suggestion to change.
public class MatchOnlyTextMapperIntegrationIT extends ESIntegTestCase { | |
public class MatchOnlyTextMapperIT extends ESIntegTestCase { |
mappings.endObject(); | ||
assertAcked(prepareCreate("test").setMapping(mappings)); | ||
BulkRequestBuilder bulk = client().prepareBulk("test").setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE); | ||
for (int i = 0; i < 2000; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it makes sense to add a comment for our future selves why we are indexing this large number of docs here and why this is necessary to catch this bug.
.get(); | ||
assertNoFailures(searchResponse); | ||
assertThat( | ||
searchResponse.getHits().getAt(0).getHighlightFields().get("message").fragments()[0].string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check all hits in a loop that they contain correct highlighting? Even though it would always be the same?
return context -> { | ||
ValueFetcher valueFetcher = valueFetcher(searchExecutionContext, null); | ||
SourceProvider sourceProvider = searchExecutionContext.lookup(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding: this should also take care of the concurrency issue from #100074 since the the searchExecutionContext.lookup()
call is now done per leaf? Is that the correct reading of this?
In any case, can you also see if the StoredFieldLoader from L198 should also be moved into the context-lambda in the case where we have synthetic source enabled? I'm only guessing here so maybe it's not needed, but I also guess we don't have tests covering that code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. We don't need to do the same with StoredFieldLoader as that is meant to be global and then create separate LeafStoredFieldLoaders per-segment. The issue here is that we have a top-level SourceProvider which is caching segment information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a test specifically for the synthetic source path as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's quite tricky to read the difference here, especiall as valueFetcher does not take any different argument than before :)
I believe we first saw this on an 8.10.2 index, so that would mean it's already released on that line. |
…g' into highlight/match-only-text-bug
@elasticmachine run elasticsearch-ci/part-2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix LGTM, I left some comments on the testing that we can address in a follow up.
@@ -293,6 +296,22 @@ public RuntimePhraseScorer scorer(LeafReaderContext context) throws IOException | |||
return new RuntimePhraseScorer(this, approximation, leafSimScorer, valueFetcher, field, in); | |||
} | |||
|
|||
@Override | |||
public Matches matches(LeafReaderContext context, int doc) throws IOException { | |||
FieldInfo fi = context.reader().getFieldInfos().fieldInfo(field); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment to explain why we're doing this? It's a bit difficult to follow from outside the highlighter code.
|
||
// We index and retrieve a large number of documents to ensure that we go over multiple | ||
// segments, to ensure that the highlighter is using the correct segment lookups to | ||
// load the source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why do we need 2k docs to simulate multi segments? Realistically we should have caught this bug earlier by deactivating the weight matches when SourceConfirmedTextQuery
is involved. Maybe we need a specific test in the field mappers that ensure that highlighting is always tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #100249 - I'm not sure exactly why we need the 2k docs but it seems to trigger the issue more consistently than a smaller example dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When creating this reproduction I was aiming at getting a failure around 2-3 out of ten local runs. It wasn't clear to me completely which scenario was most likely to trigger the bug, now that we know what caused it maybe its possible to reduce this number and e.g. introduce flushes etc... to eg. increase likelihood of having more segments if that was the thing causing it.
`match_only_text` does not currently support highlighting via the matches option of the default highlighter. This commit implements matches on the backing query for this field, and also fixes a bug where the field type's value fetcher could hold on to the wrong reference for a source lookup, causing threading errors.
match_only_text
does not currently support highlighting via the matchesoption of the default highlighter. This commit implements matches on the
backing query for this field, and also fixes a bug where the field type's
value fetcher could hold on to the wrong reference for a source lookup,
causing threading errors.