-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test #444
LUCENE-10236: Updated field-weight used in CombinedFieldQuery scoring calculation, and added a test #444
Conversation
… calculation, and added a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good catch @zacharymorn
: "There is duplicated field [" | ||
+ field.field | ||
+ "] used to construct MultiNormsLeafSimScorer"; | ||
duplicateCheckingSet.add(field.field); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be assert duplicateCheckingSet.add(field.field) == false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes. I assume you meant assert duplicateCheckingSet.add(field.field)
and have updated it accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this!
@@ -165,6 +169,117 @@ public void testSameScore() throws IOException { | |||
dir.close(); | |||
} | |||
|
|||
public void testSameScoreAndCollectionBetweenCompleteAndTopScores() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain how this test relates to the fix? Would it make sense to have a more targeted test similar to testCopyField
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was actually developed in another related PR #418, and it would generate duplicated field-weight pairs in fields
object to trigger the condition. As the existing tests don't currently trigger that condition, I thought this would be a good test to put in first for both PRs.
I guess if we were to have a focused test for this PR, it would still look something very similar to this test (and testCopyField
), maybe something like the following, and it just ensures the test run through without assertion exception. The delta between the tests are just I removed some doc content randomizations and result comparison between TOP_SCORE and COMPLETE collection.
public void testShouldRunWithoutAssertionException() throws IOException {
int numDocs =
randomBoolean()
? atLeast(1000)
: atLeast(128 * 8 * 8 * 3); // make sure some terms have skip data
int numMatchDoc = randomIntBetween(200, 500);
int numHits = atMost(100);
int boost1 = Math.max(1, random().nextInt(5));
int boost2 = Math.max(1, random().nextInt(5));
Directory dir = newDirectory();
Similarity similarity = randomCompatibleSimilarity();
IndexWriterConfig iwc = new IndexWriterConfig();
iwc.setSimilarity(similarity);
RandomIndexWriter w = new RandomIndexWriter(random(), dir, iwc);
// adding potentially matching doc
for (int i = 0; i < numMatchDoc; i++) {
Document doc = new Document();
int freqA = random().nextInt(20) + 1;
if (randomBoolean()) {
for (int j = 0; j < freqA; j++) {
doc.add(new TextField("a", "foo", Store.NO));
}
}
freqA = random().nextInt(20) + 1;
if (randomBoolean()) {
for (int j = 0; j < freqA; j++) {
doc.add(new TextField("a", "foo" + j, Store.NO));
}
}
freqA = random().nextInt(20) + 1;
if (randomBoolean()) {
for (int j = 0; j < freqA; j++) {
doc.add(new TextField("a", "zoo", Store.NO));
}
}
int freqB = random().nextInt(20) + 1;
if (randomBoolean()) {
for (int j = 0; j < freqB; j++) {
doc.add(new TextField("b", "zoo", Store.NO));
}
}
freqB = random().nextInt(20) + 1;
if (randomBoolean()) {
for (int j = 0; j < freqB; j++) {
doc.add(new TextField("b", "zoo" + j, Store.NO));
}
}
int freqC = random().nextInt(20) + 1;
for (int j = 0; j < freqC; j++) {
doc.add(new TextField("c", "bla" + j, Store.NO));
}
w.addDocument(doc);
}
IndexReader reader = w.getReader();
IndexSearcher searcher = newSearcher(reader);
searcher.setSimilarity(similarity);
CombinedFieldQuery query =
new CombinedFieldQuery.Builder()
.addField("a", (float) boost1)
.addField("b", (float) boost2)
.addTerm(new BytesRef("foo"))
.addTerm(new BytesRef("zoo"))
.build();
TopScoreDocCollector completeCollector =
TopScoreDocCollector.create(numHits, null, Integer.MAX_VALUE);
searcher.search(query, completeCollector);
reader.close();
w.close();
dir.close();
}
I guess I'm fine either way? If you prefer a more focused test, I can replace it with the above one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think starting with this more focused test makes sense for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I've replaced the test with a more focused one.
Thanks @jimczi @jtibshirani for the review and feedback! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me too!
Will backport this to 8.11 and 8x. |
Hi @jimczi @jtibshirani @mikemccand, just to confirm as I saw there was a thread on the proper handling of |
Correct, changes should no longer be backported to |
…calculation (apache#444) (cherry picked from commit 07ee3ba)
…calculation (apache#444) (cherry picked from commit 07ee3ba)
Description
Updated field-weight used in CombinedFieldQuery scoring calculation
Tests
./gradlew clean; ./gradlew check -Pvalidation.git.failOnModified=false
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.