add PhrasePrefixQuery #1842
```rust
fn query_terms<'a>(&'a self, visitor: &mut dyn FnMut(&'a Term, bool)) {
    // Report each fixed phrase term to the visitor.
    for (_, term) in &self.phrase_terms {
        visitor(term, true);
    }
}
```
You forgot the prefix (the expanded prefix, which is a suffix).
I left it out on purpose: we don't use that Term directly, only as a bound in a range (and at that point we can't expand it yet). RangeQuery doesn't report its bounds from that function either, so it seemed more consistent not to do it here. If you think I should still emit it, please let me know.
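The point above is that a prefix such as `wo` is only meaningful as the bounds of a term range, not as a term stored in the index. A minimal sketch of how such a range can be bounded, assuming terms are ordered byte-wise; `prefix_upper_bound` is a hypothetical helper for illustration, not a tantivy API:

```rust
/// Hypothetical helper (not tantivy's API): returns the exclusive upper bound
/// of the byte range that contains every key starting with `prefix`.
/// `None` means the range is unbounded above (empty prefix or all 0xFF bytes).
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(&last) = end.last() {
        if last == 0xFF {
            // 0xFF cannot be incremented: drop it and carry to the previous byte.
            end.pop();
        } else {
            // Increment the last byte; every key with the prefix sorts below this.
            *end.last_mut().unwrap() = last + 1;
            return Some(end);
        }
    }
    None
}
```

For example, every term starting with `wo` lies in the half-open range [`wo`, `wp`).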
```rust
for &(offset, ref term) in &self.phrase_terms {
    if let Some(postings) = reader
        .inverted_index(term.field())?
        .read_postings_no_deletes(term, IndexRecordOption::WithFreqsAndPositions)?
    {
        // ... (rest of the loop body not shown in this review excerpt)
    }
}
```
I know you copied this from phrase_weight, but the code is meaningless. As far as I can tell, `read_postings_no_deletes` does the same thing as `read_postings`.
Can you open an issue to clean up this `read_postings_no_deletes` mess?
Correctness is (probably) not at stake here: we filter deleted documents at the collector level anyway.
There is a non-trivial optimization decision to make here. Do we do
`check_phrase(intersection(remove_deletes(postings)))`
or
`remove_deletes(check_phrase(intersection(postings)))`?
The former probably seems trivially faster, but I suspect it depends on how many documents the deletes filter out.
Anyway, for the moment we can probably stick to
`remove_deletes(check_phrase(intersection(postings)))`
for simplicity, and clean up the existing code.
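Both orderings return the same documents; only the cost differs. A minimal sketch on simplified data, assuming posting lists are sorted, deduplicated `u32` doc ids, deletes are a `HashSet<u32>`, and the phrase check is an arbitrary predicate. The names `intersect`, `deletes_first` and `deletes_last` are hypothetical, not tantivy APIs:

```rust
use std::collections::HashSet;

/// Intersect two sorted, deduplicated doc-id lists
/// (a simplified stand-in for intersecting posting lists).
fn intersect(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j, mut out) = (0, 0, Vec::new());
    while i < a.len() && j < b.len() {
        if a[i] < b[j] {
            i += 1;
        } else if a[i] > b[j] {
            j += 1;
        } else {
            out.push(a[i]);
            i += 1;
            j += 1;
        }
    }
    out
}

/// Drop every doc id present in `deleted`, preserving order.
fn remove_deletes(docs: &[u32], deleted: &HashSet<u32>) -> Vec<u32> {
    docs.iter().copied().filter(|d| !deleted.contains(d)).collect()
}

/// check_phrase(intersection(remove_deletes(postings)))
fn deletes_first(a: &[u32], b: &[u32], deleted: &HashSet<u32>, phrase_ok: impl Fn(u32) -> bool) -> Vec<u32> {
    let a = remove_deletes(a, deleted);
    let b = remove_deletes(b, deleted);
    intersect(&a, &b).into_iter().filter(|&d| phrase_ok(d)).collect()
}

/// remove_deletes(check_phrase(intersection(postings)))
fn deletes_last(a: &[u32], b: &[u32], deleted: &HashSet<u32>, phrase_ok: impl Fn(u32) -> bool) -> Vec<u32> {
    let candidates: Vec<u32> = intersect(a, b).into_iter().filter(|&d| phrase_ok(d)).collect();
    remove_deletes(&candidates, deleted)
}
```

In this sketch `deletes_first` does a delete lookup for every posting of every term, while `deletes_last` only does it for documents that survive the intersection and phrase check, at the price of checking positions in documents that are later discarded. Which one wins depends on the fraction of deleted documents.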
```rust
while stream.advance() && (suffixes.len() as u32) < self.max_expansions {
    new_term.clear_with_type(new_term.typ());
    new_term.append_bytes(stream.key());
    if reader.has_deletes() {
        // ... (rest of the loop body not shown in this review excerpt)
    }
}
```
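The loop condition above caps prefix expansion at `self.max_expansions` terms. A minimal sketch of the same bounded expansion, assuming a sorted `BTreeSet<Vec<u8>>` as a stand-in for the term dictionary stream; `expand_prefix` is a hypothetical helper, not the PR's code:

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: collect at most `max_expansions` terms from a sorted
/// term dictionary (here a BTreeSet stand-in) that start with `prefix`.
fn expand_prefix(dict: &BTreeSet<Vec<u8>>, prefix: &[u8], max_expansions: u32) -> Vec<Vec<u8>> {
    dict.range(prefix.to_vec()..)
        // Terms sharing the prefix are contiguous, so stop at the first mismatch.
        .take_while(|term| term.starts_with(prefix))
        // Bound the expansion, like the `max_expansions` check in the loop.
        .take(max_expansions as usize)
        .cloned()
        .collect()
}
```

Because the dictionary is sorted, all terms starting with the prefix form one contiguous run beginning at the prefix itself, so the scan never has to look past the first non-matching term.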
Same as above: we don't need the distinction.
LGTM, but can you go through the minor suggestions?
They are mostly about naming, missing comments, one bug, and a WTF inherited from the original phrase scorer.
Branch updated from 37c8257 to cee7272.
Codecov Report
```diff
@@            Coverage Diff             @@
##             main    #1842      +/-   ##
==========================================
- Coverage   94.64%   94.50%   -0.15%
==========================================
  Files         301      305       +4
  Lines       54924    55392     +468
==========================================
+ Hits        51985    52346     +361
- Misses       2939     3046     +107
```
see quickwit-oss/quickwit#2266