[dbnode] Account for Neg/Pos Offsets when building per field roaring bitmap posting lists #2213
src/m3ninx/index/segment/builder/multi_segments_field_postings_list_iter_test.go
for iter.Next() {
	field, pl := iter.Current()
	plIter := pl.Iterator()
	for plIter.Next() {
Maybe also do the inverse and make sure there are none that aren't included? (That is what we saw frequently: metrics that weren't included.)
i.e. get a doc reader and read each document, then check that if the document has the field, the postings list contains its ID, and if it doesn't have the field, the postings list doesn't contain it? (You can use the postings list's Contains(id ID) bool method.)
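A minimal sketch of the two-way check suggested above, assuming a simplified in-memory model: `doc` and `postingsList` are hypothetical stand-ins, not m3ninx types, and only the `Contains` membership test mirrors the method mentioned in the comment. Comparing field presence against postings-list membership for every document catches both missing IDs and IDs that should not be there.

```go
package main

import "fmt"

// doc is a hypothetical stand-in for an indexed document: its postings ID
// and the set of field names it carries.
type doc struct {
	id     uint64
	fields map[string]bool
}

// postingsList is a hypothetical map-backed stand-in for a roaring bitmap
// postings list; only membership matters for this check.
type postingsList map[uint64]struct{}

// Contains reports whether id is a member of the postings list.
func (p postingsList) Contains(id uint64) bool {
	_, ok := p[id]
	return ok
}

// checkFieldPostings returns an error for the first document whose field
// presence disagrees with postings-list membership, in either direction.
func checkFieldPostings(field string, docs []doc, pl postingsList) error {
	for _, d := range docs {
		hasField := d.fields[field]
		inPL := pl.Contains(d.id)
		if hasField != inPL {
			return fmt.Errorf("field %q, doc %d: hasField=%v, inPostings=%v",
				field, d.id, hasField, inPL)
		}
	}
	return nil
}

func main() {
	docs := []doc{
		{id: 0, fields: map[string]bool{"city": true}},
		{id: 1, fields: map[string]bool{"host": true}},
		{id: 2, fields: map[string]bool{"city": true, "host": true}},
	}
	good := postingsList{0: {}, 2: {}}
	missing := postingsList{0: {}}             // doc 2 dropped
	extra := postingsList{0: {}, 1: {}, 2: {}} // doc 1 wrongly included

	fmt.Println(checkFieldPostings("city", docs, good))
	fmt.Println(checkFieldPostings("city", docs, missing))
	fmt.Println(checkFieldPostings("city", docs, extra))
}
```

Checking only "every ID in the postings list has the field" would miss the dropped-document case, which is why both directions are compared.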
}
i.currFieldPostingsList.UnionMany(i.currFieldPostingsLists)
i.currReaders = append(i.currReaders, reader)
Perhaps add the reader right at the start, in case you hit a continue? It seems like if it's the first one, you do a direct union and miss adding the reader to i.currReaders.
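A small sketch of that suggestion, using hypothetical `reader` and `fieldIter` types rather than the real m3ninx ones: each reader is recorded immediately after it is created, before any branch that may `continue`, so a fast path for the first segment cannot skip the bookkeeping.

```go
package main

import "fmt"

// reader is a hypothetical stand-in for a per-segment reader.
type reader struct{ segment int }

// fieldIter is a hypothetical iterator that must remember every reader it
// opens so that each one can be closed later.
type fieldIter struct {
	currReaders []*reader
}

// collect opens one reader per segment. Each reader is appended right after
// creation, so the `continue` on the first segment cannot skip tracking it.
func (it *fieldIter) collect(numSegments int) {
	for s := 0; s < numSegments; s++ {
		r := &reader{segment: s}
		it.currReaders = append(it.currReaders, r)
		if s == 0 {
			// Fast path for the first postings list, e.g. using it directly.
			continue
		}
		// ... union this segment's postings list into the result ...
	}
}

func main() {
	it := &fieldIter{}
	it.collect(3)
	fmt.Println(len(it.currReaders)) // 3: the first segment's reader is tracked too
}
```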
return false
}
for _, reader := range i.currReaders {
	if err := reader.Close(); err != nil {
Instead of closing the readers here, perhaps do it from the defer above where we clear out the slice? That way any readers that were created always get closed. It looks like there are some early returns in the method.
Something like:
defer func() {
for idx, reader := range i.currReaders {
if err := reader.Close(); err != nil {
i.err = err
}
i.currReaders[idx] = nil
}
i.currReaders = i.currReaders[:0]
}()
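The snippet above, expanded into a self-contained sketch with hypothetical `reader` and `iter` types standing in for the real ones. The `defer` runs on every return path, so each reader opened before an early return is still closed and the slice is reset. One deliberate difference from the snippet: this version keeps the first close error instead of overwriting `i.err` with the last one.

```go
package main

import (
	"errors"
	"fmt"
)

// reader is a hypothetical closable resource; failClose simulates a Close error.
type reader struct {
	closed    bool
	failClose bool
}

func (r *reader) Close() error {
	r.closed = true
	if r.failClose {
		return errors.New("close failed")
	}
	return nil
}

// iter is a hypothetical iterator holding the readers opened during one step.
type iter struct {
	currReaders []*reader
	err         error
}

// next opens readers and may return early; the deferred cleanup runs on
// every path, so all opened readers are closed and the slice is reset.
func (i *iter) next(rs []*reader, bailAfter int) bool {
	defer func() {
		for idx, r := range i.currReaders {
			if err := r.Close(); err != nil && i.err == nil {
				i.err = err
			}
			i.currReaders[idx] = nil // drop the reference for the GC
		}
		i.currReaders = i.currReaders[:0]
	}()

	for n, r := range rs {
		i.currReaders = append(i.currReaders, r)
		if n == bailAfter {
			return false // early return: the defer still closes opened readers
		}
	}
	return true
}

func main() {
	rs := []*reader{{}, {failClose: true}, {}}
	it := &iter{}
	ok := it.next(rs, 1)
	fmt.Println(ok, rs[0].closed, rs[1].closed, rs[2].closed, it.err)
}
```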
i.currFieldPostingsLists = append(i.currFieldPostingsLists, pl)
i.currReaders = append(i.currReaders, reader)
value := curr + fieldsKeyIter.segment.offset - negativeOffset
_ = i.currFieldPostingsList.Insert(value)
Probably should check the error from this insertion? (Maybe we should also add this check to the other code in the terms postings list merging; I think it also ignores the error from Insert.)
Codecov Report
@@ Coverage Diff @@
## master #2213 +/- ##
=========================================
- Coverage 72.3% 60.3% -12.1%
=========================================
Files 1022 944 -78
Lines 88809 84697 -4112
=========================================
- Hits 64275 51110 -13165
- Misses 20240 29915 +9675
+ Partials 4294 3672 -622
doc := docIter.Current()
pID := docIter.PostingsID()
found := checkIfFieldExistsInDoc(field, doc)
require.Equal(t, found, pl.Contains(pID))
nice 👍
LGTM
What this PR does / why we need it:
We need to account for negative/positive offsets when building per-field postings lists from multiple segments.
Also adds a test to verify that per-field postings lists built from multiple segments are correct.
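A sketch of the ID remapping behind `value := curr + fieldsKeyIter.segment.offset - negativeOffset`, under assumptions this PR does not spell out: each segment's positive `offset` is the number of documents contributed by earlier segments, and its per-document negative offset counts duplicates skipped earlier in the same segment, with `-1` marking a document that is itself a skipped duplicate. The `segment` type and `remap` function are hypothetical.

```go
package main

import "fmt"

// segment is a hypothetical model of one input segment.
type segment struct {
	// offset: number of documents contributed by earlier segments (assumed).
	offset uint64
	// negativeOffsets[i]: how many duplicate docs before local ID i were
	// skipped in this segment, or -1 if doc i itself is a skipped duplicate.
	negativeOffsets []int64
}

// remap translates a local postings ID into a merged ID. ok is false for
// duplicates, which do not appear in the merged output.
func remap(seg segment, local uint64) (merged uint64, ok bool) {
	neg := seg.negativeOffsets[local]
	if neg < 0 {
		return 0, false
	}
	return local + seg.offset - uint64(neg), true
}

func main() {
	// Segment A contributed 3 unique docs (merged IDs 0..2).
	// Segment B has 4 docs; local doc 1 duplicates a doc already seen.
	b := segment{offset: 3, negativeOffsets: []int64{0, -1, 1, 1}}
	for local := uint64(0); local < 4; local++ {
		if id, ok := remap(b, local); ok {
			fmt.Printf("B local %d -> merged %d\n", local, id)
		} else {
			fmt.Printf("B local %d -> skipped duplicate\n", local)
		}
	}
}
```

Under these assumptions, segment B's surviving documents land on contiguous merged IDs 3, 4 and 5; ignoring either offset would shift them onto another segment's documents or leave gaps.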
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: