Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use binary search instead of linear for get_val in merge #1548

Merged
merged 2 commits into from
Sep 26, 2022
Merged

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Sep 23, 2022

No description provided.

@codecov-commenter
Copy link

Codecov Report

Merging #1548 (8b5aa2b) into main (20c8790) will not change coverage.
The diff coverage is 75.00%.

@@           Coverage Diff           @@
##             main    #1548   +/-   ##
=======================================
  Coverage   93.93%   93.93%           
=======================================
  Files         249      249           
  Lines       46066    46066           
=======================================
  Hits        43274    43274           
  Misses       2792     2792           
Impacted Files Coverage Δ
src/indexer/sorted_doc_id_multivalue_column.rs 97.80% <75.00%> (-2.20%) ⬇️
src/postings/stacker/expull.rs 98.57% <0.00%> (-0.48%) ⬇️
src/fastfield/multivalued/mod.rs 98.02% <0.00%> (+0.79%) ⬆️
bitpacker/src/bitpacker.rs 100.00% <0.00%> (+0.98%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

.binary_search_by(|&offset| offset.cmp(&pos))
.unwrap_or_else(|pos| pos - 1) as DocId; // Offsets start at 0, so -1 is safe

// There are duplicates in the data, but we want the one, where the next offset is not the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be possible to get this behaviour directly by using partition_point instead of binary_search_by?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of something like

let new_doc_id = self.offsets.partition_point(|&offset| offset > pos) as DocId - 1;

which should point at last entry with offset <= pos immediately without the second loop, shouldn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool that method is rater new, but I had to flip the condition to make the partition work ([true, true, .., false, false, ..])

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I am sorry, I got confused. Yes, you did flip the condition, but should the extra while loop be unnecessary now because you already get the last index which fulfils the original condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, I forgot to remove that

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are the mutable binding and the extra braces inside the closure left from the original implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @adamreichold

@PSeitz PSeitz force-pushed the test_fix branch 2 times, most recently from 8a59df8 to 15f21d7 Compare September 24, 2022 12:26
- 1u32;
// the offsets are strictly increasing so we can do a binary search on it.

let new_doc_id: DocId = self.offsets.partition_point(|&offset| offset <= pos) as DocId - 1; // Offsets start at 0, so -1 is safe
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great comment!

@fulmicoton fulmicoton merged commit 21e0ade into main Sep 26, 2022
@fulmicoton fulmicoton deleted the test_fix branch September 26, 2022 00:42
This was referenced Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants