Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use binary search instead of linear for get_val in merge #1548

Merged
merged 2 commits into from
Sep 26, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 3 additions & 7 deletions src/indexer/sorted_doc_id_multivalue_column.rs
Original file line number Diff line number Diff line change
Expand Up @@ -72,13 +72,9 @@ impl<'a> SortedDocIdMultiValueColumn<'a> {
impl<'a> Column for SortedDocIdMultiValueColumn<'a> {
fn get_val(&self, pos: u64) -> u64 {
// use the offsets index to find the doc_id which will contain the position.
// the offsets are strictly increasing so we can do a simple search on it.
let new_doc_id: DocId = self
.offsets
.iter()
.position(|&offset| offset > pos)
.expect("pos is out of bounds") as DocId
- 1u32;
// the offsets are strictly increasing so we can do a binary search on it.

let new_doc_id: DocId = self.offsets.partition_point(|&offset| offset <= pos) as DocId - 1; // Offsets start at 0, so -1 is safe
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great comment!


// now we need to find the position of `pos` in the multivalued bucket
let num_pos_covered_until_now = self.offsets[new_doc_id as usize];
Expand Down