opt: add binary search in branch updater #587
Conversation
// in the node pointers
// 2. To avoid keeping all the update information within the BranchOp::KeepChunk because it would require
// further allocations
let apply_chunk =
I don't understand the underlying motivation for this piece of code. It looks like it adds a lot of complexity.
Was performance too slow with it? Is it causing a bug? It looks to me like we could just remove this entire piece of logic and save about 100LoC of dense conditionals. Can you elaborate on reason (2) stated here? All the original data for KeepChunk is stored within the base branch, so what causes additional allocations? The signature of push_chunk(base, start, end, ops) should work fine with that.
Our happy path is [Chunk | Update | Chunk], or [Chunk | Update | Insert | Chunk]. I do see a case for doing dynamic merging, but the algorithms here need some more explanation, and there is also a good case against dynamic merging (it is unclear it would even be faster). And let's remove anything which isn't motivated by a measured performance deficit!
The underlying motivation is to make the first happy path even happier, and it is literally what you said in a following comment:
"swapping page numbers should be cheap, if the key doesn't change, and the separators can be completely copied with a normal memcpy"
The additional allocation mentioned in reason 2 comes from the fact that the position of each updated separator (along with its new page number) needs to be stored somewhere. The theoretical goal of BranchOp is to indicate whether there is a new item to insert or whether a group of separators is being kept, with potential updates. Further allocations, as mentioned, would be needed to track which positions have been updated within the retained branch. For example, something like KeepChunk(from, to, Vec<(pos, new_pn)>) could work, but it would require a Vec for each KeepChunk variant. Instead of that approach, I considered reconstructing the same information on the fly from just the two proposed BranchOp variants, of course with a trade-off: trading allocations for a function with dense conditionals.
Not using Update at all would make the signature push_chunk(base, start, end, ops) perfect (as implemented here), but in that case page-number updates would also be part of a push_chunk, and thus we have the updated field.
And let's remove anything which isn't motivated by a measured performance deficit!
Talking about this: here I did things in the wrong order. I had a technical problem benchmarking the code, so I first implemented the more complex feature and then benchmarked it (which proved to be slightly faster than not using the Update variant at all). What I should have done instead is propose an easier implementation first, followed by this 100LoC (probably even more in total) optimization.
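To make the trade-off being discussed concrete, here is a hypothetical sketch of the two op encodings, assuming names from the discussion (Insert, KeepChunk, Update, PageNumber); the real BranchOp in the branch updater differs in detail:

```rust
type PageNumber = u32; // page numbers are 4 bytes in the discussion

// Design A: embed updates inside the kept range. This needs a Vec
// allocation per KeepChunk, even when the chunk carries no updates.
#[allow(dead_code)]
enum BranchOpA {
    Insert(usize, PageNumber), // hypothetical: (separator index, page number)
    KeepChunk(usize, usize, Vec<(usize, PageNumber)>), // [from, to) + updates
}

// Design B (the direction this PR takes): updates are their own op, so a
// kept range is just two usizes, and a run like [Keep | Update | Keep] is
// reconstructed on the fly while building the node.
#[allow(dead_code)]
enum BranchOpB {
    Insert(usize, PageNumber),
    KeepChunk(usize, usize),   // [from, to) in the base branch
    Update(usize, PageNumber), // position in the base + new page number
}

// With design B the updated positions can be recovered by scanning the
// op list, without storing a Vec per chunk:
fn updated_positions(ops: &[BranchOpB]) -> Vec<usize> {
    ops.iter()
        .filter_map(|op| match op {
            BranchOpB::Update(pos, _) => Some(*pos),
            _ => None,
        })
        .collect()
}
```

The cost of design B is exactly what the reply describes: the information that design A stores explicitly has to be recomputed by a function with dense conditionals.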
Keep(usize, usize),
// Contains the position at which the separator is saved in the base,
// along with the updated page number
Update(usize, PageNumber),
LeafUpdater also has binary search but doesn't have the Update.
I definitely see the motivation for this (swapping page numbers should be cheap, if the key doesn't change, and the separators can be completely copied with a normal memcpy).
"LeafUpdater also has binary search but doesn't have the Update"
That's totally right, and I think it could technically be added, but I don't think it would be worth it there. The optimization comes from the fact that the same bitwise_memcpy can be called on a larger range of bytes, assuming that the added computation is faster than calling bitwise_memcpy multiple times. My gut feeling is that something like this could give a positive trade-off only for updated values that keep the same size, but that would require checks to make sure the size really is unchanged. Second, if the size changes, then a similar amount of byte shifting would be required compared to what the current Insert and KeepChunk approach is doing. This differs from the branches, where the "values" are always 4-byte page numbers.
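The fixed-width point can be illustrated with a toy sketch (not the real node layout; copy_kept_page_numbers is a made-up helper): because branch "values" are always 4 bytes, a kept run of n separators occupies one contiguous n*4-byte region, so a single copy moves all of them, whereas variable-size leaf values would need per-entry size checks first.

```rust
// Toy illustration: branch "values" are fixed-width 4-byte page numbers,
// so the kept range [from, to) of separator slots maps to one contiguous
// byte range and a single extend_from_slice (a memcpy) copies the run.
fn copy_kept_page_numbers(base: &[u8], from: usize, to: usize, out: &mut Vec<u8>) {
    const PN_SIZE: usize = 4; // page numbers are always 4 bytes
    out.extend_from_slice(&base[from * PN_SIZE..to * PN_SIZE]);
}
```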
This looks generally pretty good.
I'm wary of the creeping complexity emerging in the branch updater, which doesn't pair well with a lack of additional test coverage.
For example, build_branch now contains pretty big conditional towers handling a lot of edge cases, and I doubt these are covered in full by our existing unit tests (integration tests notwithstanding).
I'm fine with merging this stack to preserve some velocity, but please follow up with a heavy pass for refactoring and additional testing. We need to devote attention to keep this code manageable and correct.
So while this code does solve the immediate issue of vastly reducing the number of comparisons, and so should be faster, I will note that conditionals metaphorically pour sugar in the gas tank of a modern pipelined CPU: I expect the branch predictor will get fairly confused, and this might add quite a lot of overhead. We will have to measure!
BRANCH_NODE_BODY_SIZE,
);

if n_items == 0 {
I'm not clear why this is necessary; isn't split_keep_chunk meant to encapsulate this?
No, and this could become clearer if the function were renamed try_split_keep_chunk, as it accepts both a target and a limit that cannot be exceeded. The function's sole purpose is to attempt to split a chunk, which can fail, for example when the first item in the chunk requires prefix compression of the new node to stop. This logic is also present in bulk_split_step: attempt to split a chunk; if it fails, change the first element of the KeepChunk to an Insert and repeat the loop, potentially reaching the point where left_gauge.stop_prefix_compression() is called.
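The retry loop described above can be sketched as follows. This is a schematic, not the real code: try_split stands in for try_split_keep_chunk with fitting reduced to a simple size budget (in the real code it depends on the gauge and on prefix compression), and split_or_demote stands in for the loop in bulk_split_step.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    KeepChunk(usize, usize), // [from, to) in the base node
    Insert(usize),           // single separator; payload elided here
}

// Stand-in for try_split_keep_chunk: take items from `from` while their
// cumulative size stays under `limit`; fail (None) if not even the first
// item fits.
fn try_split(sizes: &[usize], from: usize, to: usize, limit: usize) -> Option<usize> {
    let mut total = 0;
    let mut end = from;
    while end < to && total + sizes[end] <= limit {
        total += sizes[end];
        end += 1;
    }
    if end == from { None } else { Some(end) }
}

// Attempt to split; on failure demote the first kept item to an Insert
// and retry on the shrunken chunk, mirroring the loop in bulk_split_step.
fn split_or_demote(sizes: &[usize], mut from: usize, to: usize, limit: usize, out: &mut Vec<Op>) {
    while from < to {
        match try_split(sizes, from, to, limit) {
            Some(split_at) => {
                out.push(Op::KeepChunk(from, split_at));
                return;
            }
            None => {
                // The first element alone blows the budget (in the real
                // code: it would force prefix compression to stop), so
                // re-encode it as an Insert and retry without it.
                out.push(Op::Insert(from));
                from += 1;
            }
        }
    }
}
```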
if self.gauge.body_size() < BRANCH_MERGE_THRESHOLD {
    // We can stop prefix compression and separate the first
    // element of the keep_chunk into its own.
    self.ops[op_index] = BranchOp::KeepChunk(from + 1, to);
What about 0/1-sized chunks? Are they permitted? What are the invariants around KeepChunk?
0-sized chunks should never be created, but the line of code you just commented on, as well as the one in the previous comment, are two examples of cases where a 0-sized chunk could be created. I will correct this behavior in a follow-up, along with an explanation of the KeepChunk invariant.
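One way the follow-up could enforce the invariant is sketched below; trim_first is a hypothetical helper, not code from this PR, showing that dropping the first element of a KeepChunk must either yield a still non-empty chunk or signal that the op should be removed entirely rather than left 0-sized:

```rust
// Hypothetical guard for the invariant discussed above: a KeepChunk range
// [from, to) must be non-empty both before and after trimming.
fn trim_first(from: usize, to: usize) -> Option<(usize, usize)> {
    debug_assert!(from < to, "KeepChunk ranges must be non-empty");
    if from + 1 < to {
        Some((from + 1, to)) // still a valid, non-empty chunk
    } else {
        None // would become 0-sized: caller should drop the op instead
    }
}
```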
This is the first working version of the introduction of binary search within the branch_stage. I'm sure there is still a long way to go for further optimization, but at the same time I have been working on this for weeks and would be happy to receive feedback and use this as a starting point for all future optimizations.