
opt: add binary search in branch updater #587

Merged
1 commit merged into master on Dec 10, 2024

Conversation

gabriele-0201 (Contributor):

This is the first working version of the introduction of binary search within the branch_stage. I'm sure there is still a long way to go on further optimization; at the same time, I have been working on this for weeks and would be happy to receive feedback and use this as a starting point for all future optimizations.

// in the node pointers
// 2. To avoid keeping all the update information within the BranchOp::KeepChunk because it would require
// further allocations
let apply_chunk =
rphmeier (Contributor) commented Dec 4, 2024:

I don't understand the underlying motivation for this piece of code. It looks like it adds a lot of complexity.

Was performance too slow without it? Is it causing a bug? It looks to me like we could just remove this entire piece of logic and save about 100 LoC of dense conditionals. Can you elaborate on reason (2) stated here? All the original data for KeepChunk is stored within the base branch, so what causes additional allocations? The signature of push_chunk(base, start, end, ops) should work fine with that.

Our happy-path is [Chunk | Update | Chunk], or [Chunk | Update | Insert | Chunk]. I do see a case for doing dynamic merging, but the algorithms here need some more explanation, and there is also a good case against doing dynamic merging (unclear it would even be faster). And let's remove anything which isn't motivated by a measured performance deficit!
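For orientation, here is a minimal sketch of the op shapes under discussion. KeepChunk and Update match the variants quoted later in this thread; Key, the Insert signature, and the helper function are assumptions for illustration, not code from this PR.

type PageNumber = u32; // branch "values" are 4-byte page numbers
type Key = [u8; 32]; // assumed key width, for illustration only

// Sketch of the op list consumed by the branch updater (not the exact enum).
enum BranchOp {
    // Insert a brand-new separator with its page number (assumed signature).
    Insert(Key, PageNumber),
    // Keep the base separators at positions `from..to` untouched.
    KeepChunk(usize, usize),
    // Keep the separator at this base position, but swap in a new page number.
    Update(usize, PageNumber),
}

// Happy path for updating one separator among `n`: [Chunk | Update | Chunk].
fn single_update_ops(pos: usize, n: usize, new_pn: PageNumber) -> Vec<BranchOp> {
    let mut ops = Vec::new();
    if pos > 0 {
        ops.push(BranchOp::KeepChunk(0, pos));
    }
    ops.push(BranchOp::Update(pos, new_pn));
    if pos + 1 < n {
        ops.push(BranchOp::KeepChunk(pos + 1, n));
    }
    ops
}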

gabriele-0201 (Contributor, Author) replied:

The underlying motivation is to make the first happy path even happier. And the motivation is literally what you said in a later comment:

swapping page numbers should be cheap, if the key doesn't change, and the separators can be completely copied with a normal memcpy

The additional allocation stated in reason 2 is due to the fact that the position of the updated separator (along with its new page number) needs to be stored somewhere. The theoretical goal of BranchOp is to indicate whether there is a new item to insert or whether a group of separators is being kept, with potential updates. Further allocations, as mentioned, would be needed to track which positions have been updated within the retained chunk. For example, something like KeepChunk(from, to, Vec<(pos, new_pn)>) could work, but it would require a Vec for each KeepChunk variant. Instead of that approach, I chose to reconstruct the same information on the fly from just the two proposed BranchOp variants, of course with a trade-off: trading allocations for a function with dense conditionals.

Not using Update at all would make the signature push_chunk(base, start, end, ops) perfect (as implemented here), but since we do use Update, page number updates are also part of a push_chunk, and thus we have the updated field.
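For contrast, a sketch of the rejected alternative described above, reusing the Key/PageNumber stand-ins from the earlier sketch (a hypothetical shape, not code from this PR):

// Carrying updates inside each kept range forces a Vec allocation per
// KeepChunk, even when a chunk has no updates at all.
enum BranchOpAlternative {
    Insert(Key, PageNumber),
    // (from, to, updates): each update is (position in base, new page number).
    KeepChunk(usize, usize, Vec<(usize, PageNumber)>),
}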

And let's remove anything which isn't motivated by a measured performance deficit!

On this point, I did things in the wrong order here. I had a technical problem benchmarking the code, so I first implemented the more complex feature and then benchmarked it (which proved to be slightly faster than not using the Update variant at all). What I should have done instead is propose a simpler implementation first, followed by this 100-LoC (probably even more in total) optimization.

Keep(usize, usize),
// Contains the position at which the separator is saved in the base,
// along with the updated page number
Update(usize, PageNumber),
rphmeier (Contributor) replied:

LeafUpdater also has binary search but doesn't have the Update.

I definitely see the motivation for this (swapping page numbers should be cheap, if the key doesn't change, and the separators can be completely copied with a normal memcpy).

gabriele-0201 (Contributor, Author) replied:

LeafUpdater also has binary search but doesn't have the Update

That's totally right, and I think it could technically be added, but in that case I don't think it would be worth it. The optimization comes from the fact that the same bitwise_memcpy can be called on a larger span of bytes, assuming that the added computation is faster than calling bitwise_memcpy multiple times. My gut feeling is that something like this could only give a positive trade-off for updated values that keep the same size, and that would require checks to make sure the size is unchanged. Second, if the size changes, then a similar amount of byte shifting would be required compared to what the current Insert and KeepChunk approach is already doing.

This differs from branches, where the "values" are always 4-byte page numbers.
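A toy illustration of that point, assuming the fixed 4-byte layout (this is not the real bitwise_memcpy, which also handles bit-level shifts): because every branch value has the same width, one contiguous copy can move an entire kept chunk of page numbers at once.

const PN_SIZE: usize = 4; // branch "values" are 4-byte page numbers

// Copy `count` consecutive page numbers starting at base index `from` in a
// single call, instead of issuing one copy per separator.
fn copy_page_numbers(src: &[u8], dst: &mut [u8], from: usize, count: usize) {
    let start = from * PN_SIZE;
    let len = count * PN_SIZE;
    dst[..len].copy_from_slice(&src[start..start + len]);
}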

rphmeier (Contributor) left a review comment:

This looks generally pretty good.

I'm wary of the creeping complexity emerging in the branch updater, which doesn't pair well with a lack of additional test coverage.

For example, build_branch now has pretty big conditional towers handling a lot of edge cases, and I doubt these are covered in full by our existing unit tests (integration tests notwithstanding).

I'm fine with merging this stack to preserve some velocity, but please follow up with a heavy pass for refactoring and additional testing. We need to devote attention to keep this code manageable and correct.

So while this code does solve the immediate issue of vastly reducing the number of comparisons, and so should be faster, I will note that conditionals metaphorically pour sugar into the gas tank of a modern pipelined CPU: I expect that the branch predictor will get fairly confused, and this might add quite a lot of overhead. We will have to measure!

@rphmeier force-pushed the gm_fix_bit_ops_last_bytes_data branch from 8819a9f to cad7a8c on December 5, 2024 at 19:59
@rphmeier force-pushed the gm_branch_simple_binary_search branch from 12c1e9f to 07243e4 on December 5, 2024 at 19:59
BRANCH_NODE_BODY_SIZE,
);

if n_items == 0 {
rphmeier (Contributor) commented:

I'm not clear why this is necessary; isn't split_keep_chunk meant to encapsulate this?

gabriele-0201 (Contributor, Author) replied:

No. This could become clearer if the function were renamed try_split_keep_chunk: it accepts both a target and a limit that cannot be exceeded, and its sole purpose is to split a chunk, which can fail. For example, it fails if keeping the first item in the chunk would require the new node's prefix compression to stop.

This logic is also present in bulk_split_step: attempt to split a chunk; if that fails, change the first element of the KeepChunk into an Insert and repeat the loop, potentially reaching the point where left_gauge.stop_prefix_compression() is called.
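A hedged sketch of that fallible-split shape; the real try_split_keep_chunk works against node gauges and prefix compression, which this toy version reduces to plain byte sizes:

// Take items from the front of a chunk until `target` is reached, never
// exceeding `limit`. Returns None when not even the first item fits; the
// caller handles that by converting the first element into an Insert.
fn try_split(item_sizes: &[usize], target: usize, limit: usize) -> Option<usize> {
    let mut total = 0;
    let mut taken = 0;
    for &size in item_sizes {
        if total + size > limit {
            break;
        }
        total += size;
        taken += 1;
        if total >= target {
            break;
        }
    }
    if taken == 0 { None } else { Some(taken) }
}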

@rphmeier changed the base branch from gm_fix_bit_ops_last_bytes_data to graphite-base/587 on December 5, 2024 at 20:46
@rphmeier force-pushed the gm_branch_simple_binary_search branch from 07243e4 to 4da7774 on December 5, 2024 at 20:48
@rphmeier changed the base branch from graphite-base/587 to master on December 5, 2024 at 20:49
@rphmeier force-pushed the gm_branch_simple_binary_search branch from 4da7774 to c144190 on December 5, 2024 at 20:49
if self.gauge.body_size() < BRANCH_MERGE_THRESHOLD {
// We can stop prefix compression and separate the first
// element of the keep_chunk into its own.
self.ops[op_index] = BranchOp::KeepChunk(from + 1, to);
rphmeier (Contributor) commented:

What about 0- or 1-sized chunks? Are they permitted? What are the invariants around KeepChunk?

gabriele-0201 (Contributor, Author) replied:

0-sized chunks should never be created, but the line of code you just commented on, as well as the one from the previous comment, are two places where 0-sized chunks could be created. I will correct this behavior in a follow-up, along with an explanation of the KeepChunk invariants.
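One way a follow-up could enforce that invariant is to funnel all chunk creation through a guard like this (a hypothetical helper, using the BranchOp sketch from earlier in the thread):

// Never emit an empty KeepChunk: `from == to` would describe zero separators.
fn push_keep_chunk(ops: &mut Vec<BranchOp>, from: usize, to: usize) {
    debug_assert!(from <= to, "KeepChunk range must be well-formed");
    if from < to {
        ops.push(BranchOp::KeepChunk(from, to));
    }
}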

@gabriele-0201 force-pushed the gm_branch_simple_binary_search branch from 5c37f32 to e116857 on December 9, 2024 at 10:17
rphmeier (Contributor) commented Dec 10, 2024:

Merge activity

  • Dec 10, 6:13 PM EST: A user started a stack merge that includes this pull request via Graphite.
  • Dec 10, 6:14 PM EST: A user merged this pull request with Graphite.

@rphmeier merged commit 1c4ca36 into master on Dec 10, 2024
8 checks passed
@rphmeier deleted the gm_branch_simple_binary_search branch on December 10, 2024 at 23:14