
opt: add binary search in branch updater #587

Merged
1 commit merged into master on Dec 10, 2024

Conversation

gabriele-0201 (Contributor):

This is the first working version of the introduction of binary search within the branch_stage. I'm sure there is still a long way to go on further optimization; at the same time, I have been working on this for weeks and would be happy to receive feedback and use this as a starting point for all future optimizations.

// in the node pointers
// 2. To avoid keeping all the update information within the BranchOp::KeepChunk because it would require
// further allocations
let apply_chunk =
rphmeier (Contributor) commented Dec 4, 2024:

I don't understand the underlying motivation for this piece of code. It looks like it adds a lot of complexity.

Was performance too slow without it? Is it causing a bug? It looks to me like we could just remove this entire piece of logic and save about 100 LoC of dense conditionals. Can you elaborate on reason (2) stated here? All the original data for KeepChunk is stored within the base branch, so what causes additional allocations? The signature of push_chunk(base, start, end, ops) should work fine with that.

Our happy-path is [Chunk | Update | Chunk], or [Chunk | Update | Insert | Chunk]. I do see a case for doing dynamic merging, but the algorithms here need some more explanation, and there is also a good case against doing dynamic merging (unclear it would even be faster). And let's remove anything which isn't motivated by a measured performance deficit!
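For orientation, here is a minimal sketch of the op shapes under discussion. KeepChunk and Update match the variants quoted later in this thread; Key, the Insert signature, and the helper function are assumptions for illustration, not code from this PR.

type PageNumber = u32; // branch "values" are 4-byte page numbers
type Key = [u8; 32]; // assumed key width, for illustration only

// Sketch of the op list consumed by the branch updater (not the exact enum).
enum BranchOp {
    // Insert a brand-new separator with its page number (assumed signature).
    Insert(Key, PageNumber),
    // Keep the base separators at positions `from..to` untouched.
    KeepChunk(usize, usize),
    // Keep the separator at this base position, but swap in a new page number.
    Update(usize, PageNumber),
}

// Happy path for updating one separator among `n`: [Chunk | Update | Chunk].
fn single_update_ops(pos: usize, n: usize, new_pn: PageNumber) -> Vec<BranchOp> {
    let mut ops = Vec::new();
    if pos > 0 {
        ops.push(BranchOp::KeepChunk(0, pos));
    }
    ops.push(BranchOp::Update(pos, new_pn));
    if pos + 1 < n {
        ops.push(BranchOp::KeepChunk(pos + 1, n));
    }
    ops
}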

gabriele-0201 (Contributor, Author) replied:

The underlying motivation is to make the first happy path even happier. And the motivation is literally what you said in a later comment:

swapping page numbers should be cheap, if the key doesn't change, and the separators can be completely copied with a normal memcpy

The additional allocation stated in reason 2 is due to the fact that the position of the updated separator (along with its new page number) needs to be stored somewhere. The theoretical goal of BranchOp is to indicate whether there is a new item to insert or whether a group of separators is being kept, with potential updates. Further allocations, as mentioned, would be needed to track which positions have been updated within the retained chunk. For example, something like KeepChunk(from, to, Vec<(pos, new_pn)>) could work, but it would require a Vec for each KeepChunk variant. Instead of that approach, I chose to reconstruct the same information on the fly from just the two proposed BranchOp variants, of course with a trade-off: trading allocations for a function with dense conditionals.

Not using Update at all would make the signature push_chunk(base, start, end, ops) perfect (as implemented here), but since we do use Update, page number updates are also part of a push_chunk, and thus we have the updated field.
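For contrast, a sketch of the rejected alternative described above, reusing the Key/PageNumber stand-ins from the earlier sketch (a hypothetical shape, not code from this PR):

// Carrying updates inside each kept range forces a Vec allocation per
// KeepChunk, even when a chunk has no updates at all.
enum BranchOpAlternative {
    Insert(Key, PageNumber),
    // (from, to, updates): each update is (position in base, new page number).
    KeepChunk(usize, usize, Vec<(usize, PageNumber)>),
}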

And let's remove anything which isn't motivated by a measured performance deficit!

On this point, I did things in the wrong order here. I had a technical problem benchmarking the code, so I first implemented the more complex feature and then benchmarked it (which proved to be slightly faster than not using the Update variant at all). What I should have done instead is propose a simpler implementation first, followed by this 100-LoC (probably even more in total) optimization.

Keep(usize, usize),
// Contains the position at which the separator is saved in the base,
// along with the updated page number
Update(usize, PageNumber),
rphmeier (Contributor) replied:

LeafUpdater also has binary search but doesn't have the Update.

I definitely see the motivation for this (swapping page numbers should be cheap, if the key doesn't change, and the separators can be completely copied with a normal memcpy).

gabriele-0201 (Contributor, Author) replied:

LeafUpdater also has binary search but doesn't have the Update

That's totally right, and I think it could technically be added, but in that case I don't think it would be worth it. The optimization comes from the fact that the same bitwise_memcpy can be called on a larger span of bytes, assuming that the added computation is faster than calling bitwise_memcpy multiple times. My gut feeling is that something like this could only give a positive trade-off for updated values that keep the same size, and that would require checks to make sure the size is unchanged. Second, if the size changes, then a similar amount of byte shifting would be required compared to what the current Insert and KeepChunk approach is already doing.

This differs from branches, where the "values" are always 4-byte page numbers.
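A toy illustration of that point, assuming the fixed 4-byte layout (this is not the real bitwise_memcpy, which also handles bit-level shifts): because every branch value has the same width, one contiguous copy can move an entire kept chunk of page numbers at once.

const PN_SIZE: usize = 4; // branch "values" are 4-byte page numbers

// Copy `count` consecutive page numbers starting at base index `from` in a
// single call, instead of issuing one copy per separator.
fn copy_page_numbers(src: &[u8], dst: &mut [u8], from: usize, count: usize) {
    let start = from * PN_SIZE;
    let len = count * PN_SIZE;
    dst[..len].copy_from_slice(&src[start..start + len]);
}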

rphmeier (Contributor) left a review comment:

This looks generally pretty good.

I'm wary of the creeping complexity emerging in the branch updater, which doesn't pair well with a lack of additional test coverage.

For example, build_branch now has pretty big conditional towers handling a lot of edge cases, and I doubt these are covered in full by our existing unit tests (integration tests notwithstanding).

I'm fine with merging this stack to preserve some velocity, but please follow up with a heavy pass for refactoring and additional testing. We need to devote attention to keep this code manageable and correct.

So while this code does solve the immediate issue of vastly reducing the number of comparisons, and so should be faster, I will note that conditionals metaphorically pour sugar into the gas tank of a modern pipelined CPU: I expect that the branch predictor will get fairly confused, and this might add quite a lot of overhead. We will have to measure!

@rphmeier force-pushed the gm_fix_bit_ops_last_bytes_data branch from 8819a9f to cad7a8c on December 5, 2024 at 19:59
@rphmeier force-pushed the gm_branch_simple_binary_search branch from 12c1e9f to 07243e4 on December 5, 2024 at 19:59
BRANCH_NODE_BODY_SIZE,
);

if n_items == 0 {
rphmeier (Contributor) commented:

I'm not clear why this is necessary; isn't split_keep_chunk meant to encapsulate this?

gabriele-0201 (Contributor, Author) replied:

No. This could become clearer if the function were renamed try_split_keep_chunk: it accepts both a target and a limit that cannot be exceeded, and its sole purpose is to split a chunk, which can fail. For example, it fails if keeping the first item in the chunk would require the new node's prefix compression to stop.

This logic is also present in bulk_split_step: attempt to split a chunk; if that fails, change the first element of the KeepChunk into an Insert and repeat the loop, potentially reaching the point where left_gauge.stop_prefix_compression() is called.
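A hedged sketch of that fallible-split shape; the real try_split_keep_chunk works against node gauges and prefix compression, which this toy version reduces to plain byte sizes:

// Take items from the front of a chunk until `target` is reached, never
// exceeding `limit`. Returns None when not even the first item fits; the
// caller handles that by converting the first element into an Insert.
fn try_split(item_sizes: &[usize], target: usize, limit: usize) -> Option<usize> {
    let mut total = 0;
    let mut taken = 0;
    for &size in item_sizes {
        if total + size > limit {
            break;
        }
        total += size;
        taken += 1;
        if total >= target {
            break;
        }
    }
    if taken == 0 { None } else { Some(taken) }
}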

@rphmeier changed the base branch from gm_fix_bit_ops_last_bytes_data to graphite-base/587 on December 5, 2024 at 20:46
@rphmeier force-pushed the gm_branch_simple_binary_search branch from 07243e4 to 4da7774 on December 5, 2024 at 20:48
@rphmeier changed the base branch from graphite-base/587 to master on December 5, 2024 at 20:49
@rphmeier force-pushed the gm_branch_simple_binary_search branch from 4da7774 to c144190 on December 5, 2024 at 20:49
if self.gauge.body_size() < BRANCH_MERGE_THRESHOLD {
// We can stop prefix compression and separate the first
// element of the keep_chunk into its own.
self.ops[op_index] = BranchOp::KeepChunk(from + 1, to);
rphmeier (Contributor) commented:

What about 0- or 1-sized chunks? Are they permitted? What are the invariants around KeepChunk?

gabriele-0201 (Contributor, Author) replied:

0-sized chunks should never be created, but the line of code you just commented on, as well as the one from the previous comment, are two places where 0-sized chunks could be created. I will correct this behavior in a follow-up, along with an explanation of the KeepChunk invariants.
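One way a follow-up could enforce that invariant is to funnel all chunk creation through a guard like this (a hypothetical helper, using the BranchOp sketch from earlier in the thread):

// Never emit an empty KeepChunk: `from == to` would describe zero separators.
fn push_keep_chunk(ops: &mut Vec<BranchOp>, from: usize, to: usize) {
    debug_assert!(from <= to, "KeepChunk range must be well-formed");
    if from < to {
        ops.push(BranchOp::KeepChunk(from, to));
    }
}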

@gabriele-0201 force-pushed the gm_branch_simple_binary_search branch from 5c37f32 to e116857 on December 9, 2024 at 10:17
rphmeier (Contributor) commented Dec 10, 2024:

Merge activity

  • Dec 10, 6:13 PM EST: A user started a stack merge that includes this pull request via Graphite.
  • Dec 10, 6:14 PM EST: A user merged this pull request with Graphite.

@rphmeier merged commit 1c4ca36 into master on Dec 10, 2024
8 checks passed
@rphmeier deleted the gm_branch_simple_binary_search branch on December 10, 2024 at 23:14