feat(storage): Introduces a split implementation based on split_key to replace the previous move implementation. #18594
Conversation
Thanks for the efforts!
src/meta/src/hummock/manager/compaction/compaction_group_schedule.rs (outdated review comments, resolved)
// `partition_vnode_count` only works inside a table, to avoid a lot of slicing sst, we only enable it in groups with high throughput and only one table.
// The target `table_ids` might be split to an existing group, so we need to try to update its config
if table_ids.len() == 1 {
Do we need this logic for all CGs with a single table? If yes, only checking the provided table_ids is not enough. Consider the following example:
original cg table_ids = [1, 2, 3]
split table_ids = [2]
resulting cg1 table_ids = [1]
resulting cg2 table_ids = [2]
resulting cg3 table_ids = [3]
All the resulting CGs need this logic. I wonder whether we should do this inside split_compaction_group_impl
Another issue if we do it outside of split_compaction_group_impl: the compaction group txn will not be committed within the same transaction as the version delta, which may cause inconsistency when an error happens.
It is not required that all single-table CGs be partitioned; we just need to enable it for high-throughput CGs. We could even consider deprecating this feature, but for the current PR we choose to keep it. Let me think about how to implement it atomically.
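As a rough illustration of that idea, here is a minimal sketch with made-up helper types (not the PR's actual manager state) that selects only high-throughput, single-table groups for `partition_vnode_count`:

```rust
// Hypothetical GroupInfo and threshold; the real manager state looks different.
struct GroupInfo {
    group_id: u64,
    table_ids: Vec<u32>,
    write_throughput: u64,
}

/// Return the ids of groups that should have `partition_vnode_count` enabled:
/// exactly one member table and sustained throughput above the threshold.
fn groups_to_partition(groups: &[GroupInfo], throughput_threshold: u64) -> Vec<u64> {
    groups
        .iter()
        .filter(|g| g.table_ids.len() == 1 && g.write_throughput > throughput_threshold)
        .map(|g| g.group_id)
        .collect()
}
```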
// split 2
// See the example in L603 and the split rule in `split_compaction_group_impl`.
let result_vec = self
    .split_compaction_group_impl(
Do we need to make sure split 1 and split 2 happen atomically in the same txn? If not, what happens if split 1 succeeds but split 2 fails?
Actually, from my point of view, it doesn't need to be atomic.
If split 1 succeeds and split 2 fails (split condition not met, network error, etc.), we just wait for the next split to be triggered, if necessary.
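A minimal sketch of that retry-based view (placeholder signatures, not the actual scheduler code): a failure of split 2 after split 1 has committed is tolerated, and the next scheduling round simply tries again if the split is still needed.

```rust
// Hypothetical closure-based stand-in for the real split call.
fn split_twice<F>(mut split_once: F)
where
    F: FnMut(&str) -> Result<(), String>,
{
    if split_once("split 1").is_err() {
        return; // nothing committed yet; the scheduler retries later
    }
    if let Err(e) = split_once("split 2") {
        // Tolerated: split 1 already committed on its own; the remaining split
        // will be attempted again on the next trigger.
        eprintln!("second split failed, will retry later: {e}");
    }
}
```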
let split_key = if group_construct.split_key.is_some() {
    Some(Bytes::from(group_construct.split_key.clone().unwrap()))
} else {
    None
Is this branch reachable?
split_key: None,
Reachable when constructing a new compaction group.
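For what it's worth, the same optional extraction can also be written with `Option::map`; a small self-contained sketch, where `GroupConstruct` is a stand-in for the protobuf type in the PR:

```rust
use bytes::Bytes;

// Stand-in for the protobuf GroupConstruct message used in the PR.
struct GroupConstruct {
    split_key: Option<Vec<u8>>,
}

// Equivalent to the `is_some()` / `unwrap()` pair above, without the unwrap.
fn extract_split_key(group_construct: &GroupConstruct) -> Option<Bytes> {
    group_construct.split_key.clone().map(Bytes::from)
}
```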
let (key_range_l, key_range_r) = {
    let key_range = &origin_sst_info.key_range;
    let l = KeyRange {
In the context of split_sst, the relative ordering of sst.left, sst.right and split_key cannot be assumed.
Maybe assert l.left < l.right and r.left <= r.right, or handle it properly?
e.g. split_key = sst.left = sst.right (inclusive).
We don't want any resulting sst.left > sst.right.
Good catch. Refactored split_sst and added a related UT.
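To illustrate the ordering concern, here is a hedged, self-contained sketch (raw byte comparison instead of the real `FullKey` ordering, and not the PR's actual refactor) of classifying an SST against the split key so that no resulting key range ends up with left > right:

```rust
#[derive(Debug, PartialEq)]
enum SplitSide {
    Left,  // the whole sst stays in the left group
    Right, // the whole sst moves to the right group
    Both,  // the sst straddles the split key and must be physically split
}

// `split_key` is treated as the inclusive start of the right group.
fn classify_sst(sst_left: &[u8], sst_right: &[u8], split_key: &[u8]) -> SplitSide {
    assert!(sst_left <= sst_right, "invalid key range");
    if sst_right < split_key {
        SplitSide::Left
    } else if sst_left >= split_key {
        // Covers split_key == sst.left == sst.right: the sst goes wholly right,
        // so we never produce a range with left > right.
        SplitSide::Right
    } else {
        SplitSide::Both
    }
}

fn main() {
    // The inclusive corner case mentioned above.
    assert_eq!(classify_sst(b"k", b"k", b"k"), SplitSide::Right);
}
```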
LGTM
LGTM. Please check whether the PR description is accurate. For example, the split key now uses MAX instead of 0.
SstSplitType::Right => {
    assert_eq!(0, pos);
    insert_table_infos.extend_from_slice(&level.table_infos[pos..]); // the sst at pos has been split to the right
    level.table_infos = level.table_infos[0..pos].to_vec();
nit: this can be level.table_infos.clear()
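A tiny sketch of the nit (simplified types, not the actual `Level` struct): when `pos` is asserted to be 0, slicing `[pos..]` moves every SST to the new group, so the truncation can indeed be a `clear()`:

```rust
// Simplified stand-in for the SST metadata stored per level.
#[derive(Clone, Debug, PartialEq)]
struct SstableInfo {
    sst_id: u64,
}

fn take_right_split(table_infos: &mut Vec<SstableInfo>, pos: usize) -> Vec<SstableInfo> {
    assert_eq!(0, pos);
    let mut insert_table_infos = Vec::new();
    insert_table_infos.extend_from_slice(&table_infos[pos..]);
    table_infos.clear(); // equivalent to `table_infos[0..pos].to_vec()` when pos == 0
    insert_table_infos
}

fn main() {
    let mut level = vec![SstableInfo { sst_id: 1 }, SstableInfo { sst_id: 2 }];
    let moved = take_right_split(&mut level, 0);
    assert!(level.is_empty());
    assert_eq!(moved.len(), 2);
}
```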
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Previously, we implemented the concept of compaction groups in Hummock to achieve write/configuration isolation, and we created new compaction groups via the table-level move function.
Since the sst key range generated by the move function is not contiguous, it is difficult to re-combine the resulting independent compaction groups (currently not supported). Because of this concern, we set the move threshold very high, fearing that the resulting compaction groups could never be merged again and that we would end up with a large number of compaction groups.
This concern conflicts with our expectation: in scenarios with high write throughput (backfill, etc.), we would like to split the compaction group to achieve parallel compaction, increase the overall compaction throughput, and make full use of the compactor nodes' scaling capability.
Here are the changes involved in this PR for review.
`table_id`
level split function, compatible with the original Split interface.

I'll summarise the changes to the files for your code review:

There may be more code changes in this PR, but the logic is fairly homogeneous. Thank you very much for your review!
Checklist
./risedev check (or alias, ./risedev c)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.