
feat(source): throttling source based on storage stats #8049

Closed
wants to merge 7 commits

Conversation

zwang28
Contributor

@zwang28 zwang28 commented Feb 20, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Implement #7997. There will be two types of source throttlers:

  • StateStore throttler, a new throttler based on hummock storage stats.
  • MaxWaitBarrier throttler, a wrapper around the original barrier-latency-based throttler.

As before:

  • we only try to pause the source when receiving a stream chunk;
  • we only try to resume the source when receiving a barrier.

Note that barrier_interval_ms will become a mutable system param (not implemented yet), so the MaxWaitBarrier throttler will need to be updated accordingly at that time.
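For illustration, the following is a minimal, self-contained sketch of how such a pair of throttlers could be combined in the source executor. Only SourceThrottlerImpl, should_pause, and on_barrier appear in this PR's diff (see the review snippets below); every other name, field, and threshold here is a hypothetical stand-in.

/// Hypothetical stand-in for the barrier-latency-based throttler.
struct MaxWaitBarrierThrottler {
    unconfirmed_barriers: usize,
    max_unconfirmed_barriers: usize,
}

/// Hypothetical stand-in for the storage-stats-based throttler.
struct StateStoreThrottler {
    need_write_throttling: bool,
}

enum SourceThrottlerImpl {
    MaxWaitBarrier(MaxWaitBarrierThrottler),
    StateStore(StateStoreThrottler),
}

impl SourceThrottlerImpl {
    /// Checked when a stream chunk is received: should the source be paused?
    fn should_pause(&self) -> bool {
        match self {
            SourceThrottlerImpl::MaxWaitBarrier(t) => {
                t.unconfirmed_barriers >= t.max_unconfirmed_barriers
            }
            SourceThrottlerImpl::StateStore(t) => t.need_write_throttling,
        }
    }

    /// Called when a barrier is received, before the executor decides whether
    /// to resume a self-paused source.
    fn on_barrier(&mut self) {
        if let SourceThrottlerImpl::MaxWaitBarrier(t) = self {
            t.unconfirmed_barriers = t.unconfirmed_barriers.saturating_sub(1);
        }
    }
}

fn main() {
    let throttlers = vec![
        SourceThrottlerImpl::MaxWaitBarrier(MaxWaitBarrierThrottler {
            unconfirmed_barriers: 0,
            max_unconfirmed_barriers: 16,
        }),
        SourceThrottlerImpl::StateStore(StateStoreThrottler {
            need_write_throttling: false,
        }),
    ];
    // Pause if any throttler asks to; resuming is only considered on barriers.
    let should_pause = throttlers.iter().any(|t| t.should_pause());
    println!("pause source: {should_pause}");
}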

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR DOES NOT contain user-facing changes.

@codecov

codecov bot commented Feb 20, 2023

Codecov Report

Merging #8049 (1d9ee28) into main (4453ae1) will decrease coverage by 0.03%.
The diff coverage is 20.20%.

@@            Coverage Diff             @@
##             main    #8049      +/-   ##
==========================================
- Coverage   71.63%   71.60%   -0.03%     
==========================================
  Files        1133     1134       +1     
  Lines      182211   182302      +91     
==========================================
+ Hits       130530   130544      +14     
- Misses      51681    51758      +77     
Flag Coverage Δ
rust 71.60% <20.20%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/meta/src/lib.rs 0.81% <0.00%> (-0.01%) ⬇️
...torage/src/hummock/local_version/pinned_version.rs 87.11% <0.00%> (-1.64%) ⬇️
...c/stream/src/executor/source/fs_source_executor.rs 0.00% <0.00%> (ø)
src/stream/src/executor/source/mod.rs 0.00% <ø> (ø)
src/stream/src/executor/source/throttler.rs 0.00% <0.00%> (ø)
src/stream/src/from_proto/source.rs 0.00% <0.00%> (ø)
src/stream/src/task/stream_manager.rs 1.78% <ø> (ø)
src/storage/src/hummock/mod.rs 72.63% <8.33%> (-2.62%) ⬇️
src/stream/src/executor/source/source_executor.rs 85.23% <28.57%> (-1.96%) ⬇️
src/common/src/config.rs 90.30% <100.00%> (+0.12%) ⬆️
... and 17 more

Contributor

@soundOfDestiny soundOfDestiny left a comment

LGTM

Contributor

@xx01cyx xx01cyx left a comment

LGTM

@xx01cyx xx01cyx requested a review from tabVersion February 20, 2023 08:28

@xx01cyx
Contributor

xx01cyx commented Feb 20, 2023

IIUC, the semantics here indicate that we're not accepting any data into the stream. However, apart from the source executor, the DML executor can also yield data chunks into the stream. pause_source simply pauses the source, but not DML from the user. This should be handled for batch-intensive use cases, but I think pausing the source is enough for now. 🤔

@xx01cyx
Contributor

xx01cyx commented Feb 20, 2023

BTW, we should not use Barrier::Pause to handle this because it is designed only for scaling now. cc. @BugenZhao

@zwang28 zwang28 requested a review from xx01cyx February 20, 2023 10:20
Contributor

@tabVersion tabVersion left a comment

Basically LGTM.
Please add some logging explaining why the throttler pauses or resumes a stream.

Comment on lines 346 to 348
for throttler in &mut self.throttlers {
    throttler.on_barrier();
}
Contributor

Why do we need multiple throttlers in the source executor?

SourceThrottlerImpl::MaxWaitBarrier(inner) => inner.should_pause(),
SourceThrottlerImpl::StateStore(inner) => {
    if let Some(hummock) = inner.as_hummock() {
        return hummock.need_write_throttling();
Contributor

Any hint on when Hummock needs throttling?

Contributor

When ingest_batch finds that there are too many L0 SSTs.

@@ -271,6 +274,18 @@ impl HummockStorage {
    pub fn get_pinned_version(&self) -> PinnedVersion {
        self.pinned_version.load().deref().deref().clone()
    }

    pub fn need_write_throttling(&self) -> bool {
Contributor

@Li0k Li0k Feb 21, 2023

Do we need to distinguish between different groups of L0? I think their write paths are independent.

Collaborator

@hzxa21 hzxa21 left a comment

Neat! LGTM!

.values()
.any(|levels| {
    levels.l0.as_ref().unwrap().sub_levels.len()
        > self.options.throttle_l0_sub_level_number as usize
Contributor

Shall we explain here the deadlock problem that can arise when throttle_l0_sub_level_number < level0_tier_compact_file_number?
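For context, here is a self-contained approximation of the check that the two diff fragments above belong to. The type definitions are simplified, hypothetical stand-ins for the real Hummock version and level structures; only the shape of the need_write_throttling check mirrors the diff.

use std::collections::HashMap;

/// Simplified stand-in for an L0 level; only the sub-level count matters here.
struct Level0 {
    sub_levels: Vec<u64>,
}

/// Simplified stand-in for the per-compaction-group level info.
struct Levels {
    l0: Option<Level0>,
}

struct Options {
    /// If this is smaller than level0_tier_compact_file_number, throttling can
    /// kick in before compaction ever gets a chance to reduce the sub-level
    /// count (the deadlock concern raised in the comment above).
    throttle_l0_sub_level_number: u64,
}

struct HummockStorageSketch {
    options: Options,
    /// Level info keyed by compaction group id.
    levels: HashMap<u64, Levels>,
}

impl HummockStorageSketch {
    /// True if any compaction group has accumulated more L0 sub-levels than
    /// the configured threshold.
    fn need_write_throttling(&self) -> bool {
        self.levels.values().any(|levels| {
            levels.l0.as_ref().unwrap().sub_levels.len()
                > self.options.throttle_l0_sub_level_number as usize
        })
    }
}

fn main() {
    let storage = HummockStorageSketch {
        options: Options { throttle_l0_sub_level_number: 4 },
        levels: HashMap::from([(1, Levels { l0: Some(Level0 { sub_levels: vec![0; 6] }) })]),
    };
    // 6 sub-levels > threshold of 4, so throttling is needed.
    println!("need write throttling: {}", storage.need_write_throttling());
}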

// we can guarantee the source is not paused since it received stream
// chunks.
self_paused = true;
if !stream.paused() && self.throttlers.iter().any(|t| t.should_pause()) {
    stream.pause_source();
Member

Please add warn! logging to inform us/users that streaming is paused due to accumulated L0 SST files.
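As a rough illustration of the requested logging (the helper name, the actor_id field, and the message wording below are all hypothetical; warn! is assumed to refer to the tracing crate's macro, and the standalone example requires the tracing and tracing-subscriber crates):

/// Hypothetical helper: log why the source executor is being paused, so that
/// operators can tell throttling apart from ordinary back-pressure.
fn log_throttled_pause(actor_id: u32, reason: &str) {
    tracing::warn!(actor_id, reason, "source executor paused by throttler");
}

fn main() {
    // Install a default subscriber so the warning is actually emitted.
    tracing_subscriber::fmt::init();
    // In the executor this would sit right before stream.pause_source().
    log_throttled_pause(42, "accumulated L0 SST files in the state store");
}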

@xx01cyx
Contributor

xx01cyx commented Feb 23, 2023

The DML executor is able to pause now via #8110. It pauses and resumes in a similar way to the source executor, so the new throttler can be implemented on the DML executor without much pain.

@zwang28
Contributor Author

zwang28 commented Feb 24, 2023

Cool. BTW, why is there no barrier-latency-based throttler in the DML executor, like in the source executor? @xx01cyx

# Conflicts:
#	src/common/src/config.rs
#	src/meta/src/lib.rs
#	src/meta/src/manager/env.rs
#	src/rpc_client/src/meta_client.rs
#	src/stream/src/executor/source/fs_source_executor.rs
#	src/stream/src/executor/source/source_executor.rs
#	src/stream/src/executor/stream_reader.rs
@xx01cyx
Contributor

xx01cyx commented Feb 24, 2023

Why is there no barrier-latency-based throttler in the DML executor, like in the source executor?

We need this throttling on the source executor to control the data stream from external connectors, which in turn gives us control over the whole graph. BTW, as another form of data source, the DML executor may require a similar throttling mechanism on user data. cc. @BugenZhao @tabVersion

@BugenZhao
Member

maybe the DML executor requires a similar throttling mechanism on user data

Let's assume that DML won't have too large a throughput for now. Basic back-pressure should work well in this case.

@zwang28
Contributor Author

zwang28 commented Mar 1, 2023

We have decided to stall locally (in ingest_batch) instead of pausing the source.

@zwang28 zwang28 closed this Mar 1, 2023
@zwang28 zwang28 deleted the wangzheng/source_throttler branch April 20, 2023 17:04