feat(log-store): support kv-based log store #9060

Merged
merged 60 commits into from
Jul 3, 2023

Conversation

wenym1
Contributor

@wenym1 wenym1 commented Apr 9, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

In this PR, we implement a kv log store based on the serde implemented from #9451 and #10090.

First, we implement a KvLogStoreBuffer that acts as a channel between the log writer and the log reader. The writer writes data to the buffer; the reader reads from the buffer and awaits on it when it is empty. The buffer has a capacity. While the capacity has not been reached, the writer writes each stream chunk into the buffer. Once the capacity has been reached, the stream chunk is written to the state store instead, and only its start and end seq ids are added to the buffer. When the reader is initialized with a latest epoch, it first reads all data flushed before that epoch and then reads the data from the buffer.
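The buffering rule described above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code in this PR: `LogBuffer`, `BufferItem`, `Rows`, and the `BTreeMap` standing in for the state store are all hypothetical names, the capacity is assumed to be counted in rows, and the real reader awaits on an empty buffer rather than returning `None`. Barriers and the initial read of previously flushed data are left out.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical stand-in for a stream chunk: just a list of rows.
type Rows = Vec<String>;

/// Items held in the in-memory buffer (illustrative names).
#[derive(Debug, PartialEq)]
enum BufferItem {
    /// The chunk itself, kept in memory because the buffer had room.
    InMemory(Rows),
    /// The chunk was written to the state store; only its seq id range is kept.
    Flushed { start_seq_id: u64, end_seq_id: u64 },
}

struct LogBuffer {
    capacity_rows: usize,
    buffered_rows: usize,
    next_seq_id: u64,
    queue: VecDeque<BufferItem>,
    /// Toy state store keyed by seq id (assumption: one row per seq id).
    state_store: BTreeMap<u64, String>,
}

impl LogBuffer {
    fn new(capacity_rows: usize) -> Self {
        Self {
            capacity_rows,
            buffered_rows: 0,
            next_seq_id: 0,
            queue: VecDeque::new(),
            state_store: BTreeMap::new(),
        }
    }

    /// Writer side: keep the chunk in memory if it fits, otherwise spill it
    /// to the state store and enqueue only the seq id range.
    fn write_chunk(&mut self, rows: Rows) {
        if rows.is_empty() {
            return;
        }
        let start_seq_id = self.next_seq_id;
        let end_seq_id = start_seq_id + rows.len() as u64 - 1;
        self.next_seq_id = end_seq_id + 1;

        if self.buffered_rows + rows.len() <= self.capacity_rows {
            self.buffered_rows += rows.len();
            self.queue.push_back(BufferItem::InMemory(rows));
        } else {
            for (offset, row) in rows.into_iter().enumerate() {
                self.state_store.insert(start_seq_id + offset as u64, row);
            }
            self.queue.push_back(BufferItem::Flushed { start_seq_id, end_seq_id });
        }
    }

    /// Reader side: returns the rows of the next item, loading spilled rows
    /// back from the state store. `None` means the buffer is empty.
    fn read_next(&mut self) -> Option<Rows> {
        match self.queue.pop_front()? {
            BufferItem::InMemory(rows) => {
                self.buffered_rows -= rows.len();
                Some(rows)
            }
            BufferItem::Flushed { start_seq_id, end_seq_id } => Some(
                self.state_store
                    .range(start_seq_id..=end_seq_id)
                    .map(|(_, row)| row.clone())
                    .collect(),
            ),
        }
    }
}
```

With a capacity of two rows, writing a two-row chunk followed by a one-row chunk leaves the first chunk in memory and records the second only as a seq id range; the reader still sees both chunks in write order.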

Implement risingwavelabs/rfcs#55

🤖 Generated by Copilot at 539e472

This pull request refactors the log store logic in the stream module and implements a new kv log store based on a key-value state store. It also changes the visibility of some serde types and functions to be crate-private. It moves the in-memory log store logic to a separate module in_mem.rs and adds new modules for the kv log store components: buffer.rs, reader.rs, writer.rs, and mod.rs. It updates the imports of the BoundedInMemLogStoreFactory in the sink.rs and from_proto/sink.rs modules to reflect the new location.

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

@wenym1 wenym1 changed the title from "feat(storage): support kv-based log store" to "feat(log-store): support kv-based log store" Jun 19, 2023
@github-actions github-actions bot added the user-facing-changes Contains changes that are visible to users label Jun 19, 2023
@wenym1 wenym1 marked this pull request as ready for review June 19, 2023 10:11
@tabVersion tabVersion self-requested a review June 19, 2023 10:22
@wenym1 wenym1 requested review from hzxa21, st1page and xx01cyx June 20, 2023 05:36
src/stream/src/common/log_store/kv_log_store/writer.rs Outdated
src/stream/src/common/log_store/kv_log_store/reader.rs Outdated
src/stream/src/common/log_store/kv_log_store/reader.rs Outdated
state_store
.iter(
(Included(range_start), Excluded(range_end)),
u64::MAX,
Collaborator

Should this be first_write_epoch (although the key_range can guarantee correctness)?

Contributor Author

The Hummock version has a safe_epoch. A read at an epoch below the safe epoch may return an error even when the read itself would be correct. Since we don't pin the epoch in the log reader, the safe epoch can advance past the first write epoch, and a read at the first write epoch may then return an error.
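A toy model of this argument, as a sketch only: `ToyStore` and its `iter` method are hypothetical and do not reflect the actual Hummock or state store API. It assumes a read below the safe epoch is rejected outright, and that each key is written exactly once, so the key range alone determines which rows are returned.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Included};

/// Toy versioned store (hypothetical, not the Hummock API).
struct ToyStore {
    /// Reads at an epoch below this value are rejected.
    safe_epoch: u64,
    /// key -> (write epoch, value); each key is written exactly once.
    data: BTreeMap<u64, (u64, &'static str)>,
}

impl ToyStore {
    /// Reads the keys in `[start, end)` as of `read_epoch`.
    fn iter(&self, start: u64, end: u64, read_epoch: u64) -> Result<Vec<&'static str>, String> {
        if read_epoch < self.safe_epoch {
            return Err(format!(
                "read epoch {read_epoch} is below safe epoch {}",
                self.safe_epoch
            ));
        }
        Ok(self
            .data
            .range((Included(start), Excluded(end)))
            // Only rows written at or before the read epoch are visible.
            .filter(|(_, (write_epoch, _))| *write_epoch <= read_epoch)
            .map(|(_, (_, value))| *value)
            .collect())
    }
}
```

In this model, once `safe_epoch` has advanced past the epoch at which keys 1 and 2 were written, `iter(1, 3, first_write_epoch)` fails, while `iter(1, 3, u64::MAX)` succeeds and returns the same two rows because the key range already bounds the result.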

Comment on lines +179 to +180
// Use u64::MAX here because the epoch to consume may be below the safe
// epoch
Collaborator

By safe epoch, do you mean the compaction watermark? I don't think this is an issue because all keys written to log store are unique.

Contributor Author

Same reason as above.

.await?;
return Ok((epoch, LogStoreReadItem::StreamChunk(stream_chunk)));
}
LogStoreBufferItem::Barrier {
Collaborator

Does this mean that barriers will keep being appended to the in-memory buffer even when the sink is lagging behind?

Contributor Author

Yes.

In a future PR, we can try merging the data of multiple epochs if the sink is lagging too far behind.

@wenym1 wenym1 removed the user-facing-changes Contains changes that are visible to users label Jun 28, 2023
Collaborator

@hzxa21 hzxa21 left a comment

Thanks for the PR! LGTM

@wenym1 wenym1 enabled auto-merge June 29, 2023 03:37
@wenym1 wenym1 added this pull request to the merge queue Jun 29, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 29, 2023
@tabVersion
Contributor

Also, please attach the design doc in the PR description.

@github-actions github-actions bot added the user-facing-changes Contains changes that are visible to users label Jun 29, 2023
@wenym1 wenym1 enabled auto-merge June 29, 2023 07:56
@wenym1 wenym1 added this pull request to the merge queue Jul 3, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 3, 2023
@wenym1 wenym1 added this pull request to the merge queue Jul 3, 2023
Merged via the queue into main with commit 2ca184a Jul 3, 2023
@wenym1 wenym1 deleted the yiming/kv-log-store branch July 3, 2023 06:10
@CharlieSYH CharlieSYH added the 📖✗ No user documentation is needed. label Jul 4, 2023
Labels
type/feature user-facing-changes Contains changes that are visible to users 📖✗ No user documentation is needed.
4 participants