
CORE-2321: cloud_storage: Fix remote_partition stuck reader #17805

Merged 4 commits into redpanda-data:dev on Apr 11, 2024

Conversation

@Lazin Lazin (Contributor) commented Apr 11, 2024

This PR adds a new fuzz test that reproduces the stuck-reader problem and adds a fix. The fix resets the reader when it has reached max_offset.

Fixes #17788

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

Bug Fixes

  • Fix a problem in Tiered Storage that could potentially cause consumers to get stuck

Lazin added 2 commits April 11, 2024 15:19
The 'partition_record_batch_reader_impl' uses an internal
'remote_segment_batch_reader' to fetch data. When the internal reader
reaches 'max_offset' it returns an empty result, and
'partition_record_batch_reader_impl::do_load_slice' returns an empty
result in this case as well. The top level reader is driven by
'model::record_batch_reader', which has an internal async loop that
runs until EOS is reached. If the current slice is not empty the loop
invokes a user provided callback, then checks whether the stream
reached EOS (and exits if so), and then calls 'load_slice' on the
underlying stream (this is where 'do_load_slice' is invoked).

Since the low level segment reader has already reached max_offset it
will keep returning an empty value, and the top level record batch
reader will loop forever.

This commit breaks the loop by checking the current offset of the
segment reader whenever an empty result is returned. If the current
offset overshoots max_offset, the low level stream is reset and the
top level loop exits after that.
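
A minimal sketch of the idea (illustrative only; '_seg_reader',
'_config', 'read_some', 'current_offset' and 'reset' below are
placeholder names for this sketch, not the exact Redpanda identifiers):

// Sketch of the check added to the partition reader's do_load_slice.
ss::future<model::record_batch_reader::storage_t>
do_load_slice(model::timeout_clock::time_point deadline) {
    auto slice = co_await _seg_reader->read_some(deadline);
    if (slice.empty()
        && _seg_reader->current_offset() > _config.max_offset) {
        // The segment reader overshot max_offset; without a reset it
        // would keep returning empty slices and the record_batch_reader
        // loop would never observe end-of-stream.
        co_await _seg_reader->reset();
    }
    co_return std::move(slice);
}
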
Add a fuzz test that generates data with a lot of tx-fence batches and
scans the entire partition one batch at a time by setting max_bytes=1
and max_offset to 'reader_config.start_offset + 1'.
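
Roughly, the scan described above has the following shape (a sketch
only; 'make_reader' is a placeholder for the test helper and the
offset-advance step is simplified):

// Sketch of the one-batch-at-a-time scan used by the fuzz test.
storage::log_reader_config reader_config(
  partition_start, partition_end, ss::default_priority_class());
reader_config.max_bytes = 1; // at most one record batch per read
while (reader_config.start_offset <= partition_end) {
    reader_config.max_offset = model::next_offset(reader_config.start_offset);
    auto reader = make_reader(reader_config); // placeholder helper
    // Before the fix, this consume() could hang once the segment reader
    // overshot max_offset and kept returning empty slices.
    auto headers_read
      = reader.consume(test_consumer(), model::no_timeout).get();
    // Simplified advance; the real test derives the next start offset
    // from the headers it just read.
    reader_config.start_offset = model::next_offset(reader_config.max_offset);
}
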
Lazin added 2 commits April 11, 2024 15:32
The exception is no longer thrown anywhere, so it is safe to remove
the 'catch' block.
@dotnwat dotnwat changed the title from "cloud_storage: Fix remote_partition stuck reader" to "CORE-2321: cloud_storage: Fix remote_partition stuck reader" on Apr 11, 2024
Comment on lines 439 to 454
// The reader overshot the offset. If we do not reset
// the stream, the loop inside the
// record_batch_reader will keep calling this method
// again and again. We would be returning an empty
// result every time because the current offset is
// past the max allowed offset. Resetting the
// segment reader fixes the issue.
//
// The current reader can return an empty result in
// several cases:
// - we reached max_offset (covered here)
// - we reached end of stream (covered above, right
//   after the 'read_some' call)
//
// If we reached max_bytes then the result won't be
// empty. It will have at least one record batch.

Member:
Nice!

@dotnwat dotnwat merged commit 4ff87bc into redpanda-data:dev Apr 11, 2024
18 checks passed
@vbotbuildovich (Collaborator) commented:

/backport v23.3.x

@andijcr andijcr (Contributor) left a comment

looks good

Comment on lines +838 to +847
if (maybe_max_segments) {
config::shard_local_cfg()
.cloud_storage_max_materialized_segments_per_shard.set_value(
maybe_max_segments);
}
if (maybe_max_readers) {
config::shard_local_cfg()
.cloud_storage_max_segment_readers_per_shard.set_value(
maybe_max_readers);
}
Contributor:

maybe they should use scoped_config, to auto-reset at the end of the function

Contributor Author (Lazin):

yes, but it doesn't matter that much for fuzz tests
but I agree, we should do this eventually
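
For reference, a hedged sketch of the scoped_config variant discussed
above, assuming the test-side scoped_config helper exposes a
get(name).set_value(...) interface and restores the overridden
properties when it goes out of scope (an assumption, not something
shown in this PR):

// Sketch only: would replace the direct shard_local_cfg() writes
// quoted above, so the overrides revert automatically at the end of
// the test function.
scoped_config test_local_cfg;
if (maybe_max_segments) {
    test_local_cfg.get("cloud_storage_max_materialized_segments_per_shard")
      .set_value(maybe_max_segments);
}
if (maybe_max_readers) {
    test_local_cfg.get("cloud_storage_max_segment_readers_per_shard")
      .set_value(maybe_max_readers);
}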

Comment on lines +881 to +882
auto headers_read
= reader.consume(test_consumer(), model::no_timeout).get();
Contributor:

is this the point where it could get stuck, without this fix?

Contributor Author (Lazin):

yep


Successfully merging this pull request may close these issues.

cloud_storage: The remote partition reader can get stuck