Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: use fragvec to hold replaced segments #16160

Merged
merged 1 commit into from
Jan 19, 2024

Conversation

dotnwat
Copy link
Member

@dotnwat dotnwat commented Jan 19, 2024

Observed a large 1 to 2 MB allocation due to 50K plus segments in the replaced segments field of the manifest.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

Bug Fixes

  • Fix large allocation in partition manifest.

Observed a large 1 to 2 MB allocation due to 50K plus segments in the
replaced segments field of the manifest.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
@dotnwat dotnwat requested a review from Lazin January 19, 2024 06:45
@dotnwat dotnwat requested review from andrwng and abhijat January 19, 2024 06:45
Comment on lines +1498 to +1499
const fragmented_vector<lw_segment_meta> source_backlog
= _manifest->lw_replaced_segments();
Copy link
Contributor

@abhijat abhijat Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since lw_replaced_segments now returns fragmented vector, should we use const auto here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1504 to +1529
const auto backlog_size = source_backlog.size();
fragmented_vector<lw_segment_meta> backlog;
for (const auto& m : source_backlog) {
auto it = _manifest->find(m.base_offset);
if (it == _manifest->end()) {
backlog.push_back(m);
continue;
}
auto m_name = _manifest->generate_remote_segment_name(
cloud_storage::partition_manifest::lw_segment_meta::convert(m));
auto s_name = _manifest->generate_remote_segment_name(*it);
// The segment will have the same path as the one we have in
// manifest in S3 so if we will delete it the data will be lost.
if (m_name == s_name) {
vlog(
_logger.error,
"The replaced segment name {} collides with the segment "
"{} "
"in the manifest. It will be removed to prevent the data "
"loss.",
m_name,
s_name);
continue;
}
backlog.push_back(m);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: copy_if with a predicate seems like it would be easier to read as the predicate just has to evaluate to true or false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@abhijat abhijat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm other than a couple of nits

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Jan 19, 2024

@piyushredpanda piyushredpanda added this to the v23.3.3 milestone Jan 19, 2024
@piyushredpanda
Copy link
Contributor

/ci-repeat

@dotnwat
Copy link
Member Author

dotnwat commented Jan 19, 2024

@Lazin @abhijat good call outs on the review. as soon as this merges ill post another PR with your feedback since we are trying to get this out the door. i didn't think we'd actually approve and merge this PR as a fix 🤣

@dotnwat
Copy link
Member Author

dotnwat commented Jan 19, 2024

Looks like this is taking so long because BK waited an hour to get a worker machine.

@vbotbuildovich
Copy link
Collaborator

new failures in https://buildkite.com/redpanda/redpanda/builds/43958#018d22af-1fa1-4b4d-b74a-e78cdae56f80:

"rptest.tests.internal_topic_protection_test.InternalTopicProtectionLargeClusterTest.test_schemas_topic"

@dotnwat
Copy link
Member Author

dotnwat commented Jan 19, 2024

The DT failure is a duplicate of #14149 which is an issue that pops up at shutdown.

@dotnwat dotnwat merged commit ed4f529 into redpanda-data:dev Jan 19, 2024
16 of 18 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v23.3.x

@vbotbuildovich
Copy link
Collaborator

/backport v23.2.x

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v23.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16160-v23.2.x-45 remotes/upstream/v23.2.x
git cherry-pick -x 33a5e48923c10a58db61023e99be21290157496d

Workflow run logs.

@dotnwat
Copy link
Member Author

dotnwat commented Jan 19, 2024

#16189

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants