[v23.3.x] archival: Start housekeeping jobs only on a leader #17917

Merged


vbotbuildovich
Collaborator

Backport of PR #17839
Fixes: #17916

Lazin added 3 commits April 17, 2024 12:48
Currently, the housekeeping jobs are enabled in the 'start' method.
First, the 'sync' method of the archival metadata STM is called. We don't
check the result of the 'sync' call, which is a mistake. Normally, we
call the 'start' method of the 'ntp_archiver' when the partition is
already a leader, so this code works as expected. But if the partition is
moved to another shard, we could create and start the archiver when the
partition is not a leader yet. In this case the 'sync' call returns
'nullopt', and the housekeeping jobs are not enabled because
_parent.is_leader() returns 'false'.

Then, when the partition becomes a leader, the 'notify_leadership' method
is invoked. This method enables the housekeeping jobs. The problem is
that the 'sync' method of the archival STM may not have been called yet,
so the adjacent_segment_merger starts reuploading segments based on a
stale manifest.

The fix delays enabling the housekeeping jobs until the background
loop has started and 'sync' has been called successfully. The
'notify_leadership' method can enable or disable the jobs after that.

(cherry picked from commit 9c37251)
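
The ordering described in this commit message can be illustrated with a minimal, single-threaded sketch. It is not the actual 'ntp_archiver' code: 'fake_stm', 'archiver_sketch', 'run_loop_once' and 'demo_shard_move' are hypothetical names invented for illustration, and the real implementation is asynchronous. The sketch only assumes what the message states: 'sync' returns 'nullopt' on a non-leader, and leadership notifications must not enable the jobs before a successful 'sync'.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the archival metadata STM (not Redpanda's API).
struct fake_stm {
    bool leader = false;

    // Mirrors the behaviour described above: 'sync' yields nullopt
    // when the partition is not a leader.
    std::optional<bool> sync() const {
        if (!leader) {
            return std::nullopt;
        }
        return true;
    }
};

// Hypothetical, single-threaded model of the archiver.
class archiver_sketch {
public:
    explicit archiver_sketch(fake_stm& stm)
      : _stm(stm) {}

    // One iteration of the background loop: the jobs are enabled only
    // after 'sync' has succeeded, and the result is actually checked.
    void run_loop_once() {
        auto synced = _stm.sync();
        if (!synced.has_value() || !*synced) {
            return; // not a leader or sync failed: jobs stay disabled
        }
        _synced = true;
        _housekeeping_enabled = true;
    }

    // Leadership changes toggle the jobs only once a sync has succeeded.
    // Before that, the background loop is responsible for enabling them.
    void notify_leadership(bool is_leader) {
        if (!_synced) {
            return;
        }
        _housekeeping_enabled = is_leader;
    }

    bool housekeeping_enabled() const { return _housekeeping_enabled; }

private:
    fake_stm& _stm;
    bool _synced = false;
    bool _housekeeping_enabled = false;
};

// Replays the shard-move scenario: the archiver is created on a non-leader,
// leadership arrives before the first successful sync, then the loop syncs.
inline void demo_shard_move() {
    fake_stm stm;
    archiver_sketch archiver(stm);

    archiver.run_loop_once();              // not a leader: sync gives nullopt
    assert(!archiver.housekeeping_enabled());

    archiver.notify_leadership(true);      // leadership before sync
    assert(!archiver.housekeeping_enabled());

    stm.leader = true;
    archiver.run_loop_once();              // sync succeeds, jobs enabled
    assert(archiver.housekeeping_enabled());

    archiver.notify_leadership(false);     // leadership lost, jobs disabled
    assert(!archiver.housekeeping_enabled());
}
```

In this model, a leadership notification that arrives before the first successful sync is deliberately ignored, so nothing can start working from a stale manifest. The sketch does not model re-syncing after leadership is lost and regained.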
Log all headers when we fail to parse them

(cherry picked from commit 426db07)
Increase the timeout and scrubbing frequency.

(cherry picked from commit 7747c48)
@vbotbuildovich vbotbuildovich added this to the v23.3.x-next milestone Apr 17, 2024
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label Apr 17, 2024
@vbotbuildovich vbotbuildovich requested a review from Lazin April 17, 2024 12:48
@Lazin Lazin merged commit 1017b79 into redpanda-data:v23.3.x Apr 23, 2024
18 checks passed
@piyushredpanda piyushredpanda modified the milestones: v23.3.x-next, v23.3.13 Apr 23, 2024
Labels
area/redpanda kind/backport PRs targeting a stable branch