[v23.3.x] storage: fix windowed compaction error handling #16940
Conversation
to be used by the next commit.
The staging file was missing a '.'. Note that these files are transient, so there isn't a compatibility concern here. (cherry picked from commit 62fd4f8)
CONFLICT: kept default value for segment_last_offset in compaction_reducers

An upcoming test will inject an exception to simulate a failure case seen in the wild that is otherwise hard to reproduce. I'd considered using the file_sanitizer-based error injection, but that infra only really shines at making Redpanda hit asserts, so it's not quite what I'm looking for: my upcoming test will assert that we _don't_ crash in a particular error scenario. (cherry picked from commit 13f5537)
This will be useful to test methods that need a feature table (e.g. deduplicate_segments() has a feature table as an argument). (cherry picked from commit 958aaf7)
We were previously using the l-value variant of model::record_batch_reader::consume(), which doesn't have a built-in call to finally(). For a log_reader, this meant that in some scenarios (e.g. if the reducer were to fail) we could end up not calling finally(), and therefore wouldn't clear the underlying segment reader. This is exactly what happened: we hit

std::runtime_error (lz4f_compressend:ERROR_dstMaxSize_tooSmall)

errors as we compressed/decompressed batches, and then ultimately crashed because the log_reader was not closed:

Assert failure: (.../redpanda/redpanda/src/v/storage/log_reader.h:141) '!_iterator.reader' log reader destroyed with live reader

(cherry picked from commit cc06447)
CI failures:
-    return with_extension("log.compaction.compaction_index");
+    return with_extension(".log.compaction.compaction_index");
Does this affect the ability of storage to clean up the transient files? I'm not sure what kind of wildcard / pattern it uses to identify them.
It would if we cleaned up leftover files at startup, but I don't think we do currently. We only clean them up at runtime, after we compact.
Good to merge? If so, please do. It's a thrill :D
Backport of #16928
Fixes #16937
Also pulls in 5ad9e14
CONFLICT:
We were previously using the l-value variant of model::record_batch_reader::consume(), which doesn't have a built-in call to finally(). For a log_reader, this meant that in some scenarios (e.g. if the reducer were to fail) we could end up not calling finally(), and therefore wouldn't clear the underlying segment reader. This is exactly what happened: we hit

std::runtime_error (lz4f_compressend:ERROR_dstMaxSize_tooSmall)

errors as we compressed/decompressed batches, and then ultimately crashed because the log_reader was not closed:

Assert failure: (.../redpanda/redpanda/src/v/storage/log_reader.h:141) '!_iterator.reader' log reader destroyed with live reader
Backports Required
Release Notes
Bug Fixes