Point-in-time recovery should fail on IO error when reading WAL #6288
To be honest, I am a bit lost on what is expected from point-in-time recovery lately. Wouldn't recovering no WAL data at all be a valid recovery to some point in time? The other recent time I was confused was #6351. Why is that feature incompatible with …
IMO, point-in-time recovery only guarantees recovery to a consistent point; any such point is acceptable. So the behavior @yiwu-arbug describes is also acceptable.
There's the loose guarantee of point-in-time recovery, where it can recover to any consistent point, but there's also the guarantee of sync WAL: that the data is persistent once the sync returns. The latter cannot be guaranteed if there's corruption on disk, but IMO we should keep that promise as far as possible.
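For readers skimming the thread, here is a minimal sketch of the two guarantees being contrasted, using the public RocksDB C++ API (the database path and keys are arbitrary placeholders):

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/wal_sync_example", &db);
  assert(s.ok());

  // Unsynced write: losing it in a crash is compatible with the loose
  // point-in-time guarantee (recover to *some* consistent point).
  rocksdb::WriteOptions unsynced;
  unsynced.sync = false;
  s = db->Put(unsynced, "k1", "v1");
  assert(s.ok());

  // Once SyncWAL() (or a write with sync = true) returns OK, the data is
  // promised to be durable; silently dropping it during recovery breaks
  // that stronger promise.
  s = db->SyncWAL();
  assert(s.ok());

  delete db;
  return 0;
}
```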
That sounds good to me. cc @yhchiang, who is our visiting expert in power loss recovery!
Assigning to me to track it. |
Thanks @yiwu-arbug for flagging this. I am working on a fix. Not sure if my understanding is accurate. In point-in-time recovery, …

Or is the reasoning as follows: …
Good questions. I thought about them for a while today. Here is my analysis, which I believe supports the original plan. …
Makes sense, and thanks for the insights. As you said, the crucial assumptions are: …
Are we planning to improve this? Should we add an "after sync" marker to the WAL (a new record type), written along with the next appended data, so that everything before such a marker must pass its checksum (otherwise it must mean data loss)?
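A rough sketch of how such a marker might be consumed during replay. All names here (`kSyncMarker`, `WalReader`, `ReplayWal`) are hypothetical; no such record type exists in RocksDB today:

```cpp
#include <cstdint>

// Illustrative record model: a marker record is appended together with
// the first write after a sync completes.
enum RecordType { kData, kSyncMarker };

struct Record {
  RecordType type = kData;
  uint64_t end_offset = 0;  // WAL offset just past this record
  bool checksum_ok = true;
};

// Minimal stand-in for a WAL reader interface.
struct WalReader {
  virtual bool ReadNextRecord(Record* rec) = 0;  // false at end of log
  virtual uint64_t CurrentOffset() const = 0;
  virtual ~WalReader() = default;
};

enum class Outcome { kOk, kCorruptionOfSyncedData };

Outcome ReplayWal(WalReader* reader) {
  uint64_t last_synced_end = 0;  // everything before this offset was synced
  Record rec;
  while (reader->ReadNextRecord(&rec)) {
    if (!rec.checksum_ok) {
      // A bad record before the last sync marker means synced data was
      // lost or unreadable: fail the open instead of silently truncating.
      if (reader->CurrentOffset() < last_synced_end) {
        return Outcome::kCorruptionOfSyncedData;
      }
      // Otherwise it is the unsynced tail; point-in-time recovery may
      // legitimately stop here.
      return Outcome::kOk;
    }
    if (rec.type == kSyncMarker) {
      // The marker arrives with the next write after a sync, so every
      // record before it was durable at some point.
      last_synced_end = rec.end_offset;
    }
  }
  return Outcome::kOk;
}
```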
I think the whole sentence needs to be quoted for that claim to be true (i.e., also "or corruption of synced data"). Unsynced data should only cause corruption in the unsynced tail of entries. It'd be great, though, if we could improve this to know where the synced portion ends and the unsynced portion begins, so we can fail …
Maybe we can store the expected length of a sync interval at its start? I think in either case of sync … Actually, the logic I'm describing is the same logic …
Better, yes. If that works, anything appearing after it should indicate the sync completed.
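A sketch of the length-at-interval-start variant from the previous two comments. `SyncInterval` and `LastCompletedSyncEnd` are invented names, and this layout is only one possible reading of the proposal:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-sync header: each sync writes, at the start of the
// interval it covers, the number of bytes that interval will contain.
struct SyncInterval {
  uint64_t start_offset;     // where the interval's header was written
  uint64_t declared_length;  // bytes the sync promised to cover
};

// Returns the end of the last interval that is fully readable: everything
// up to that offset was covered by a completed sync, so corruption there
// should fail recovery rather than be truncated away.
uint64_t LastCompletedSyncEnd(const std::vector<SyncInterval>& intervals,
                              uint64_t readable_wal_size) {
  uint64_t last_end = 0;
  for (const SyncInterval& iv : intervals) {
    uint64_t end = iv.start_offset + iv.declared_length;
    if (end <= readable_wal_size) {
      last_end = end;  // the whole declared interval is present on disk
    }
  }
  return last_end;
}
```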
Now, when …

Maybe in the future, we can track the last synced size of the WAL, so that if the recovery error happens before the synced size, we report an error.
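A minimal sketch of the comparison this comment describes, with invented names; how the last synced size would be persisted (e.g., alongside other WAL metadata) is out of scope here:

```cpp
#include <cstdint>

enum class ReplayDecision { kReportError, kTruncateUnsyncedTail };

ReplayDecision ClassifyReplayError(uint64_t error_offset,
                                   uint64_t last_synced_size) {
  if (error_offset < last_synced_size) {
    // The unreadable bytes were already synced: durable data is missing,
    // so the open should fail with an error.
    return ReplayDecision::kReportError;
  }
  // The error is in the unsynced tail: point-in-time recovery may stop
  // here and drop the tail.
  return ReplayDecision::kTruncateUnsyncedTail;
}
```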
I believe this is resolved by #6963 and the WAL tracking in MANIFEST feature. Closing. cc @yiwu-arbug, feel free to reopen if you see fit.
Expected behavior
With point-in-time recovery, if RocksDB encounters an IO error when reading the WAL, it could be due to an intermittent error (a loose cable?). RocksDB should fail to open in this case. Truncating the WAL could result in data loss (even when the WAL was previously synced).
Actual behavior
RocksDB will treat the error as corruption in the WAL, stop replaying the rest of the WAL, and open successfully.
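A sketch of the configuration involved, using the real `wal_recovery_mode` option (the database path is a placeholder). Under the behavior described above, `Open` can return OK even when a WAL read failed with an IO error:

```cpp
#include <iostream>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Recover to a consistent point; replay stops at the first bad record.
  options.wal_recovery_mode = rocksdb::WALRecoveryMode::kPointInTimeRecovery;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/path/to/db", &db);

  // Per this issue: if reading the WAL hit an IO error, it is treated
  // like a corrupted tail, the rest of the WAL is skipped, and s is OK,
  // so the caller cannot tell that previously synced data may be gone.
  // The requested behavior is a non-OK status here instead.
  if (!s.ok()) {
    std::cerr << "open failed: " << s.ToString() << std::endl;
    return 1;
  }
  delete db;
  return 0;
}
```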
Steps to reproduce the behavior
Only by reading the code.