Compactions should not cause corruption #6435
Comments
This is the expected behavior when a RocksDB background operation encounters an error: the DB is brought into read-only mode. In read-only mode, any write returns the same error that the background operation encountered. In this case it's a checksum error. It is always difficult (impossible?) to distinguish silent corruption in the device from RocksDB corruption. Have you checked for firmware updates and the device health (e.g., with …)? Also, it'd be interesting to know whether the checksum error is permanent or transient. One way to tell is to restart the DB, wait until compaction happens, and see if it fails again. Or you can use a tool like …
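The read-only behavior described above can be pictured with a small toy model. This is only a sketch in Python, not RocksDB's API: `ToyDB`, `BackgroundError`, `run_compaction`, and the error text are all hypothetical names chosen for illustration. It shows the one property the comment describes: once a background job records an error, every later write fails with that same error, while reads are still served.

```python
class BackgroundError(Exception):
    """Error recorded by a failed background job (e.g. a checksum mismatch)."""


class ToyDB:
    """Toy model only; none of these names come from RocksDB."""

    def __init__(self):
        self._data = {}
        self._bg_error = None  # set once a background job fails

    def put(self, key, value):
        # Once a background error is recorded, every write fails with it.
        if self._bg_error is not None:
            raise self._bg_error
        self._data[key] = value

    def get(self, key):
        # Reads are still served in read-only mode.
        return self._data.get(key)

    def run_compaction(self, checksum_ok):
        # Hypothetical stand-in for a background compaction.
        if not checksum_ok:
            self._bg_error = BackgroundError("Corruption: block checksum mismatch")


db = ToyDB()
db.put("a", 1)                        # succeeds: no background error yet
db.run_compaction(checksum_ok=False)  # background job hits a checksum error
try:
    db.put("b", 2)                    # rejected with the stored error
except BackgroundError as err:
    write_error = str(err)
```

In this model the insert failure and the compaction failure are the same error object, which matches the report of seeing the corruption message both in the log and on insert.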
I see.
smartctl:
We saw this error twice within a fairly short period after having the DB open for about 24 hours. We have also had reports of it on other hardware, so I would be surprised if it is a hardware issue. When I run the sst_dump tool, it reports the file as corrupted, just as the log does:
Thanks for all the info!
That is very good to know.
Experienced the same thing regularly, with a two-disk setup in software RAID1:
Complete output here. Going to try a different disk to see if that helps.
Would it be possible to localize the corrupted key range or SST file through the client API? In a replicated system, it would then be possible to reinsert the keys or ingest an SST file from another replica and try to heal programmatically.
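The healing idea in the question above might look roughly like the following. This is a sketch under stated assumptions, not a RocksDB feature: `SstMeta`, `corrupted_range`, and `heal_from_replica` are hypothetical, the stores are plain dicts, and it assumes per-file key ranges are available to the client (RocksDB's C++ API reports a smallest and largest key per live file through `GetLiveFilesMetaData`, but whether that is enough to localize a corruption is not established in this thread).

```python
from dataclasses import dataclass


@dataclass
class SstMeta:
    # Hypothetical per-file metadata: file name plus its key bounds.
    name: str
    smallest_key: str
    largest_key: str


def corrupted_range(files, bad_name):
    """Return the (smallest, largest) key range of the named file."""
    for f in files:
        if f.name == bad_name:
            return f.smallest_key, f.largest_key
    raise KeyError(bad_name)


def heal_from_replica(local, replica, key_range):
    """Copy every replica key inside key_range into the local store."""
    lo, hi = key_range
    repaired = 0
    for key, value in replica.items():
        if lo <= key <= hi:
            local[key] = value
            repaired += 1
    return repaired


files = [SstMeta("000012.sst", "a", "f"), SstMeta("000013.sst", "g", "m")]
local = {"a": "1", "h": "??"}  # "h" lives in the corrupted file
replica = {"a": "1", "h": "8", "k": "11", "z": "26"}

rng = corrupted_range(files, "000013.sst")
count = heal_from_replica(local, replica, rng)
```

The sketch ignores deletions and the fact that files on different LSM levels can overlap in key range, so a real repair would need more care than a blind copy.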
Do you use direct IO or page cache IO? Here is an article about fsync failures on several filesystems: https://www.usenix.org/system/files/atc20-rebello.pdf
Expected behavior
When doing an insert, errors are seen from rocksdb and the values are not inserted:
In the log we see errors with corruption during a compaction.
Rocks version:
Actual behavior
Corruption messages and failures from insert.
Steps to reproduce the behavior