Data corruption #3509
Comments
Based on the log file, the corrupted file has been successfully compacted without a checksum error:
It means that the file was OK at least when it was generated. It is theoretically possible that RocksDB somehow forgot to call fsync() or failed to notice an fsync() failure, but to me that is unlikely. We just fixed a bug related to that in 5.6 and added unit tests; the version you run (5.8.8) is relatively well covered in this respect. How did you copy the data to the parity/chains/ethereum/db/906a34e69aec8c0d/archive/db/ directory? Is it possible that the copying had some problem?
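For reference, a minimal sketch of the RocksDB options that govern the fsync() behavior discussed here; the path is hypothetical and this is not Parity's actual configuration:

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.use_fsync = true;        // sync files with fsync() rather than fdatasync()
  options.paranoid_checks = true;  // the default; surfaces background I/O errors early

  rocksdb::DB* db = nullptr;
  // Hypothetical path for this sketch.
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/fsync_demo", &db);
  assert(s.ok());

  rocksdb::WriteOptions write_options;
  write_options.sync = true;       // force a WAL sync before Put() returns
  s = db->Put(write_options, "key", "value");
  assert(s.ok());

  delete db;  // close cleanly, releasing the LOCK file
  return 0;
}
```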
Thank you for looking into this. The database has been populated by syncing the Ethereum blockchain, so the data wasn't really copied as a whole. I've looked at the previous corruption issues (based on …)
@andresilva what does "populated by syncing the Ethereum blockchain" mean? Do you mean the data is all generated in place in parity/chains/ethereum/db/906a34e69aec8c0d/archive/db/ and is never moved?
Sorry for being vague in my description. Yes, I meant that all the data is generated in place. It is possible that some users move their data folder (and yes, this could corrupt the data), but under normal operation the data is never moved and is only interacted with through the RocksDB API (no files are touched manually). In this particular case, for which I provided logs, the data wasn't moved.
@andresilva I don't understand then. Like what I pasted above, the file 011705.sst was supposed to be deleted at 1/31 21:16:
When did you see the corruption reported? Anyway, I saw some corruption reported at the end of the log file for another file. I will dig into that one further.
There isn't much information to dig into further. If you run sst_dump against the file parity/chains/ethereum/db/906a34e69aec8c0d/archive/db/004774.sst, what is the output?
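For readers following along: the same per-file checksum check can also be done programmatically; a minimal sketch, assuming a RocksDB build recent enough to ship SstFileReader (newer than the 5.8.8 in this report) and using the file name from the comment above:

```cpp
#include <iostream>

#include "rocksdb/options.h"
#include "rocksdb/sst_file_reader.h"

int main() {
  rocksdb::Options options;
  rocksdb::SstFileReader reader(options);

  // File name taken from the comment above; adjust the path as needed.
  rocksdb::Status s = reader.Open("004774.sst");
  if (s.ok()) {
    s = reader.VerifyChecksum();  // re-reads every block and validates its checksum
  }
  std::cout << s.ToString() << std::endl;  // "OK" or a "Corruption: ..." message
  return s.ok() ? 0 : 1;
}
```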
I keep receiving this error; I have tried 3 different SSDs (NVMe & SATA), one of them completely new. Memtest86 says the memory is OK.
An old bug, but it is still relevant for me. I'm testing the IOTA project. The RocksDB version used is quite old: 6.3.6. Here is the stack trace:
Any clue?
@damageco in my case the problem was in RAM, but memtest did not detect it. I found it by manually removing each module while Parity was not OK.
It is hard to say what would cause this. It could possibly even be a storage issue with your hardware.
Since we have not had any follow-up data, closing this issue.
Expected behavior
Don't cause data corruption.
Actual behavior
Corruption: block checksum mismatch: expected 3482696050, got 3888739091 in parity/chains/ethereum/db/906a34e69aec8c0d/archive/db/011705.sst offset 35210665 size 16261
Steps to reproduce the behavior
Non-deterministic.
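Since the failure is non-deterministic, one way to probe for this class of corruption on demand, rather than waiting for a read or compaction to hit the bad block, is a full checksum pass over the database. A minimal sketch, assuming a RocksDB release that provides DB::VerifyChecksum() and using a path modeled on the report:

```cpp
#include <iostream>
#include <string>

#include "rocksdb/db.h"

int main() {
  rocksdb::Options options;
  // Path modeled on the report above; hypothetical for this sketch.
  const std::string path = "parity/chains/ethereum/db/906a34e69aec8c0d/archive/db";

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, path, &db);
  if (s.ok()) {
    // Reads every block of every SST file and validates its checksum,
    // surfacing errors of the "block checksum mismatch" form shown above.
    s = db->VerifyChecksum();
  }
  std::cout << s.ToString() << std::endl;
  delete db;
  return s.ok() ? 0 : 1;
}
```

Note that ordinary reads also verify block checksums by default (ReadOptions::verify_checksums is true), which is one way such errors surface during normal operation.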
I’m one of the contributors to Parity, one of the Ethereum clients. We use RocksDB to store all the client data, and this is currently running on 5000+ nodes ranging from server-grade hardware to commodity hardware, mostly using SSDs. Parity is implemented in Rust, so we use our own library to wrap RocksDB’s C API (https://github.com/paritytech/rust-rocksdb/). A couple of users have been having issues with data corruption. We’ve tried, to the best of our ability, to determine whether this was a hardware error, but some users are seeing the issue with different hard drives. We’ve recently fixed an issue where we wouldn’t properly close RocksDB on shutdown, but we’re still seeing users report corruption after the fix was rolled out. So far we’ve been unable to reproduce any of these issues ourselves, so we’re relying on the data and feedback that our users can provide.
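As a side note on the shutdown bug mentioned above: a minimal sketch (not Parity's actual code; the function name is illustrative) of the teardown order RocksDB expects when column families are open:

```cpp
#include <vector>

#include "rocksdb/db.h"

// Illustrative clean-shutdown helper; names are hypothetical.
void CloseCleanly(rocksdb::DB* db,
                  std::vector<rocksdb::ColumnFamilyHandle*>& handles) {
  // 1. Destroy all column family handles before deleting the DB itself.
  for (rocksdb::ColumnFamilyHandle* handle : handles) {
    db->DestroyColumnFamilyHandle(handle);
  }
  handles.clear();

  // 2. Deleting the DB flushes memtables (unless avoid_flush_during_shutdown
  //    is set) and releases the LOCK file. Skipping this step loses no synced
  //    data, but leaves WAL recovery work for the next open.
  delete db;
}
```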
Attached you can find a RocksDB log provided by one of our users that shows the corruption issue.
Is there anything we should be watching for that could be triggering this corruption, or should we just try to rule out faulty hardware?
rocksdb005.log