This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Corruption: block checksum mismatch / during sync. #7766

Closed
thebalaa opened this issue Jan 31, 2018 · 17 comments
Labels
M4-core ⛓ Core client code / Rust. Z7-duplicate 🖨 Issue is a duplicate. Closer should comment with a link to the duplicate.

Comments

@thebalaa

I'm running:

  • Which Parity version?: 1.8.7 / 1.9.0
  • Which operating system?: Linux
  • How installed?: Installer
  • Are you fully synchronized?: no
  • Which network are you connected to?: ethereum
  • Did you try to restart the node?: yes

Starting from a clean slate on both the latest stable and unstable versions (1.8.7 / 1.9.0), the following error keeps occurring, at a different block height each time. Could this be faulty hardware?

2018-01-31 00:37:09  DB corrupted: Corruption: block checksum mismatch: expected 253734433, got 2018439782  in /home/balaa/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/282633.sst offset 11034697 size 596098. Repair will be triggered on next restart

====================

stack backtrace:
   0:     0x5617bafb2e0c - <no info>

Thread 'IO Worker #0' panicked at 'DB flush failed.: "Corruption: block checksum mismatch: expected 253734433, got 2018439782  in /home/balaa/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/282633.sst offset 11034697 size 596098"', /checkout/src/libcore/result.rs:906

This is a bug. Please report it at:

    https://github.com/paritytech/parity/issues/new

Aborted (core dumped)
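For context on what this error means: RocksDB stores a checksum next to every block it writes into an .sst file and re-verifies it on read, so a mismatch means the bytes read back from disk no longer hash to the value that was recorded when they were written. A rough sketch of that check (illustrative only: it uses plain CRC32 via the `crc32fast` crate, whereas RocksDB actually uses CRC32C with an extra masking step):

```rust
use crc32fast::Hasher;

fn checksum(bytes: &[u8]) -> u32 {
    let mut hasher = Hasher::new();
    hasher.update(bytes);
    hasher.finalize()
}

// The shape of the check behind "block checksum mismatch: expected X, got Y".
fn verify_block(block: &[u8], stored: u32) -> Result<(), String> {
    let actual = checksum(block);
    if actual != stored {
        return Err(format!(
            "Corruption: block checksum mismatch: expected {}, got {}",
            stored, actual
        ));
    }
    Ok(())
}

fn main() {
    let block = b"sst block payload".to_vec();
    let stored = checksum(&block);
    assert!(verify_block(&block, stored).is_ok());

    // Flip one bit to simulate corruption (bad sector, torn write, faulty RAM):
    // verification then fails exactly like the log above.
    let mut corrupted = block.clone();
    corrupted[0] ^= 0x01;
    assert!(verify_block(&corrupted, stored).is_err());
}
```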
@5chdn 5chdn added F2-bug 🐞 The client fails to follow expected behavior. P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. M4-core ⛓ Core client code / Rust. labels Jan 31, 2018
@5chdn 5chdn added this to the 1.10 milestone Jan 31, 2018
@5chdn
Contributor

5chdn commented Jan 31, 2018

cc @andresilva this happens during sync

follow-up on #7334 cc @DeviateFish-2

also #7748

@DeviateFish-2

DeviateFish-2 commented Feb 1, 2018

Another sample for the pile (running v1.9.0):

...
2018-01-29 22:38:23  Syncing #1464212 2180…d05a   319 blk/s 1816 tx/s  57 Mgas/s    142+ 4947 Qed  #1469311   22/25 peers     77 MiB chain   54 MiB db   42 MiB queue    8 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-01-29 22:38:33  Syncing #1465618 ad8b…9110   140 blk/s 1268 tx/s  83 Mgas/s   1126+ 5452 Qed  #1472200   22/25 peers     74 MiB chain   54 MiB db   43 MiB queue    9 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-01-29 22:38:43  Syncing #1468641 5d43…260c   302 blk/s 1736 tx/s  59 Mgas/s      0+ 3749 Qed  #1472391   22/25 peers     53 MiB chain   54 MiB db   28 MiB queue   13 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2018-01-29 22:38:53  Syncing #1470915 640a…7b17   228 blk/s 1617 tx/s  45 Mgas/s    780+ 5759 Qed  #1477472   21/25 peers     73 MiB chain   54 MiB db   41 MiB queue    8 MiB sync  RPC:  0 conn,  0 req/s,   0 µs

====================

stack backtrace:
   0:     0x55b5b84ff95c - backtrace::backtrace::trace::h88dff4dc401d81d6
   1:     0x55b5b84ff992 - backtrace::capture::Backtrace::new::hc1bdbce336b16eca
   2:     0x55b5b799fb49 - panic_hook::panic_hook::ha4f6f84d07d9cbbd

Thread 'IO Worker #2' panicked at 'DB flush failed.: Error(Msg("Corruption: block checksum mismatch: expected 3482696050, got 3888739091  in parity/chains/ethereum/db/906a34e69aec8c0d/archive/db/011705.sst offset 35210665 size 16261"), State { next_error: None, backtrace: None })', /checkout/src/libcore/result.rs:906

This is a bug. Please report it at:

    https://github.com/paritytech/parity/issues/new


====================

stack backtrace:

$ parity
Loading config file from /etc/parity/config.toml
2018-01-31 20:32:10  Starting Parity/v1.9.0-unstable-53ec114-20180125/x86_64-linux-gnu/rustc1.23.0
2018-01-31 20:32:10  Keys path parity/keys/Foundation
2018-01-31 20:32:10  DB path parity/chains/ethereum/db/906a34e69aec8c0d
2018-01-31 20:32:10  Path to dapps parity/dapps
2018-01-31 20:32:10  State DB configuration: archive +Fat +Trace
2018-01-31 20:32:10  Operating mode: active
2018-01-31 20:32:10  Configured for Foundation using Ethash engine
2018-01-31 20:32:10  Updated conversion rate to Ξ1 = US$1131.33 (105228024 wei/gas)

====================

stack backtrace:
   0:     0x5594fc78295c - backtrace::backtrace::trace::h88dff4dc401d81d6
   1:     0x5594fc782992 - backtrace::capture::Backtrace::new::hc1bdbce336b16eca
   2:     0x5594fbc22b49 - panic_hook::panic_hook::ha4f6f84d07d9cbbd

Thread 'main' panicked at 'failed to update version: Error(Msg("Corruption: block checksum mismatch: expected 3482696050, got 3888739091  in parity/chains/ethereum/db/906a34e69aec8c0d/archive/db/011705.sst offset 35210665 size 16261"), State { next_error: None, backtrace: None })', /checkout/src/libcore/result.rs:906

This is a bug. Please report it at:

    https://github.com/paritytech/parity/issues/new

$

(For the record, the missing second stack trace for the initial crash is not a mistake; no stack trace was produced.)

Again, to reiterate: this is happening during a sync (a full archive sync in my case, as seen in the output when restarting), and without any input at all. This is not the result of shutting down parity while it is syncing (inadvertently or otherwise). The corruption is happening during the sync process, and it is causing parity to exit.

I'm capturing a full log right now, and will update this comment with it when it crashes.

As an aside... why does parity log to stderr?

[Edit] Here's a full log:
sync005.log

@andresilva
Contributor

It is possible that not closing RocksDB properly on shutdown could lead to some silent corruption: if there's no crash at shutdown, you'll only see that corruption whenever RocksDB has to write to that block in the future (which might be the case here). This is my best explanation so far. We have a fix for RocksDB not being closed properly on shutdown, which will be out in the next release, and I'd like to see whether these corruption issues disappear or become less frequent.
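A minimal sketch of what "closed properly" means on the Rust side, using the plain `rocksdb` crate rather than Parity's kvdb-rocksdb wrapper (the path and calls are illustrative, not Parity's actual shutdown code):

```rust
use rocksdb::{Options, DB};

fn main() {
    // Hypothetical path, for illustration only.
    let path = "/tmp/example-db";
    let mut opts = Options::default();
    opts.create_if_missing(true);

    {
        let db = DB::open(&opts, path).expect("open failed");
        db.put(b"key", b"value").expect("write failed");
        // When `db` goes out of scope its destructor runs and the database is
        // closed cleanly: background work is stopped and file handles released.
    }

    // By contrast, anything that ends the process while a DB handle is still
    // live -- std::process::exit(), an abort, a hard kill -- skips destructors,
    // which is the "unclean shutdown" being described above.
}
```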

@DeviateFish-2

Why would that be the case here? The corruption is what's causing the shutdown of parity in these cases, not the other way around. Look at the logs: I'm not stopping parity and then encountering corruption on restart. Parity is crashing due to corruption.

I've done what I can to rule out hardware issues, but of course cannot completely rule them out. However, this seems to be a relatively frequent occurrence: #7334 was originally opened as a report of this behavior, and many of the issues closed as duplicates of it are also instances of crashes during initial sync.

These aren't cases where someone or something is forcibly terminating parity, and thus causing corruption due to an unclean shutdown. These are cases where parity itself is crashing, presumably due to corruption.

@andresilva
Contributor

Maybe I didn't explain myself properly. Every single shutdown of parity until #7695 was an unclean shutdown, regardless of whether you saw a crash or not; RocksDB would not be closed properly. The error you're seeing doesn't mean the database is being corrupted during sync, it means you're finding corrupted data during the sync; the corruption could have happened at any other time.

I'm not saying that there isn't any other cause for the corruption, but this is currently my best explanation, since not closing the database properly was a violation of the RocksDB API, and used properly the RocksDB API shouldn't lead to data corruption (short of hardware faults or RocksDB bugs). If you're willing to help, please do a db kill, update to 1.9.1, and report back if you see this issue again.
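In concrete terms, that reset looks roughly like the following (illustrative; paths assume the default Linux data directory shown in the logs above, and removing the cache and network folders is the extra step @DeviateFish-2 describes below):

```
$ parity db kill
$ rm -rf ~/.local/share/io.parity.ethereum/cache
$ rm -rf ~/.local/share/io.parity.ethereum/network
$ parity --version    # confirm 1.9.1 or later before resyncing
```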

@5chdn 5chdn closed this as completed Feb 2, 2018
@DeviateFish-2

DeviateFish-2 commented Feb 3, 2018

How would it have happened at "another time" if this is a fresh sync (e.g. empty parity data directory)?

You can look at the logs I've provided. Literally every one of these samples has followed a parity db kill plus removing the cache and network folders.

I've said in literally every report that this is a clean sync, from scratch, with no pre-existing data.

Please fucking read a little better.

After running a parity db kill (and removing the cache and network folders), I tried to sync again this morning:

I'm attempting to run a full archive sync from scratch, with transaction tracing enabled. Relevant section of config.toml that reflects the current setup:
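(The actual config.toml excerpt did not survive with the issue text. For readers, a representative `[footprint]` section for a full archive sync with tracing and the fat database enabled, matching the `archive +Fat +Trace` line in the startup log above, would look roughly like this; the key names follow Parity's configuration docs of that era and are illustrative, not a copy of the original:)

```toml
[footprint]
# Keep all historical state instead of pruning it (full archive sync).
pruning = "archive"
# Record transaction traces ("+Trace" in the startup log).
tracing = "on"
# Maintain the fat database ("+Fat" in the startup log).
fat_db = "on"
```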

@andresilva
Contributor

@DeviateFish-2 Sorry, I wasn't aware of that; disregard what I said in that case.
Inside the db folder there should be a LOG file for RocksDB (chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/LOG or chains/ethereum/db/906a34e69aec8c0d/archive/db/LOG). This file is rewritten every time parity is started, so could you share that LOG file right after you see a corruption crash? I'll try to raise the issue with the RocksDB developers to see if they can point us to something. I haven't been able to reproduce this locally, so it's hard for me to debug.
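Since that file is overwritten on every start, it needs to be copied aside immediately after a crash, before parity is launched again, e.g. (path as in the archive-mode logs above):

```
$ cp ~/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/archive/db/LOG ./rocksdb-crash.LOG
```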

@DeviateFish-2

Here's the LOG file (renamed so GitHub will accept it) associated with the above parity log (sync005):

rocksdb005.log

@5chdn Could you re-open this issue?

@5chdn 5chdn reopened this Feb 5, 2018
@5chdn 5chdn changed the title Corruption: block checksum mismatch Corruption: block checksum mismatch / during sync. Feb 5, 2018
@5chdn
Contributor

5chdn commented Feb 5, 2018

Yep. Thanks for the logs.

@Emperornero

Emperornero commented Feb 6, 2018

Is this issue being resolved anytime soon? I haven't been able to sync for MONTHS because of this and can confirm it's not a hardware issue. I've had the same crash happen with 2 different SSDs and 6 different HDDs; same problem no matter where the Parity database is stored.

Can provide more logs if needed.

@5chdn
Contributor

5chdn commented Feb 7, 2018

@Emperornero which version? On startup or during sync?

@Emperornero

This has been happening since 1.7.6; no DB clears seem to fix the problem. I'm currently on 1.9.2.

@5chdn 5chdn mentioned this issue Feb 12, 2018
@DWAK-ATTK

Ditto.

version Parity/v1.9.2-beta-0feb0bb-20180201/x86_64-linux-gnu/rustc1.23.0

Tried a full sync of mainnet/foundation. It stalled out about 12 hours in (2.4M blocks). Issued a clean shutdown (Ctrl-C). Shut the VM down until this morning.

Attempted restarting Parity this morning and received the same database corruption messages as everyone else.

parallels@ubuntu:~$ parity
2018-02-14 11:00:32  Starting Parity/v1.9.2-beta-0feb0bb-20180201/x86_64-linux-gnu/rustc1.23.0
2018-02-14 11:00:32  Keys path /home/parallels/.local/share/io.parity.ethereum/keys/Foundation
2018-02-14 11:00:32  DB path /home/parallels/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d
2018-02-14 11:00:32  Path to dapps /home/parallels/.local/share/io.parity.ethereum/dapps
2018-02-14 11:00:32  State DB configuration: fast
2018-02-14 11:00:32  Operating mode: active
2018-02-14 11:00:32  Configured for Foundation using Ethash engine
2018-02-14 11:00:32  DB corrupted: Invalid argument: You have to open all column families. Column families not opened: col4, col5, col6, col1, col3, col0, col2, attempting repair
2018-02-14 11:00:32  Updated conversion rate to Ξ1 = US$905.79 (131429600 wei/gas)
Client service error: Client(Database(Error(Msg("Received null column family handle from DB."), State { next_error: None, backtrace: None })))
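For context, this failure is about column families rather than checksums: RocksDB refuses to open a database unless every column family it contains is listed at open time, which is what "You have to open all column families. Column families not opened: col4, col5, ..." is complaining about. A minimal sketch of that rule, using the plain `rocksdb` crate (Parity's kvdb-rocksdb wrapper does the equivalent internally; the path is hypothetical):

```rust
use rocksdb::{Options, DB};

fn main() -> Result<(), rocksdb::Error> {
    // Hypothetical path, for illustration only.
    let path = "/tmp/example-db";
    let opts = Options::default();

    // Ask RocksDB which column families the database on disk contains
    // (Parity's chain DB uses col0..col6, as in the error message above)...
    let cfs = DB::list_cf(&opts, path)?;
    println!("column families on disk: {:?}", cfs);

    // ...and open it with all of them listed. Opening with only a subset (or
    // with a plain DB::open) is rejected with the "open all column families"
    // error.
    let _db = DB::open_cf(&opts, path, &cfs)?;
    Ok(())
}
```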

@andresilva
Contributor

andresilva commented Feb 15, 2018

I have created an issue in RocksDB with the logs that @DeviateFish-2 provided (facebook/rocksdb#3509).

@DeviateFish-2 I understand that you've tried to rule out hardware issues by switching hard drives and memory.

@Emperornero did you try to rule out faulty memory? Could you run a memtest?
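For reference, a quick in-OS memory check can be done with the standard Linux `memtester` utility (a full MemTest86+ boot test is more thorough); the size and pass count below are just an example:

```
$ sudo memtester 2048M 3
```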

@DWAK-ATTK

DWAK-ATTK commented Feb 15, 2018

I don't know if it matters, but I'm running Parity in a Parallels 12 VM (Ubuntu 16.04) on a MacBook Pro running macOS 10.13.1.

I've allocated 8 GB of RAM to the VM (Parity appears to be a memory hog), with a 60 GB VHD on the laptop's internal SSD.

@5chdn 5chdn added Z7-duplicate 🖨 Issue is a duplicate. Closer should comment with a link to the duplicate. and removed F2-bug 🐞 The client fails to follow expected behavior. P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. labels Mar 23, 2018
@5chdn
Contributor

5chdn commented Mar 23, 2018

Duplicate of #7748

@ghost

ghost commented Nov 6, 2019

I had this same issue for days. I fixed it by removing one stick of my RAM. Now my laptop has only a single 8 GB DDR3 stick installed, and Parity syncs without an issue. Hope this helps!
