core: fix import errors on clique crashes + empty blocks #19544

karalabe · 2019-05-09T12:07:41Z

This PR fixes an issue during reimporting old blocks on a Clique network after a node crash.

The root of the issue is that Clique has no block subsidy, so two consecutive blocks can end up with the same state (e.g. no transactions, self-transactions with 0 gas price, self-transactions by the miner).

If the node crashes (and we lose the recent state), Geth rewinds the head block to the most recent block whose state we still have. From there, we reprocess all subsequent blocks. If, however, two previously known blocks share the same state, processing the first one also restores the state of the second. Currently the block importer rejects the second with a "known block" error, since it doesn't expect this scenario (it rewound because no state was present, so how did the state reappear out of the blue?).

This PR adds a new clause to the block importer so that, instead of rejecting an already known block, it simply ignores it and proceeds to the next one. Although the code seems simple, we should make sure that nothing breaks.

Q: Until now we rejected known blocks in such cases. Is it a problem that we no longer do?
A: Although we previously rejected a known block, this only failed the sync; we then restarted it, and the restart skipped the block entirely (since it was known). As such, ignoring it instead of temporarily rejecting it seems fine.
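The proposed behaviour can be sketched with the same kind of toy model. `Block`, `chain` and `reimportNew` are hypothetical (this is not geth's `insertChain`), and the sketch simply assumes a skipped known block is canonical, which is the question raised during review.

```go
package main

import "fmt"

// Block is a hypothetical, stripped-down block.
type Block struct {
	Number uint64
	Hash   string
	Root   string
}

// chain is a toy database plus the current head number.
type chain struct {
	blocks map[string]bool
	states map[string]bool
	head   uint64
}

// reimportNew skips a known block (hash stored, state present), moves the
// head past it and carries on with the next block instead of failing.
func (c *chain) reimportNew(segment []Block) (processed, known int) {
	for _, b := range segment {
		if c.blocks[b.Hash] && c.states[b.Root] {
			known++
			c.head = b.Number // assumption: the skipped block is canonical
			continue
		}
		// "Execute" the block: its post-state becomes available.
		c.blocks[b.Hash] = true
		c.states[b.Root] = true
		c.head = b.Number
		processed++
	}
	return processed, known
}

func main() {
	segment := []Block{
		{1, "9fa29f", "9f1331"},
		{2, "66a17b", "9f1331"},
		{3, "76a91d", "2806f8"},
	}
	c := &chain{
		blocks: map[string]bool{"9fa29f": true, "66a17b": true},
		states: map[string]bool{"genesis": true},
	}
	processed, known := c.reimportNew(segment)
	fmt.Println(processed, known, c.head) // 2 1 3
}
```

The new / known / new pattern mirrors the "Inserted new block", "Inserted known block", "Inserted new block" sequence in the test output.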

Test output for inspection:

=== RUN   TestReimportMirroredState
INFO [05-10|15:39:29.564] Persisted trie from memory database      nodes=1 size=140.00B time=8.789µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [05-10|15:39:29.572] Loaded most recent local header          number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.572] Loaded most recent local full block      number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.572] Loaded most recent local fast block      number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
DEBUG[05-10|15:39:29.572] Persisted trie from memory database      nodes=1 size=140.00B time=4.573µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
DEBUG[05-10|15:39:29.572] Persisted trie from memory database      nodes=0 size=0.00B   time=926ns   gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
DEBUG[05-10|15:39:29.573] Persisted trie from memory database      nodes=1 size=140.00B time=3.207µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [05-10|15:39:29.573] Persisted trie from memory database      nodes=1 size=140.00B time=3µs     gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [05-10|15:39:29.581] Loaded most recent local header          number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.581] Loaded most recent local full block      number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.581] Loaded most recent local fast block      number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.582] Stored checkpoint snapshot to disk       number=0 hash=7b72ae…73d63e
DEBUG[05-10|15:39:29.583] Inserted new block                       number=1 hash=9fa29f…98fa04 uncles=0 txs=1 gas=21000 elapsed=432.077µs root=9f1331…374441
DEBUG[05-10|15:39:29.583] Inserted new block                       number=2 hash=66a17b…b3a422 uncles=0 txs=0 gas=0     elapsed=41.416µs  root=9f1331…374441
INFO [05-10|15:39:29.583] Imported new chain segment               blocks=2 txs=1 mgas=0.021 elapsed=1.279ms   mgasps=16.411 number=2 hash=66a17b…b3a422 age=50y3w5d dirty=236.00B
WARN [05-10|15:39:29.589] Head state missing, repairing chain      number=2 hash=66a17b…b3a422
INFO [05-10|15:39:29.589] Rewound blockchain to past state         number=0 hash=7b72ae…73d63e
INFO [05-10|15:39:29.589] Loaded most recent local header          number=2 hash=66a17b…b3a422 td=4 age=50y3w5d
INFO [05-10|15:39:29.589] Loaded most recent local full block      number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.589] Loaded most recent local fast block      number=2 hash=66a17b…b3a422 td=4 age=50y3w5d
DEBUG[05-10|15:39:29.590] Pruned ancestor, inserting as sidechain  number=3 hash=76a91d…17cfbd
DEBUG[05-10|15:39:29.590] Injected sidechain block                 number=3 hash=76a91d…17cfbd diff=2 elapsed=29.853µs  txs=1 gas=21000 uncles=0 root=2806f8…f63b47
INFO [05-10|15:39:29.590] Importing sidechain segment              start=1 end=3
DEBUG[05-10|15:39:29.591] Inserted new block                       number=1 hash=9fa29f…98fa04 uncles=0 txs=1 gas=21000 elapsed=391.125µs root=9f1331…374441
DEBUG[05-10|15:39:29.591] Inserted known block                     number=2 hash=66a17b…b3a422 uncles=0 txs=0 gas=0     root=9f1331…374441
DEBUG[05-10|15:39:29.591] Inserted new block                       number=3 hash=76a91d…17cfbd uncles=0 txs=1 gas=21000 elapsed=338.405µs root=2806f8…f63b47
INFO [05-10|15:39:29.591] Imported new chain segment               blocks=3 txs=2 mgas=0.042 elapsed=928.437µs mgasps=45.237 number=3 hash=76a91d…17cfbd age=50y3w5d dirty=472.00B

Fixes: #19360, #19302, #19258.

rjl493456442 · 2019-05-09T12:22:25Z

core/blockchain.go

+			}
+			stats.processed++
+
+			// TODO(karalabe): Can we assume canonicalness here? Can we assume no logs?


The logs are annoying here. Essentially, we should fire new log events if any exist. But known block insertion is mainly caused by rollback and re-import. In the Rollback function, we don't fire events notifying users that these logs were removed, so if we fire these logs again, users may receive duplicate notifications.
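To illustrate the duplicate-notification concern, a toy Go sketch: `Log` and `subscriber` are hypothetical types, and the rollback is modelled, as described in the comment, as sending no "removed logs" event. A subscriber that just counts notifications then sees the same log twice.

```go
package main

import "fmt"

// Log is a hypothetical stand-in for a contract log.
type Log struct {
	Block uint64
	Data  string
}

// subscriber counts how many times it was notified about each log.
type subscriber struct {
	seen map[Log]int
}

func (s *subscriber) notify(logs []Log) {
	for _, l := range logs {
		s.seen[l]++
	}
}

func main() {
	sub := &subscriber{seen: map[Log]int{}}
	logs := []Log{{Block: 1, Data: "Transfer"}}

	sub.notify(logs) // original import fires the logs
	// Rollback unwinds the block but sends no removal event.
	sub.notify(logs) // re-import fires the same logs again

	fmt.Println(sub.seen[logs[0]]) // 2
}
```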

Perhaps we can determine whether it's canonical by comparing total difficulties.

I guess if we're importing a long sidechain (known blocks can only occur with sidechains or lost canonical blocks), then we can assume that its TD beat our canonical TD, so we can accumulate the blocks as canonical.

rjl493456442 · 2019-05-09T12:23:40Z

core/blockchain.go

@@ -1190,15 +1193,16 @@ func (bc *BlockChain) insertChain(chain types.Blocks, verifySeals bool) (int, []
 		}


Can we also do events = append(events, ChainEvent{block, block.Hash(), nil}) for prefix known blocks?

Yeah, my reluctance around these is that I'm not sure whether we're supposed to fire canonical or side events. Are we sure it's always canonical? If so, we can definitely do it.

But how do we fire logs?

In this case, known blocks are canonical since externTd > localTd.

I'm not really sure we should announce them: if we get 10 new blocks from the network, and in the meantime we imported the first 2, we should not double-announce the chain events.
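One possible way to avoid double announcements, sketched in Go: remember the highest block already announced and skip anything at or below it. `ChainEvent`, `announcer` and `eventsFor` are hypothetical simplifications (geth's own `ChainEvent` also carries the block and its logs), and deduplicating by height alone ignores same-height reorgs, so this only illustrates the concern rather than proposing a fix.

```go
package main

import "fmt"

// ChainEvent is a hypothetical, simplified chain event.
type ChainEvent struct {
	Number uint64
	Hash   string
}

// Block is a hypothetical, stripped-down block.
type Block struct {
	Number uint64
	Hash   string
}

// announcer remembers the highest block number it has already announced.
type announcer struct {
	lastAnnounced uint64
}

// eventsFor builds events for a batch of inserted blocks, dropping any block
// at or below the last announced height so nothing is announced twice.
func (a *announcer) eventsFor(blocks []Block) []ChainEvent {
	var events []ChainEvent
	for _, b := range blocks {
		if b.Number <= a.lastAnnounced {
			continue // already announced by an earlier import
		}
		events = append(events, ChainEvent{Number: b.Number, Hash: b.Hash})
		a.lastAnnounced = b.Number
	}
	return events
}

func main() {
	a := &announcer{}
	// Meanwhile, blocks 1 and 2 were imported (and announced) separately.
	a.eventsFor([]Block{{1, "b1"}, {2, "b2"}})

	// The network then delivers blocks 1..10; only 3..10 should be announced.
	var batch []Block
	for n := uint64(1); n <= 10; n++ {
		batch = append(batch, Block{n, fmt.Sprintf("b%d", n)})
	}
	fmt.Println(len(a.eventsFor(batch))) // 8
}
```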

holiman · 2019-05-09T14:03:46Z

same state (e.g. no transactions, self-transactions with 0 gas price, self-transactions by the miner).

Nitpick: it must be no transactions, since any tx inevitably changes the sender account's nonce.
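A toy Go model of this nitpick: with no block reward, an empty block leaves the state (and hence the root) unchanged, while even a zero-value, zero-gas-price self-transaction bumps the sender's nonce. `Account`, `root` and `applySelfTx` are hypothetical, and the SHA-256 over sorted accounts merely stands in for the real Merkle-Patricia trie root.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// Account is a toy account: just a nonce and a balance.
type Account struct {
	Nonce   uint64
	Balance uint64
}

// root hashes the accounts in sorted key order; a stand-in for a trie root.
func root(state map[string]Account) [32]byte {
	keys := make([]string, 0, len(state))
	for k := range state {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s:%d:%d;", k, state[k].Nonce, state[k].Balance)
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// applySelfTx applies a zero-value, zero-gas-price self-transaction: the
// balance is untouched, but the sender's nonce still increments.
func applySelfTx(state map[string]Account, from string) {
	acc := state[from]
	acc.Nonce++
	state[from] = acc
}

func main() {
	state := map[string]Account{"signer": {Nonce: 0, Balance: 100}}
	before := root(state)

	// Empty Clique block: no transactions and no block reward to apply.
	fmt.Println(root(state) == before) // true

	// Self-transaction: balance unchanged, nonce bumped, so the root changes.
	applySelfTx(state, "signer")
	fmt.Println(root(state) == before) // false
}
```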

karalabe · 2019-05-10T13:14:21Z

@holiman @rjl493456442 I think I've addressed the review questions now and also added the missing repro test with the log output. PTAL

core/blockchain.go

holiman

A minor nitpick to help our future selves, otherwise LGTM

karalabe · 2019-05-10T13:37:45Z

@holiman Addressed the nit.

zulhfreelancer · 2019-05-11T03:25:59Z

@karalabe will this be included in 1.9?

karalabe · 2019-05-13T09:59:21Z

@zulhfreelancer yes

This addresses Clique non-archive nodes re-syncing state after a non-graceful shutdown (ethereum#19838). The code is borrowed from ethereum#19544.

core: fix import errors on clique crashes + empty blocks

feb8029

karalabe added this to the 1.9.0 milestone May 9, 2019

karalabe requested review from holiman and rjl493456442 May 9, 2019 12:07

This was referenced May 9, 2019

Impossible reorg, please file an issue #19360

Closed

Impossible reorg, please file an issue #19302

Closed

Synchronisation failed, dropping peer valid and Impossible reorg #19258

Closed

rjl493456442 reviewed May 9, 2019


karalabe added 2 commits May 10, 2019 15:43

cosensus/clique, core: add test for the mirrored state issue

72e41e6

core: address todo question wrt log count

69149ca

holiman reviewed May 10, 2019


core/blockchain.go (outdated)

holiman approved these changes May 10, 2019


rjl493456442 approved these changes May 10, 2019


core: raise a louder warning for non-clique known blocks

2d0b6e7

karalabe merged commit 6ec6b29 into ethereum:master May 10, 2019

rjl493456442 mentioned this pull request May 13, 2019

Geth synchronization failed. Impossible reorg, please file an issue #19561

Closed

vdamle mentioned this pull request Jul 17, 2019

eth/downloader: peers dropped during sync after non-graceful restart #19838

Closed

vdamle mentioned this pull request Jul 18, 2019

core: fix import errors on clique crashes + empty blocks #19862

Closed
