core: fix import errors on clique crashes + empty blocks #19544

Merged
merged 4 commits into from
May 10, 2019

@karalabe karalabe commented May 9, 2019

This PR fixes an issue when reimporting old blocks on a Clique network after a node crash.

The root of the issue is that Clique has no block subsidy, so two consecutive blocks can end up with the same state (e.g. no transactions, self-transactions with 0 gas price, self-transactions by the miner).

If the node crashes (and we lose the recent state), Geth rewinds the head block to the first one that we do have the state for. From that point onward, we try to reprocess all the blocks. If, however, two previously known blocks have the same state, processing the first one also completes the second. Currently the block importer rejects the second with a "known block" error, since it doesn't expect this scenario (it rewound because no state was present, so how did the state reappear out of the blue?).
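
The scenario can be modeled in a few lines. This is only a sketch, not Geth's code: `block`, `stateDB`, `process`, and `hasState` are hypothetical stand-ins, and the shared root `9f1331` is the abbreviated root of blocks 1 and 2 from the test log.

```go
package main

import "fmt"

// block is a hypothetical, stripped-down block with only the fields needed here.
type block struct {
	number uint64
	root   string // state root after executing the block
}

// stateDB is a hypothetical state store keyed by state root.
type stateDB map[string]bool

// process pretends to execute a block and persists the resulting state.
func (db stateDB) process(b block) { db[b.root] = true }

// hasState reports whether the state for a block is available.
func (db stateDB) hasState(b block) bool { return db[b.root] }

func main() {
	// Blocks 1 and 2 share a state root because block 2 changes nothing.
	b1 := block{number: 1, root: "9f1331"}
	b2 := block{number: 2, root: "9f1331"}

	db := stateDB{} // recent state lost in the crash, head rewound to genesis

	fmt.Println(db.hasState(b2)) // false: block 2 still needs reprocessing
	db.process(b1)
	fmt.Println(db.hasState(b2)) // true: block 2's state "reappeared"
}
```

Because the store is keyed by root rather than by block, reprocessing block 1 makes block 2 look already known, which is the case the importer did not expect.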

This PR adds a new clause to the block importer, so that instead of rejecting an already known block, we simply ignore it and proceed to the next one. Although the code seems simple, we should try to ensure that nothing breaks.

Q: Until now we rejected known blocks in such cases. Is it a problem that we do not any more?
A: Although we previously rejected a known block, that only failed the sync; the sync was then restarted, and the restart skipped the block entirely (since it was known). As such, ignoring it instead of temporarily rejecting it seems fine.

Test output for inspection:

=== RUN   TestReimportMirroredState
INFO [05-10|15:39:29.564] Persisted trie from memory database      nodes=1 size=140.00B time=8.789µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [05-10|15:39:29.572] Loaded most recent local header          number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.572] Loaded most recent local full block      number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.572] Loaded most recent local fast block      number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
DEBUG[05-10|15:39:29.572] Persisted trie from memory database      nodes=1 size=140.00B time=4.573µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
DEBUG[05-10|15:39:29.572] Persisted trie from memory database      nodes=0 size=0.00B   time=926ns   gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
DEBUG[05-10|15:39:29.573] Persisted trie from memory database      nodes=1 size=140.00B time=3.207µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [05-10|15:39:29.573] Persisted trie from memory database      nodes=1 size=140.00B time=3µs     gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [05-10|15:39:29.581] Loaded most recent local header          number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.581] Loaded most recent local full block      number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.581] Loaded most recent local fast block      number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.582] Stored checkpoint snapshot to disk       number=0 hash=7b72ae…73d63e
DEBUG[05-10|15:39:29.583] Inserted new block                       number=1 hash=9fa29f…98fa04 uncles=0 txs=1 gas=21000 elapsed=432.077µs root=9f1331…374441
DEBUG[05-10|15:39:29.583] Inserted new block                       number=2 hash=66a17b…b3a422 uncles=0 txs=0 gas=0     elapsed=41.416µs  root=9f1331…374441
INFO [05-10|15:39:29.583] Imported new chain segment               blocks=2 txs=1 mgas=0.021 elapsed=1.279ms   mgasps=16.411 number=2 hash=66a17b…b3a422 age=50y3w5d dirty=236.00B
WARN [05-10|15:39:29.589] Head state missing, repairing chain      number=2 hash=66a17b…b3a422
INFO [05-10|15:39:29.589] Rewound blockchain to past state         number=0 hash=7b72ae…73d63e
INFO [05-10|15:39:29.589] Loaded most recent local header          number=2 hash=66a17b…b3a422 td=4 age=50y3w5d
INFO [05-10|15:39:29.589] Loaded most recent local full block      number=0 hash=7b72ae…73d63e td=0 age=50y3w5d
INFO [05-10|15:39:29.589] Loaded most recent local fast block      number=2 hash=66a17b…b3a422 td=4 age=50y3w5d
DEBUG[05-10|15:39:29.590] Pruned ancestor, inserting as sidechain  number=3 hash=76a91d…17cfbd
DEBUG[05-10|15:39:29.590] Injected sidechain block                 number=3 hash=76a91d…17cfbd diff=2 elapsed=29.853µs  txs=1 gas=21000 uncles=0 root=2806f8…f63b47
INFO [05-10|15:39:29.590] Importing sidechain segment              start=1 end=3
DEBUG[05-10|15:39:29.591] Inserted new block                       number=1 hash=9fa29f…98fa04 uncles=0 txs=1 gas=21000 elapsed=391.125µs root=9f1331…374441
DEBUG[05-10|15:39:29.591] Inserted known block                     number=2 hash=66a17b…b3a422 uncles=0 txs=0 gas=0     root=9f1331…374441
DEBUG[05-10|15:39:29.591] Inserted new block                       number=3 hash=76a91d…17cfbd uncles=0 txs=1 gas=21000 elapsed=338.405µs root=2806f8…f63b47
INFO [05-10|15:39:29.591] Imported new chain segment               blocks=3 txs=2 mgas=0.042 elapsed=928.437µs mgasps=45.237 number=3 hash=76a91d…17cfbd age=50y3w5d dirty=472.00B

Fixes: #19360, #19302, #19258.

}
stats.processed++

// TODO(karalabe): Can we assume canonicalness here? Can we assume no logs?
Member

The logs are annoying here. Essentially we should fire new log events if any exist. But known block insertion is mainly caused by Rollback and re-import. In the Rollback function we don't fire events to notify users that these logs were removed, so if we fire the logs again, users will probably receive two notifications.

Member

@rjl493456442 rjl493456442 May 9, 2019

Perhaps we can judge whether it's a canonical one by total difficulty comparison.

Member Author

I guess if we're importing a long sidechain (known blocks can only occur with side chains, or lost canons), then we can assume that the TD beat our canon TD, so we can accumulate as canonical.

@@ -1190,15 +1193,16 @@ func (bc *BlockChain) insertChain(chain types.Blocks, verifySeals bool) (int, []
}
Member

Can we also do events = append(events, ChainEvent{block, block.Hash(), nil}) for prefix known blocks?

Member Author

Yeah, my reluctance around these is that I'm not sure whether we're supposed to fire canon or side events. Are we sure it's always canon? If so, we can definitely do that.

But how do we fire logs?

Member

In this case the known blocks are canonical, since externTd > localTd.

Member Author

I'm not really sure we should announce: if we get 10 new blocks from the network and in the meantime we imported the first 2, we should not double announce the chain events.
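
The double-announcement concern can be illustrated with a small sketch. `chainEvent` and `collectEvents` are hypothetical names, not Geth types; the only idea shown is that events are collected for blocks this import call actually processed and skipped for blocks that were already imported.

```go
package main

import "fmt"

// chainEvent is a hypothetical stand-in for a "new chain block" notification.
type chainEvent struct{ number uint64 }

// collectEvents returns one event per block in the batch that was not
// already imported, so subscribers never see the same block twice.
func collectEvents(batch []uint64, imported map[uint64]bool) []chainEvent {
	var events []chainEvent
	for _, n := range batch {
		if imported[n] {
			continue // already announced when it was first imported
		}
		imported[n] = true
		events = append(events, chainEvent{number: n})
	}
	return events
}

func main() {
	// Ten blocks arrive from the network; the first two were imported meanwhile.
	imported := map[uint64]bool{1: true, 2: true}
	batch := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(len(collectEvents(batch, imported))) // 8
}
```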

@holiman
Contributor

holiman commented May 9, 2019 via email

@karalabe
Member Author

@holiman @rjl493456442 I think I've addressed the review questions now and also added the missing repro test with the log output. PTAL

Contributor

@holiman holiman left a comment

A minor nitpick to help our future selves, otherwise LGTM

@karalabe
Member Author

@holiman Addressed the nit.

@karalabe karalabe merged commit 6ec6b29 into ethereum:master May 10, 2019
@zulhfreelancer

@karalabe will this be included in 1.9?

@karalabe
Member Author

@zulhfreelancer yes

vdamle pushed a commit to vdamle/go-ethereum that referenced this pull request Jul 18, 2019
this addresses clique non-archive node re-syncing state after a
non-graceful shutdown: ethereum#19838

code is borrowed from: ethereum#19544
Development

Successfully merging this pull request may close these issues.

Impossible reorg, please file an issue
4 participants