Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: fix chain indexer reorg bug #19748

Merged
merged 2 commits into from
Jul 2, 2019

Conversation

zsfelfoldi
Copy link
Contributor

@zsfelfoldi zsfelfoldi commented Jun 20, 2019

This PR fixes a bug in the chain indexer that caused repeated "chain reorged during section processing" errors under some rare circumstances.
https://github.com/ethereum/go-ethereum/blob/master/core/chain_indexer.go#L400
Here processSection checks for each processed header whether is parentHash equals the last block hash. The parent of the first block in a section is compared against the last section head which is passed as a parameter.
https://github.com/ethereum/go-ethereum/blob/master/core/chain_indexer.go#L333
The last section head is passed here which is assumed to be canonical bacause a reorg should theoretically trigger a rollback of stored sections. The problem is that this is not guaranteed to happen before the new section processing begins. If there is a stored section whose head does not match the most recent canonical hash at this point then processSection will fail but the invalid last stored section is never rolled back so processSection will try and fail again and again, assuming a wrong lastHead and failing on the first parent check.
In this fix verifyLastHead() always verifies whether the last stored section is still canonical and rolls it back if necessary. This prevents being stuck in this invalid state and as an extra bonus it always makes sure that the result of Sections() is up to date.

Fixes
#15169
#17227
#19711
#19720

Copy link
Member

@rjl493456442 rjl493456442 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -358,6 +359,7 @@ func (c *ChainIndexer) updateLoop() {
} else {
// If processing failed, don't retry until further notification
c.log.Debug("Chain index processing failed", "section", section, "err", err)
c.verifyLastHead()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary to put the statement here? Since we will verify the validity of stored sections before processing anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it does not matter really but still I would prefer not leaving the indexer in a bad state after detecting an error.

@rjl493456442
Copy link
Member

Could you please also fix the issue here https://github.com/ethereum/go-ethereum/blob/master/core/chain_indexer.go#L246 ?. The Head here is the number of a common ancestor which should be regarded valid.
Although this issue won't cause some bad errors since we will re-process the reverted sections anyway. But I think it would be nice to include it in this PR.

@karalabe karalabe added this to the 1.9.0 milestone Jun 27, 2019
@karalabe karalabe merged commit 32273df into ethereum:master Jul 2, 2019
@thomasmodeneis
Copy link
Contributor

thomasmodeneis commented Jul 30, 2019

The chain_indexer_test seems to be broken due to this changes, weird thing is that the test breaks only when executed alone, but works when running the complete core tests...

GOROOT=/usr/local/go
GOPATH=/opt/gocode
/usr/local/go/bin/go test -c -o /tmp/___chain_indexer_test_go github.com/ethereum/go-ethereum/core
/usr/local/go/bin/go tool test2json -t /tmp/___chain_indexer_test_go -test.v -test.run "^TestChainIndexerSingle|TestChainIndexerWithChildren$"
=== RUN   TestChainIndexerSingle
--- FAIL: TestChainIndexerSingle (5.79s)
    chain_indexer_test.go:155: Canonical section count mismatch: have 79, want 78
=== RUN   TestChainIndexerWithChildren
--- FAIL: TestChainIndexerWithChildren (5.77s)
    chain_indexer_test.go:155: Canonical section count mismatch: have 79, want 78
FAIL

Process finished with exit code 1

It seems that when reverting: (head + 1) to old statement head fixes the test:

GOROOT=/usr/local/go
GOPATH=/opt/gocode
/usr/local/go/bin/go test -c -o /tmp/___chain_indexer_test_go github.com/ethereum/go-ethereum/core
/usr/local/go/bin/go tool test2json -t /tmp/___chain_indexer_test_go -test.v -test.run "^TestChainIndexerSingle|TestChainIndexerWithChildren$"
=== RUN   TestChainIndexerSingle
--- PASS: TestChainIndexerSingle (2.53s)
=== RUN   TestChainIndexerWithChildren
--- PASS: TestChainIndexerWithChildren (10.61s)
PASS

Any ideas ?
source

@icodezjb
Copy link
Contributor

icodezjb commented Jan 6, 2020

The chain_indexer_test seems to be broken due to this changes, weird thing is that the test breaks only when executed alone, but works when running the complete core tests...

GOROOT=/usr/local/go
GOPATH=/opt/gocode
/usr/local/go/bin/go test -c -o /tmp/___chain_indexer_test_go github.com/ethereum/go-ethereum/core
/usr/local/go/bin/go tool test2json -t /tmp/___chain_indexer_test_go -test.v -test.run "^TestChainIndexerSingle|TestChainIndexerWithChildren$"
=== RUN   TestChainIndexerSingle
--- FAIL: TestChainIndexerSingle (5.79s)
    chain_indexer_test.go:155: Canonical section count mismatch: have 79, want 78
=== RUN   TestChainIndexerWithChildren
--- FAIL: TestChainIndexerWithChildren (5.77s)
    chain_indexer_test.go:155: Canonical section count mismatch: have 79, want 78
FAIL

Process finished with exit code 1

It seems that when reverting: (head + 1) to old statement head fixes the test:

GOROOT=/usr/local/go
GOPATH=/opt/gocode
/usr/local/go/bin/go test -c -o /tmp/___chain_indexer_test_go github.com/ethereum/go-ethereum/core
/usr/local/go/bin/go tool test2json -t /tmp/___chain_indexer_test_go -test.v -test.run "^TestChainIndexerSingle|TestChainIndexerWithChildren$"
=== RUN   TestChainIndexerSingle
--- PASS: TestChainIndexerSingle (2.53s)
=== RUN   TestChainIndexerWithChildren
--- PASS: TestChainIndexerWithChildren (10.61s)
PASS

Any ideas ?
source

we met the same issue, the chain_indexer_test failed #20497

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants