Geth halts on "fatal error: concurrent map iteration and map write" #16933
Comments
I can see that StateDB.Copy() is happening under the lock, but none of the other methods use that lock. Ideally, we should have a read-write lock that covers all internal data of StateDB.
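To illustrate the read-write lock idea from the comment above, here is a minimal sketch. It is not the actual StateDB code: `guardedState`, its `balances` map, and the method names are hypothetical stand-ins, chosen only to show every reader and writer going through the same `sync.RWMutex`.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedState is a hypothetical stand-in for StateDB: one map of
// per-account values, with all access guarded by a read-write lock.
type guardedState struct {
	mu       sync.RWMutex
	balances map[string]uint64
}

func newGuardedState() *guardedState {
	return &guardedState{balances: make(map[string]uint64)}
}

// Set mutates the map while holding the write lock.
func (s *guardedState) Set(addr string, v uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.balances[addr] = v
}

// Copy iterates the map while holding the read lock, so the iteration
// cannot overlap with a Set on the same state.
func (s *guardedState) Copy() *guardedState {
	s.mu.RLock()
	defer s.mu.RUnlock()
	cp := newGuardedState()
	for k, v := range s.balances {
		cp.balances[k] = v
	}
	return cp
}

// Len reports the number of entries while holding the read lock.
func (s *guardedState) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.balances)
}

func main() {
	s := newGuardedState()
	var wg sync.WaitGroup
	// Several goroutines write and copy concurrently; without the lock,
	// this pattern is the kind that can trigger the fatal map error.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := 0; j < 200; j++ {
				s.Set(fmt.Sprintf("acct-%d-%d", id, j), uint64(j))
				_ = s.Copy()
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("entries:", s.Len())
}
```

Multiple Copy calls can hold the read lock at the same time, while any Set waits for them to finish. That is the trade-off a read-write lock offers over a plain mutex.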
OK, I think I figured out where this race is coming from. Here: https://github.com/ethereum/go-ethereum/blob/v1.8.10/miner/worker.go#L499 the "work", which contains a pointer to "current.state", gets pushed to the channel for the agents to process. An agent processes it (finds the nonce), wraps it into a "Result", and puts it into the "recv" channel. We read it from the "recv" channel here: https://github.com/ethereum/go-ethereum/blob/v1.8.10/miner/worker.go#L301. Next, we call "WriteBlockWithState" here, passing the same state: https://github.com/ethereum/go-ethereum/blob/v1.8.10/miner/worker.go#L320. Inside WriteBlockWithState, we call state.Commit here: https://github.com/ethereum/go-ethereum/blob/v1.8.10/core/blockchain.go#L902, which mutates the state. When that happens, we are not holding the "currentMu" lock, which means a call to "pending" can proceed concurrently with the Commit.
@epheph Could you experiment and surround the line https://github.com/ethereum/go-ethereum/blob/v1.8.10/miner/worker.go#L320 with "self.currentMu.Lock()" and "self.currentMu.UnLock()", to check if this is really the cause - looks like you can reproduce it quite frequently. If you confirm it, we can make a PR with the fix. Thank you very much!
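The experiment suggested above can be sketched roughly as follows. This is a simplified model, not the geth source: the `state` and `worker` types, the `objects` map, and the body of `commit` are assumptions made for illustration. Only the `currentMu` name and the idea of taking it around the write come from the discussion.

```go
package main

import (
	"fmt"
	"sync"
)

// state is a hypothetical stand-in for the miner's current state.
type state struct {
	objects map[string]int
}

func newState() *state {
	return &state{objects: make(map[string]int)}
}

// copy iterates the map, as a state copy would.
func (s *state) copy() *state {
	cp := newState()
	for k, v := range s.objects {
		cp.objects[k] = v
	}
	return cp
}

// commit writes to the map, standing in for a mutating commit.
func (s *state) commit() {
	s.objects[fmt.Sprintf("obj-%d", len(s.objects))] = 1
}

// worker mirrors the shape described in the thread: one mutex guarding
// the current state.
type worker struct {
	currentMu sync.Mutex
	current   *state
}

// pending copies the state while holding currentMu.
func (w *worker) pending() *state {
	w.currentMu.Lock()
	defer w.currentMu.Unlock()
	return w.current.copy()
}

// writeBlockWithState takes the same lock around the mutating call,
// which is the change proposed for testing.
func (w *worker) writeBlockWithState() {
	w.currentMu.Lock()
	defer w.currentMu.Unlock()
	w.current.commit()
}

func main() {
	w := &worker{current: newState()}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer: repeated commits
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			w.writeBlockWithState()
		}
	}()
	go func() { // reader: repeated pending() copies
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = w.pending()
		}
	}()
	wg.Wait()
	fmt.Println("objects:", len(w.current.objects))
}
```

If the lock is dropped from `writeBlockWithState` in this model, the writer and reader touch the same map unsynchronized. Running with `go run -race` would be expected to flag that.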
@AlexeyAkhunov Thanks for the research and explanation! I'll give that modification a try.
@AlexeyAkhunov Alright, I'm 12 deploys in with that new code and haven't received the error, and without it, I received it on the 4th try. It's sporadic, but those results look very promising. The "Unlock" method has a lower L. Thanks for taking a look at this!
@epheph Thanks for testing - I will prepare the PR in the next couple of days. Yes, "Unlock" has lower L :)
@AlexeyAkhunov O(∩_∩)O~~
Same here!
This appears to be closed, according to the merge-history |
System information
Geth version: 1.8.10-stable
OS & Version: Linux
Using hub.docker.com provided image
Expected behaviour
During our deployment process of "canned" (example) data, where we submit many transactions concurrently, I expect the transactions to be mined without halting geth. We submit a few hundred TX, in batches, and use a 1 second block time. None of these transactions are direct contract deployments, but contracts are deployed as a result of these transactions.
Actual behaviour
About 50% of the time, judging from my own experience, geth locks up due to "fatal error: concurrent map iteration and map write". Running the process over and over again, I can eventually get the deploy to succeed.
Steps to reproduce the behaviour
I have not been able to narrow down exactly the nature of this issue, only that it occurs while processing a large number of transactions. If you'd like to try it yourself, you can try our (docker-based) deployment process.
1.) Clone https://github.com/AugurProject/augur.js
2.) npm install
3.) npm run docker:build
The image it is based on, augurproject/dev-node-geth, is just a FROM ethereum/client-go:v1.8.10 with a custom genesis block and entrypoint.
Backtrace