-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Oneeman/cherry pick discovery bootnode fix #1194
Conversation
celo-blockchain cherry-pick conflicts: cmd/devp2p/discv4cmd.go p2p/discover/common.go p2p/discover/v4_udp.go This adds an implementation of the current discovery v5 spec. There is full integration with cmd/devp2p and enode.Iterator in this version. In theory we could enable the new protocol as a replacement of discovery v4 at any time. In practice, there will likely be a few more changes to the spec and implementation before this can happen.
celo-blockchain cherry-pick conflicts: p2p/discover/v4_udp.go p2p/discover/v4_udp_test.go This moves all v4 protocol definitions to a new package, p2p/discover/v4wire. The new package will be used for low-level protocol tests.
…pty (#21396) This change improves discovery behavior in small networks. Very small networks would often fail to bootstrap because all member nodes were dropping table content due to findnode failure. The check is now changed to avoid dropping nodes on findnode failure when their bucket is almost empty. It also relaxes the liveness check requirement for FINDNODE/v4 response nodes, returning unverified nodes as results when there aren't any verified nodes yet. The "findnode failed" log now reports whether the node was dropped instead of the number of results. The value of the "results" was always zero by definition. Co-authored-by: Felix Lange <fjl@twurst.com>
If a node initially has trouble reaching the bootnode, the bootnode was being removed in doRevalidate(), leaving the table empty.
Seeing some lint failures: one about a file having to end with a newline, but others about unused private functions and a field in the code from upstream. Added the "do not merge" tag for now |
What's the cherry-picked commit from go-ethereum? |
The one with the fix is ethereum/go-ethereum#21396 . The other two I included because they were previous changes on the same code: the first was ethereum/go-ethereum#20750 and the second was ethereum/go-ethereum#21147 . You can see them above, with different hashes but the commit messages will be the same. |
Took care of the lint warnings and removed the "do not merge" label. |
Noticed an issue that sounds like it's due to the bugs fixed in this PR: #939. So I updated the PR description to say that it fixes that issue as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't say i did a deep review. I don't have a deep knowledge of the discovery refactory, and i need to take a look at it first
Still i think this is good to merge.
My reasoning is:
- cherry picked commits are most probably valid. They come from already production code that has been running for a while on ethereum. Also, we haven't changed discovery, it's not likely they could break any of our code
- I've reviewed our commit on top, and seems reasonable, and quite a small case concerning a corner case.
So i'm happy with merging this.
The conflicts arose from #1194 having been on master, which includes an unpstream PR from 1.9.13 as well as an upstream PR from later on that was cherry-picked. Resolving them required checking the changes' context, but was then straightforward. Conflicts: cmd/devp2p/discv4cmd.go cmd/devp2p/discv5cmd.go p2p/discover/lookup.go p2p/discover/v4_udp.go
Description
This is a fix for ethereum/go-ethereum#20630. If a node initially fails to contract the bootnode for the purpose of discovery, there were two code paths which led to the bootnode being removed from the discovery table, leaving it empty and preventing further discovery. These are visible in the logs @tkporter included in that issue, where it says
Removed dead node
and where it saysToo many findnode failures, dropping
. There is an upstream PR, cherry-picked here, which fixes the second case by not dropping the node if the bucket it's in is less than half-full. For the second case, I added a comment on the go-ethereum issue about it to see whether a PR to fix it would be welcome and what logic it should use. In the meantime, I've included a commit in the PR to fix it, by not dropping the node if it's the only one in the bucket.Note that there is also a task called
doRefresh()
which runs every 30 minutes and re-adds the bootnodes to the discovery table, so the inability to discover other nodes would not have been permanentOther changes
I've also cherry-picked two prior commits from upstream to ease the merging by avoiding out-of-order commits on the same areas of the code.
Tested
Related issues
Backwards compatibility
The changes here are fixes and improvements to the discover package and do not include or assume protocol changes, so there shouldn't be any question of backwards incompatibility. Also, as mentioned in the testing section, this branch successfully discovers and syncs from mainnet peers using the mainnet bootnode.