feat: relay v2 discovery (go-libp2p v0.19.0) #8868
Force-pushed from 622ea1d to e395f42.
Force-pushed from 187121e to 747c114.
Note to self: … is expected to produce …, but with go-libp2p 0.19 it gets …, because …
Force-pushed from 2ec71d1 to 5c1a1a3.
Rebased and CI is green; if we want, we could merge this for 0.13-rc1 as-is.
Caveat: test/sharness/t0182-circuit-relay.sh became flaky. It could be a bug, but we could tackle it after RC1 if needed; see the comment below.
Force-pushed from e0ddc66 to f27dc6d.
I believe this is ready for final review/merging.
Added Peering.Peers as an AutoRelay peer source, plus a basic exponential backoff.
Should be good enough for go-ipfs 0.13-rc1; see the key comments below.
```go
// Feed peers more often right after the bootstrap, then backoff
bo := backoff.NewExponentialBackOff()
bo.InitialInterval = 15 * time.Second
bo.Multiplier = 3
bo.MaxInterval = 1 * time.Hour
bo.MaxElapsedTime = 0 // never stop
t := backoff.NewTicker(bo)
```
ℹ️ This feeds peers to AutoRelay every 15s at first, then backs off exponentially until it happens once an hour.
It should be an okay solution until we have a better mechanism in go-libp2p 0.20 (where libp2p asks for peers instead of being fed them).
I used github.com/cenkalti/backoff/v4 because it was already an indirect dependency, but lmk if I should just write the backoff by hand.
```go
// Always feed trusted IDs (Peering.Peers in the config)
for _, trustedPeer := range cfgPeering.Peers {
	if len(trustedPeer.Addrs) == 0 {
		continue
	}
	select {
	case peerChan <- trustedPeer:
```
ℹ️ This allows people to reuse already-trusted peers from Peering.Peers in addition to peers discovered via dht.WAN.GetClosestPeers.
```shell
test_expect_success 'wait until relay is ready to do work' '
  sleep 1
  while ! ipfsi 2 swarm connect /p2p/$PEERID_1/p2p-circuit/p2p/$PEERID_0; do
    iptb stop &&
    iptb_wait_stop &&
    iptb start -wait -- --routing=none &&
    iptb connect 0 1 &&
    iptb connect 2 1 &&
    sleep 5
  done
```
ℹ️ The following is not a blocker. IMO this PR can be merged, and if this is a real problem we will figure it out once RC1 is used in the real world.
go-ipfs/test/sharness/t0182-circuit-relay.sh became flaky.
```
ipfsi 2 swarm connect /p2p/$PEERID_1/p2p-circuit/p2p/$PEERID_0
```
sometimes fails with NO_RESERVATION (204):
```
Error: connect 12D3KooWLuY1Q13o6Q91NoJn4Qv7RSq5rYS6uC4YCzKCJ3S8GnwQ failure: failed to dial 12D3KooWLuY1Q13o6Q91NoJn4Qv7RSq5rYS6uC4YCzKCJ3S8GnwQ:
  * [/p2p/12D3KooWNfs8uFpQf6NnseZKLyS3c8EFppGtUGejHpViBeg8f7Vt/p2p-circuit] error opening relay circuit: NO_RESERVATION (204)
```
- It happens randomly; bumping the wait to 10 seconds does not help much, and it still fails sometimes.
- Shutting down the nodes and starting them again fixes the problem
- ...but it feels like covering up an underlying racy bug: once the NO_RESERVATION error is returned, the relay never works again, no matter how long we wait and retry (the only fix is to reboot the relay node / testbed).
Repro (runs the test until it fails):
```shell
$ killall ipfs ; i=0; while ./t0182-circuit-relay.sh -v; do echo " -----> run no. $i <----"; sleep 1; ((i=i+1)) ; done
```
cc @marten-seemann for visibility; lmk if I should file a bug in go-libp2p, or how I can provide more details.
Investigation continues in #8967
```go
defer func() {
	if r := recover(); r != nil {
		fmt.Println("Recovering from unexpected error in AutoRelayFeeder:", r)
		debug.PrintStack()
	}
}()
```
What is this for?
I got paranoid because of panics around go-libp2p 0.19 (e.g. libp2p/go-libp2p#1467), and since this is a bolt-on, temporary feeder (one that may get refactored multiple times), it felt prudent to reduce the blast radius.
seems like overkill but okay :)
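For reference, a minimal sketch of the same defer/recover pattern wrapped in a hypothetical runRecovered helper, so the caller can see whether a panic was contained. One limit worth keeping in mind: recover only catches panics raised on the goroutine that deferred it, so a panic inside a goroutine started by fn (for example deep inside libp2p) would still crash the process.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"sync"
)

// runRecovered runs fn on its own goroutine and, if fn panics, logs the
// panic with a stack trace and returns the recovered value instead of
// letting it crash the process. Returns nil when fn finishes normally.
// Hypothetical helper mirroring the defer/recover excerpt above.
func runRecovered(name string, fn func()) (recovered interface{}) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done() // runs last, after the recover below
		defer func() {
			if r := recover(); r != nil {
				recovered = r
				fmt.Println("Recovering from unexpected error in", name+":", r)
				debug.PrintStack()
			}
		}()
		fn()
	}()
	wg.Wait() // also makes the write to recovered visible here
	return recovered
}

func main() {
	r := runRecovered("AutoRelayFeeder", func() { panic("boom") })
	fmt.Println("process still alive; recovered:", r)
}
```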
```go
}
closestPeers, err := dht.WAN.GetClosestPeers(ctx, h.ID().String())
if err != nil {
	// no-op: usually 'failed to find any peer in table' during startup
```
Should we log this at debug level? Or check if it's kbucket.ErrLookupFailure?
Imo not worth it, because this entire func will be redone after go-libp2p 0.20 ships.
Old tests were no longer working because go-libp2p 0.19 removed the undocumented 'ls' pseudoprotocol. This replaces them with a handshake attempt (the protocol name is echoed back on success, or 'na' is returned when the protocol is not available) for the TLS and Noise variants, and adds an explicit test that safeguards us against enabling plaintext by default by mistake.
Test is flaky; for now we just restart the testbed when we get a NO_RESERVATION error.
Switched to the commit that landed in go-namesys master.
It starts by feeding peers every 15s, then backs off each time until it happens once an hour. Should be acceptable until we have a smarter mechanism in go-libp2p 0.20.
This ensures we feed trusted Peering.Peers in addition to any peers discovered over DHT.
Force-pushed from 253bd06 to 61560ae.
Rebased to include the updated resource manager 0.3.0 from #8901; waiting for CI.
Removes fixup introduced in #8868 (comment) so we can dig into the underlying cause
Release Notes: https://github.com/libp2p/go-libp2p/releases/tag/v0.19.0
go-libp2p v0.19.0 implements circuit v2 relay discovery: AutoRelay now reads peers from a channel (provided by autorelay.WithPeerSource), checks whether those peers speak the relay v2 protocol, and tries to obtain a reservation with them.
This PR gets the closest peers from the DHT, and provides them to AutoRelay on a regular basis.
By default, AutoRelay does not use Circuit Relay v1 nodes any more. Circuit v1 support is only enabled when static relays are in use.
TODO
- Peering.Peers in autorelay discovery
- /plaintext/2.0.0 when --disable-transport-encryption is passed (failing socat test in t0061-daemon-opts.sh)
- t0160-resolve.sh test

BREAKING CHANGES
See CHANGELOG.md