Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky examples/relay tests #1158

Closed
marten-seemann opened this issue Aug 16, 2021 · 5 comments · Fixed by #1219
Closed

flaky examples/relay tests #1158

marten-seemann opened this issue Aug 16, 2021 · 5 comments · Fixed by #1219
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@marten-seemann
Copy link
Contributor

 === RUN   TestMain
      logharness.go:101: saw: Okay, no connection from h1 to h3: no addresses
      logharness.go:106: did not see expected prefix "Meow! It worked!"
  --- FAIL: TestMain (4.54s)
@marten-seemann marten-seemann added the kind/bug A bug in existing code (including security flaws) label Aug 16, 2021
@Stebalien Stebalien changed the title flaky TestMain flaky examples/relay tests Aug 17, 2021
@marten-seemann
Copy link
Contributor Author

The problem here is the following:
h1 and h3 connect to h2 (the relay). As soon as both of them have established a connection via to the relay, they try to establish a connection between each other via the relay. This is racy, as both h1 and h3 might dial using multiple transports, (temporarily) resulting in the establishment of multiple connection. Spurious connections will then be closed by the respective hosts. Now it might happen that the connection attempt between h1 and h3 happens before the relay learns which connection is closed.

@Stebalien
Copy link
Member

Really? IIRC, we don't close these connections anymore.

@Stebalien
Copy link
Member

Or is this because we're canceling outbound dials once one dial succeeds? I believe we can fix this in

case <-h.ids.IdentifyWait(s.Conn()):
by trying to create a new stream if identify fails... maybe?

@marten-seemann
Copy link
Contributor Author

marten-seemann commented Oct 5, 2021

Or is this because we're canceling outbound dials once one dial succeeds?

Yes: https://github.com/libp2p/go-libp2p-swarm/blob/88ef86a16cf0bbc451c3799f63278d345aaf56ea/limiter.go#L227-L234

by trying to create a new stream if identify fails... maybe?

Not sure if I understand your proposal. h2 is creating a stream to h3 (for relaying the connection from h1 to h3). Where does identify come into play here?

@Stebalien
Copy link
Member

Well, it's kind of a hack, but it's not too terrible. Basically, if peer A successfully "identifies" peer B, peer B must have yielded the connection to the host and therefore won't cancel it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants