-
Notifications
You must be signed in to change notification settings - Fork 951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
transports/tcp/: Call take_error on TCP socket #2458
Conversation
Within `Provider::new_stream` we wait for the socket to become writable (`stream.writable`), before returning it as a stream. In other words, we are waiting for the socket to connect before returning it as a new TCP connection. Waiting to connect before returning it as a new TCP connection allows us to catch TCP connection establishment errors early. While `stream.writable` drives the process of connecting, it does not surface potential connection errors themselves. These need to be explicitly collected via `TcpSocket::take_error`. If not explicitly collected, they will surface on future operations on the socket. For now this commit explicitly calls `TcpSocket::take_error` when using `async-io` only. `tokio` introduced the method (`take_error`) in tokio-rs/tokio#4364 though later reverted it in tokio-rs/tokio#4392. Once re-reverted, the same patch can be applied when using `libp2p-tcp` with tokio. --- One example on how this bug surfaces today: A `/dnsaddr/xxx` `Multiaddr` can potentially resolve to multiple IP addresses, e.g. to the IPv4 and the IPv6 addresses of a node. `libp2p-dns` tries dialing each of them in sequence using `libp2p-tcp`, returning the first that `libp2p-tcp` reports as successful. Say that the local node tries the IPv6 address first. In the scenario where the local node's networking stack does not support IPv6, e.g. has no IPv6 route, the connection attempt to the resolved IPv6 address of the remote node fails. Given that `libp2p-tcp` does not call `TcpSocket::take_error`, it would falsly report the TCP connection attempt as successful. `libp2p-dns` would receive the "successful" TCP connection for the IPv6 address from `libp2p-tcp` and would not attempt to dial the IPv4 address, even though it supports IPv4, and instead bubble up the "successful" IPv6 TCP connection. Only later, when writing or reading from the "successful" IPv6 TCP connection, would the IPv6 error surface.
Looks good to me! Do I understand correctly that a socket becomes writable when failing? (the man-page doesn’t explicitly say so, but it would be consistent with “ |
That is my understanding. |
@dignifiedquire could you test whether this patch fixes your issue? Either via # Inside the libpp2 lookup repo.
git checkout mxinden async-io-take-error # points at mxinden/rust-libp2p:async-io-take-error
cargo run -- direct --address /dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN Or via rust-libp2p: # Inside the rust-libp2p repo.
git checkout mxinden async-io-take-error
cargo run --example ipfs-kad |
@mxinden unfortunately the issue is still not fixed |
Co-authored-by: Oliver Wangler <oliver@wngr.de>
Discussed further out of band. I will move forward here, given that it at least solves parts of the core issue. Still more debugging to come. Will also create a reminder to do the same change on the tokio side once |
Within
Provider::new_stream
we wait for the socket to become writable(
stream.writable
), before returning it as a stream. In other words, weare waiting for the socket to connect before returning it as a new TCP
connection. Waiting to connect before returning it as a new TCP
connection allows us to catch TCP connection establishment errors early.
While
stream.writable
drives the process of connecting, it does notsurface potential connection errors themselves. These need to be
explicitly collected via
TcpSocket::take_error
. If not explicitlycollected, they will surface on future operations on the socket.
For now this commit explicitly calls
TcpSocket::take_error
when usingasync-io
only.tokio
introduced the method (take_error
) intokio-rs/tokio#4364 though later reverted it in
tokio-rs/tokio#4392. Once re-reverted, the same
patch can be applied when using
libp2p-tcp
with tokio.One example on how this bug surfaces today:
A
/dnsaddr/xxx
Multiaddr
can potentially resolve to multiple IPaddresses, e.g. to the IPv4 and the IPv6 addresses of a node.
libp2p-dns
tries dialing each of them in sequence usinglibp2p-tcp
,returning the first that
libp2p-tcp
reports as successful.Say that the local node tries the IPv6 address first. In the scenario
where the local node's networking stack does not support IPv6, e.g. has
no IPv6 route, the connection attempt to the resolved IPv6 address of
the remote node fails. Given that
libp2p-tcp
does not callTcpSocket::take_error
, it would falsly report the TCP connectionattempt as successful.
libp2p-dns
would receive the "successful" TCPconnection for the IPv6 address from
libp2p-tcp
and would not attemptto dial the IPv4 address, even though it supports IPv4, and instead
bubble up the "successful" IPv6 TCP connection. Only later, when writing
or reading from the "successful" IPv6 TCP connection, would the IPv6
error surface.
Thanks @dignifiedquire for the initial bug report as well as the many debugging infos.