-
Notifications
You must be signed in to change notification settings - Fork 961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Libp2p quic second attempt #2159
Conversation
Thanks @dvc94ch for giving this another shot. As someone not directly involved in the many Rust QUIC efforts, could you compare this implementation on a high level to the many unfinished efforts out there, namely:
//CC @kpp who might be able to add some additional context as well. |
well first demie wrote an implementation. then tomaka improved on it. it stalled mainly because of rustls/quinn-proto not having a new release for over a year. this implementation builds on previous work and resolves the open todo's. the tls implementation was updated to the latest spec and was tested with other implementations by kpp (correctness). features it has that the old version didn't have are proper udp socket handling with ecn codes and gso (throughput). support for plugging in your own crypto (security). support for 0rtt encryption (latency). key logging to help debugging with wireshark (maintainability). other than correctness, throughput, latency, security and maintainability not really sure how a generic connection oriented transport can differentiate itself. |
@tomaka ^ |
anyone interested in performing a first round of review? |
@kpp if parity is interested in getting this merged, maybe you could give it a review? |
Well, my issue is to finish @tomaka’s attempt. |
Why do you think that is a better starting point? |
Are you referring to https://github.com/kpp/rust-libp2p/tree/master/transports/tls-quic/src? It seems incomplete to me but the tls code does look quite neat. |
That's an issue I was assigned.
Thank you! I will merge Pierre's code from https://github.com/tomaka/libp2p-rs/tree/quiccc-again/transports/quic into it. |
So parity decided to finish it's branch now. How convenient. I assume that the most contentious part is that there is noise support in this branch and you probably think it'll make your code much simpler to support just TLS. It doesn't make much difference as you saw in the crypto trait. Although there isn't support for non ed25519 keys, it's probably not that hard to add. Even in the quiccc branch there are lots of open todos and missing features. Are you planning on fixing those? |
In that case I'll continue maintaining libp2p-quic. I assume @mxinden will wait a few months to see what happens before deciding anything.
Let's hope your branch doesn't get abandonned again in 6 months. |
My (and Parity's) position has never ever changed: we want QUIC+TLS, not QUIC+Noise, and the reason the branch was stalled is that either nobody is working on it or (for the longest time) cross-dependencies issues w.r.t. |
This branch supports TLS too, although @kpp seems to have done some improvements and cleanup. |
I talked to Pierre. Let's unite on your PR. |
so last missing thing is merging master and getting the build green. great work @kpp and @Demi-Marie |
* protocols/gossipsub: Fix inconsistency in mesh peer tracking (libp2p#2189) Co-authored-by: Age Manning <Age@AgeManning.com> * misc/metrics: Add auxiliary crate to record events as OpenMetrics (libp2p#2063) This commit adds an auxiliary crate recording protocol and Swarm events and exposing them as metrics in the OpenMetrics format. * README: Mention security@ipfs.io * examples/: Add file sharing example (libp2p#2186) Basic file sharing application with peers either providing or locating and getting files by name. While obviously showcasing how to build a basic file sharing application, the actual goal of this example is **to show how to integrate rust-libp2p into a larger application**. Architectural properties - Clean clonable async/await interface ([`Client`]) to interact with the network layer. - Single task driving the network layer, no locks required. * examples/README: Give an overview over the many examples (libp2p#2194) * protocols/kad: Enable filtering of (provider) records (libp2p#2163) Introduce `KademliaStoreInserts` option, which allows to filter records. Co-authored-by: Max Inden <mail@max-inden.de> * swarm/src/protocols_handler: Impl ProtocolsHandler on either::Either (libp2p#2192) Implement ProtocolsHandler on either::Either representing either of two ProtocolsHandler implementations. Co-authored-by: Thomas Eizinger <thomas@eizinger.io> * *: Make libp2p-core default features optional (libp2p#2181) Co-authored-by: Max Inden <mail@max-inden.de> * core/: Remove DisconnectedPeer::set_connected and Pool::add (libp2p#2195) This logic seems to be a leftover of libp2p#889 and unused today. * protocols/gossipsub: Use ed25519 in tests (libp2p#2197) With f2905c0 the secp256k1 feature is disabled by default. Instead of enabling it in the dev-dependency, simply use ed25519. * build(deps): Update minicbor requirement from 0.10 to 0.11 (libp2p#2200) Updates the requirements on [minicbor](https://gitlab.com/twittner/minicbor) to permit the latest version. - [Release notes](https://gitlab.com/twittner/minicbor/tags) - [Changelog](https://gitlab.com/twittner/minicbor/blob/master/CHANGELOG.md) - [Commits](https://gitlab.com/twittner/minicbor/compare/minicbor-v0.10.0...minicbor-v0.11.0) --- updated-dependencies: - dependency-name: minicbor dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): Update salsa20 requirement from 0.8 to 0.9 (libp2p#2206) * build(deps): Update salsa20 requirement from 0.8 to 0.9 Updates the requirements on [salsa20](https://github.com/RustCrypto/stream-ciphers) to permit the latest version. - [Release notes](https://github.com/RustCrypto/stream-ciphers/releases) - [Commits](RustCrypto/stream-ciphers@ctr-v0.8.0...salsa20-v0.9.0) --- updated-dependencies: - dependency-name: salsa20 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * *: Bump pnet to v0.22 Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Max Inden <mail@max-inden.de> * *: Dial with handler and return handler on error and closed (libp2p#2191) Require `NetworkBehaviourAction::{DialPeer,DialAddress}` to contain a `ProtocolsHandler`. This allows a behaviour to attach custom state to its handler. The behaviour would no longer need to track this state separately during connection establishment, thus reducing state required in a behaviour. E.g. in the case of `libp2p-kad` the behaviour can include a `GetRecord` request in its handler, or e.g. in the case of `libp2p-request-response` the behaviour can include the first request in the handler. Return `ProtocolsHandler` on connection error and close. This allows a behaviour to extract its custom state previously included in the handler on connection failure and connection closing. E.g. in the case of `libp2p-kad` the behaviour could extract the attached `GetRecord` from the handler of the failed connection and then start another connection attempt with a new handler with the same `GetRecord` or bubble up an error to the user. Co-authored-by: Thomas Eizinger <thomas@eizinger.io> * core/: Remove deprecated read/write functions (libp2p#2213) Co-authored-by: Max Inden <mail@max-inden.de> * protocols/ping: Revise naming of symbols (libp2p#2215) Co-authored-by: Max Inden <mail@max-inden.de> * protocols/rendezvous: Implement protocol (libp2p#2107) Implement the libp2p rendezvous protocol. > A lightweight mechanism for generalized peer discovery. It can be used for bootstrap purposes, real time peer discovery, application specific routing, and so on. Co-authored-by: rishflab <rishflab@hotmail.com> Co-authored-by: Daniel Karzel <daniel@comit.network> * core/src/network/event.rs: Fix typo (libp2p#2218) * protocols/mdns: Do not fire all timers at the same time. (libp2p#2212) Co-authored-by: Max Inden <mail@max-inden.de> * misc/metrics/src/kad: Set query_duration lowest bucket to 0.1 sec (libp2p#2219) Probability for a Kademlia query to return in less than 100 milliseconds is low, thus increasing the lower bucket to improve accuracy within the higher ranges. * misc/metrics/src/swarm: Expose role on connections_closed (libp2p#2220) Expose whether closed connection was a Dialer or Listener. * .github/workflows/ci.yml: Use clang 11 (libp2p#2233) * protocols/rendezvous: Update prost (libp2p#2226) Co-authored-by: Max Inden <mail@max-inden.de> * *: Fix clippy warnings (libp2p#2227) * swarm-derive/: Make event_process = false the default (libp2p#2214) Co-authored-by: Max Inden <mail@max-inden.de> Co-authored-by: Max Inden <mail@max-inden.de> Co-authored-by: Age Manning <Age@AgeManning.com> Co-authored-by: Ruben De Smet <ruben.de.smet@rubdos.be> Co-authored-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: rishflab <rishflab@hotmail.com> Co-authored-by: Daniel Karzel <daniel@comit.network> Co-authored-by: David Craven <david@craven.ch>
@kpp, @DemiMarie can you approve the PR? I think all issues have been addressed. @mxinden what are the next steps to get this merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far so good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend running the tests thousands of times in a tight loop, using netem to simulate packet loss and/or reordering. This caught bugs I was working on my version, both in my code and in quinn-proto.
_dss: &DigitallySignedStruct, | ||
) -> Result<HandshakeSignatureValid, TLSError> { | ||
Err(TLSError::PeerIncompatibleError( | ||
"Only TLS 1.3 certificates are supported".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is not reachable as rustls should have aborted the handshake sooner, but panicking here is not critical.
I hope to find some time next week to give this an in-depth review. Thanks for the ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry for the delay here.
I still need more time for a full review. I got a couple questions which I am posting ahead of time. Let me know in case you would rather wait for the full review.
@@ -0,0 +1,199 @@ | |||
use libp2p_core::PeerId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of consistency, would you mind including license headers here?
/// UDP address to connect to. | ||
addr: SocketAddr, | ||
/// The remotes public key. | ||
public_key: C::PublicKey, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that one can thus not dial a remote without their PeerId
?
|
||
#[derive(Debug)] | ||
pub struct TransportChannel<C: Crypto> { | ||
tx: mpsc::UnboundedSender<ToEndpoint<C>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, this is the channel from libp2p not from the socket. other than erroring if the queue is full and trying to dial I don't see what we can do
I don't understand why the direction (libp2p -> socket) matters in regards to bounded or unbounded. Can you expand on that @dvc94ch?
public_key, | ||
tx, | ||
}; | ||
self.tx.unbounded_send(msg).expect("endpoint has crashed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can one not use a bounded channel here and await both the sending and the result through the oneshot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you can do that, but what's the point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point would be to enable backpressure throughout the system. Through backpressure an upper layer component is paused when the lower layer is overwhelmed and woken up when the lower layer is ready to receive new items.
|
||
if me.event_slot.is_none() && me.incoming_slot.is_none() { | ||
let mut metas = [RecvMeta::default(); BATCH_SIZE]; | ||
let mut iovs = MaybeUninit::<[IoSliceMut; BATCH_SIZE]>::uninit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming this is an optimization. If so, given the usage of unsafe code, what performance impact did this optimization show in benchmarks?
inner.connection.handle_event(event); | ||
} | ||
|
||
let _max_datagrams = inner.endpoint.max_datagrams(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this method call have any side effects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, this is when using quinn-proto from git which changes the api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Break here so that the noise upgrade can finish. | ||
return Poll::Pending; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand on this? Why does this Future have to pause? Where is the waker of the task registered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is so the upgrade can complete
https://github.com/libp2p/rust-libp2p/pull/2159/files#diff-b1dcda9367774dd8a742457ff79bc528d030a8cc38d6791efefc78b30f3f2e03R222
} | ||
Event::Stream(StreamEvent::Readable { id }) => { | ||
tracing::trace!("stream readable {}", id); | ||
if let Some(substream) = inner.substreams.get_mut(&id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a scenario where one would receive a StreamEvent::Readable
but not have a reference to it in inner.substreams
? If not, how about an expect
here to enforce this assumption?
Co-authored-by: Max Inden <mail@max-inden.de>
I am back next week. 🌴 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comments @dvc94ch!
A couple of follow ups on the backpressure discussions. I am sorry for being stubborn here. Still I think this is an important aspect of the implementation. I am happy to expand further if need be. Thanks for bearing with me.
|
||
#[derive(Debug)] | ||
pub struct TransportChannel<C: Crypto> { | ||
tx: mpsc::UnboundedSender<ToEndpoint<C>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't prevent a user from dos'ing himself
How does an upper layer component know whether the lower layer is able to keep up. In other words, how does the upper layer know when to add more items and when to slow down? Backpressure through bounded channels enforces just that.
public_key, | ||
tx, | ||
}; | ||
self.tx.unbounded_send(msg).expect("endpoint has crashed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point would be to enable backpressure throughout the system. Through backpressure an upper layer component is paused when the lower layer is overwhelmed and woken up when the lower layer is ready to receive new items.
&self, | ||
id: ConnectionHandle, | ||
) -> (ConnectionChannel<C>, mpsc::Sender<ConnectionEvent>) { | ||
let (tx, rx) = mpsc::channel(12); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether the queue is needed at all. As a rule of thumb I follow the rule of Queues exist only to handle burstiness. Is creating new connections a bursty task? If not, why introduce a queue?
socket: UdpSocket, | ||
crypto_config: Arc<CryptoConfig<C>>, | ||
connections: FnvHashMap<ConnectionHandle, mpsc::Sender<ConnectionEvent>>, | ||
outgoing: VecDeque<udp_socket::Transmit>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queues are used throughout the codebase,
That is correct, yes, though I would argue that those places are either (a) not relevant (e.g. libp2p-ping
with little to no bandwidth usage) or (b) a deficiency introduced in the past.
That said, lower layers of libp2p do enforce backpressure today, e.g. the TCP transport (libp2p-tcp
) or the Yamux implementation..
I am sorry to hear that.
Understandable. I hope to be able to test this pull request sometime this or next week. |
Should I open a new PR because I can push to this PR but to David'd branch through an intermediate PR? @dvc94ch David, you didn't add license headers neither to https://github.com/ipfs-rust/libp2p-quic nor here. Would you please post license headers you prefer? I appreciate your work on this project, I can't add "Parity 2021". That would be inconvenient. |
Feel free to open a new PR. I guess as license header you can add |
|
not officially founded yet, hq will be in romania, but I guess we can start establishing it as a good open source player. |
Well, I believe the header should state the company you were working on while writing this code. Or your name if it was your pet project. |
then David Craven if you prefer |
Adds support for tls, removes the
UnboundedSender
s from the endpoint. The endpoint still hasUnboundedReceiver
s, but that's not different than having aVecDeque
. The transport dial api for example doesn't allow applying backpressure.Closes paritytech/substrate#1334