
fix(swarm): Eliminating repetitive arc construction on each poll #4991

Closed

Conversation

@jakubDoka (Contributor) commented on Dec 9, 2023

Description

Connection now stores THandler::InboundProtocol::Info for the local protocol list. It no longer uses a hash set, since we deal with a small number of protocols. The protocols are converted only if they need to be sent in an event. ProtocolChange now only stores a SmallVec<[StreamProtocol; 1]> (maybe a plain Vec would be fine too).
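
For illustration, the lazy conversion looks roughly like this (a simplified sketch, not the exact code in this PR; the helper name is illustrative and it assumes the usual `Info: AsRef<str>` bound on upgrade infos):

```rust
use libp2p_swarm::StreamProtocol;
use smallvec::SmallVec;

/// Convert the handler's native protocol infos into StreamProtocols only
/// at the point where a protocol-change event is actually emitted.
fn to_stream_protocols<I: AsRef<str>>(infos: &[I]) -> SmallVec<[StreamProtocol; 1]> {
    infos
        .iter()
        .map(|info| {
            StreamProtocol::try_from_owned(info.as_ref().to_owned())
                .expect("handler reported a valid protocol name")
        })
        .collect()
}
```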

I implemented a minimal DelegatedWaker that skips polling a handler that was not woken up. When benchmarking the effect of this change on a simple RPC protocol, I observed around a 3-5% improvement, though my benchmark mostly exercised the muxer, since the protocol reused connections.
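
The DelegatedWaker idea, as a minimal sketch (simplified relative to the actual change; assumes std::task::Wake):

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};
use std::task::{Wake, Waker};

/// Remembers whether the sub-task (e.g. a connection handler) asked to
/// be woken, and forwards the wake-up to the parent task.
struct DelegatedWaker {
    woken: AtomicBool,
    parent: Waker,
}

impl Wake for DelegatedWaker {
    fn wake(self: Arc<Self>) {
        self.woken.store(true, Ordering::Release);
        self.parent.wake_by_ref();
    }
}

impl DelegatedWaker {
    /// Returns true (and resets the flag) if `wake` was called since the
    /// last check; the parent only re-polls the sub-task in that case.
    fn take_woken(&self) -> bool {
        self.woken.swap(false, Ordering::AcqRel)
    }
}

// In the parent's poll(), roughly:
//   let handler_waker = Waker::from(delegated.clone());
//   if delegated.take_woken() {
//       /* poll the handler with a Context built from handler_waker */
//   }
```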

Fix #4990

Notes & open questions

I wonder what a good benchmark for this could be. Maybe repeatedly dropping and making new connections?

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@jakubDoka changed the title from "Fix for the repetitive arc construction on each swarm::Connection::poll + clippy warnings" to "fix(swarm): Eliminating repetitive arc construction on each swarm::Connection::poll + clippy warnings" on Dec 22, 2023
@jakubDoka changed the title from "fix(swarm): Eliminating repetitive arc construction on each swarm::Connection::poll + clippy warnings" to "fix(swarm): Eliminating repetitive arc construction on each poll" on Dec 22, 2023
@jakubDoka force-pushed the master branch 2 times, most recently from 59dfdf8 to 9262579, on December 22, 2023 at 00:13
@jakubDoka (Contributor, Author) commented on Dec 22, 2023

@mxinden or @thomaseizinger, I managed to make kad and identify work with more granular waking for connection handlers. At this point I need to ask whether I am heading in a good direction. I realized that even though, in theory, less polling should be performed, this is a very subtle breaking change: it seems that almost all of the protocols break (tests mostly time out), since they don't call wake when needed. I need to know before I invest more time into this.

Comment on lines 431 to 436
```rust
impl<'a> Iterator for ProtocolsAdded<'a> {
    type Item = &'a StreamProtocol;

    fn next(&mut self) -> Option<Self::Item> {
        self.protocols.next()
    }
}
```
A contributor commented:

It was by design that this is the only public API exposed by ProtocolsAdded and ProtocolsRemoved. Can't we keep this but back it by a SmallVec instead?
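
For example, a sketch of how the same public surface could be kept while backing it with a SmallVec (field and constructor names here are illustrative):

```rust
use libp2p_swarm::StreamProtocol;
use smallvec::SmallVec;

pub struct ProtocolsAdded<'a> {
    protocols: std::slice::Iter<'a, StreamProtocol>,
}

impl<'a> ProtocolsAdded<'a> {
    /// Borrow the protocols from the SmallVec owned by the connection;
    /// the iterator remains the only public API.
    pub(crate) fn from_protocols(protocols: &'a SmallVec<[StreamProtocol; 1]>) -> Self {
        ProtocolsAdded { protocols: protocols.iter() }
    }
}

impl<'a> Iterator for ProtocolsAdded<'a> {
    type Item = &'a StreamProtocol;

    fn next(&mut self) -> Option<Self::Item> {
        self.protocols.next()
    }
}
```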

@thomaseizinger (Contributor) commented:

> @mxinden or @thomaseizinger, I managed to make kad and identify work with more granular waking for connection handlers. At this point I need to ask whether I am heading in a good direction. I realized that even though, in theory, less polling should be performed, this is a very subtle breaking change: it seems that almost all of the protocols break (tests mostly time out), since they don't call wake when needed. I need to know before I invest more time into this.

Thank you! DelegatedWaker looks very interesting and I think that could be applied to several parts of the codebase. Most notably, we currently always re-poll every NetworkBehaviour and ConnectionHandler if any one of them gets woken up.

But, as you've found out yourself, pretty much all protocols currently rely on this and it does simplify writing them a lot. Getting state machines right is hard enough as it is. It is kind of nice not having to think about wakers too. As such, I think a move forward with something like DelegatedWaker needs to be proven with benchmarks.

In its current state, this PR performs several kinds of optimisations: Vecs get replaced with references, the cloning of protocols is removed, and we implement the DelegatedWaker. With all of these put together, we can't say which of them actually brings the 3-5% performance benefit. I'd also be curious to know what improves by 3-5%: throughput? Latency? Maximum speed?

Before we take on an additional burden such as DelegatedWaker with all the protocol refactorings that come with it, we first need to prove that our naive polling and waking is in fact a bottleneck. rust-libp2p has not been optimised a lot. Probably the best optimisation so far has been implemented by @mxinden just recently: #4970. I'd expect that there is a lot more low-hanging fruit like this. libp2p-noise and libp2p-tls can likely be optimised in their use of buffers. Either one of them is in use by every connection that rust-libp2p establishes which suggests a potentially very big impact.

All of this is to say that I think we are not spending our time very wisely if we optimise wakers. I am happy to be proven wrong by benchmarks though. Note that I think the (excessive) cloning of protocols is a problem, mostly because it is O(N). In my opinion, the path forward here is to simplify / remove the UpgradeInfo trait. See #2863 for details. This would not only simplify the interface which makes writing protocols easier but it would also remove the cloning because we could simply access a slice or iterator of StreamProtocol. It is a much harder task though.

My suggestion would be:

  • Attempt to ship the optimisation that removes the cloning as a backwards-compatible change (I made a comment in that direction). Split this work out into a dedicated PR.
  • Drop the work around DelegatedWaker. It seems like a very useful thing though so perhaps you are interested in polishing it and publishing it to crates.io.

Curious to hear what you think! Thank you for all this work :)

@jakubDoka (Contributor, Author) commented:

@thomaseizinger, I have to admit I started tackling the waker issue because it's incredibly hard to do, which I find really fun and satisfying when I get it to work. But your feedback makes a lot of sense. I can drop this, since the benchmarks suggest it's not worth the tradeoff yet, though I'd like to contribute somehow anyway.

For your first bullet point, I can definitely do that. I noticed that nobody is working on #4790. We could transition to a simplified connection handler with impl<T: SimpleConnectionHandler> ConnectionHandler for T, deprecating the current ConnectionHandler, and maybe return &[StreamProtocol] for inbound and outbound stream requests; that seems a lot simpler. This should cover the most common use cases by just returning a reference to a const declaration, which should be close to optimal.
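
Roughly what I have in mind (a sketch; SimpleConnectionHandler is hypothetical, and the existing ConnectionHandler trait is reduced to a single method here for illustration):

```rust
use libp2p_swarm::StreamProtocol;

/// Stand-in for the real trait, which has many more items.
trait ConnectionHandler {
    fn listen_protocols(&self) -> Vec<StreamProtocol>;
}

/// The proposed simplified trait: protocols come back as a plain slice,
/// typically a reference to a `const`, so no allocation is needed.
trait SimpleConnectionHandler {
    fn protocols(&self) -> &'static [StreamProtocol];
}

/// The blanket impl that would let the old trait be deprecated while
/// existing call sites keep compiling.
impl<T: SimpleConnectionHandler> ConnectionHandler for T {
    fn listen_protocols(&self) -> Vec<StreamProtocol> {
        self.protocols().to_vec()
    }
}
```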

@jakubDoka (Contributor, Author) commented:

I managed to not break the API: #5026

@thomaseizinger (Contributor) commented on Dec 23, 2023

> @thomaseizinger, I have to admit I started tackling the waker issue because it's incredibly hard to do, which I find really fun and satisfying when I get it to work.

We all like to solve hard problems, ey? :)

> I noticed that nobody is working on #4790. We could transition to a simplified connection handler with impl<T: SimpleConnectionHandler> ConnectionHandler for T, deprecating the current ConnectionHandler, and maybe return &[StreamProtocol] for inbound and outbound stream requests; that seems a lot simpler. This should cover the most common use cases by just returning a reference to a const declaration, which should be close to optimal.

That would be the end goal! If you are interested in working on that, I can fill you in on what, in my view, the biggest challenges are:

  1. ConnectionHandlers get composed into a tree by repeatedly nesting them using ConnectionHandlerSelect. ConnectionHandlerSelect in turn uses SelectUpgrade to compose two UpgradeInfos together, where only one ends up getting selected based on which protocol the remote wants for a particular stream. In the current design, UpgradeInfo::Info is an associated type, which allows SelectUpgrade to return an Either of its two inner types. This in turn makes it type-safe and fast (O(1)) to dispatch back to the correct inner upgrade. See:

    ```rust
    fn upgrade_inbound(self, sock: C, info: Self::Info) -> Self::Future {
        match info {
            Either::Left(info) => EitherFuture::First(self.0.upgrade_inbound(sock, info)),
            Either::Right(info) => EitherFuture::Second(self.1.upgrade_inbound(sock, info)),
        }
    }
    ```

    If we remove InboundUpgrade from ConnectionHandler and replace it with Iterator<Item = StreamProtocol>, we lose this capability. ConnectionHandlerSelect would have to iterate through all protocols of its two children and look for a match, which ends up being an O(n) algorithm. In addition, the design gains an error case: we have to handle negotiating a protocol that we didn't offer in the first place. In the context of the entire system that is not possible, but the type system forces us to handle it anyway, which is somewhat annoying. I don't think the O(n) algorithm here is a problem, because it only runs in the context of a stream protocol negotiation, which involves network traffic. Spending a few nanoseconds on a loop and some comparisons doesn't make any difference when you've got an RTT of 30ms (i.e. 30_000_000ns) to establish the stream in the first place. I am more worried about the complexity of handling the error case in the design and somebody eventually introducing a bug there (see the sketch after the footnotes). But even with all this being said, I think the design is still better because it makes things significantly easier for users.
    Now the question is: Is there a better design than that? :)
  2. Doing all of the above with as little disruption as possible. We need to work out a really good migration plan. Currently, there are several big users like polkadot for which any breaking change to rust-libp2p causes significant churn, see Upgrade libp2p to 0.52.4 paritytech/polkadot-sdk#1631 for example. As far as I know, polkadot does in fact make use of the InboundUpgrade and OutboundUpgrade mechanism to read and write to the stream. In order to make the upgrade as smooth as possible, we definitely need to issue a patch-release that deprecates the correct things and allows them to migrate piece-by-piece. So far, my idea has been to:
  • Deprecate the InboundUpgrade and OutboundUpgrade traits, redirecting users to use ReadyUpgrade instead. We probably want to introduce a new ReadyUpgrade in libp2p-swarm that hardcodes the use of StreamProtocol.
  • Once users are migrated to ReadyUpgrade, they are already forced to poll streams manually in the ConnectionHandler. This is the biggest part of the migration, as the overall goal of "swarm: Remove {In,Out}boundUpgrade" (#4790) is to just hand streams to the ConnectionHandler.
  • Lastly, and this would be the final breaking change, we could remove the associated types of ConnectionHandler directly and have functions such as listen_protocol instead return a list / iterator of protocols¹.

Footnotes

  1. I don't think &[StreamProtocol] will work because ConnectionHandlerSelect needs to concatenate them.
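
To make the trade-off from point 1 concrete, here is a minimal sketch of the O(n) dispatch and the new error case a protocol-list-based ConnectionHandlerSelect would need (all names are illustrative, not actual rust-libp2p API):

```rust
use libp2p_swarm::StreamProtocol;

/// Which of the two composed handlers should receive the stream.
enum Dispatch {
    Left,
    Right,
}

/// The remote negotiated a protocol neither child offered; impossible in
/// the overall system, but the type system forces us to represent it.
struct UnexpectedProtocol(StreamProtocol);

/// O(n) lookup replacing the O(1) `Either`-based dispatch.
fn dispatch(
    negotiated: StreamProtocol,
    left: &[StreamProtocol],
    right: &[StreamProtocol],
) -> Result<Dispatch, UnexpectedProtocol> {
    if left.contains(&negotiated) {
        Ok(Dispatch::Left)
    } else if right.contains(&negotiated) {
        Ok(Dispatch::Right)
    } else {
        Err(UnexpectedProtocol(negotiated))
    }
}
```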
