Consider giving each networking subsystem its own network-bridge and subprotocol #810

Open
rphmeier opened this issue Jun 8, 2022 · 2 comments
Labels
I4-refactor (Code needs refactoring.) I9-optimisation (An enhancement to provide better overall performance in terms of time-to-completion for a task.)

Comments

@rphmeier (Contributor) commented Jun 8, 2022

The network bridge was initially designed and implemented in late 2020, before we'd fully implemented all of the necessary networking protocols for parachains.

At that time, we weren't sure whether the gossip protocols would need to share state in some way or if they'd be completely separate. Furthermore, we combined them for the purpose of sharing a View.

As we've gone on, we've found that:

  1. The network-bridge is a bottleneck due to the amount of inter-subsystem traffic, channel size bounds, and single-task processing. This has led to attempts such as splitting the NetworkBridge into two subsystems (polkadot#5616).
  2. The gossip protocols don't need to share state.
  3. Different protocols may well need different kinds of views (e.g. the challenges in "Update statement-distribution for asynchronous backing" (polkadot#5055) might be more easily solved without using the View, via some other mechanism).
  4. Collators and other non-validator peers may want to follow some parachain gossip but not all of it. @tomaka has always advised that a peer's capabilities shouldn't be determined by some 'role' like Validator but by which subprotocols the peer connects on. This may be needed for p2p-XCMP, for example, while collator peers that want to follow statements and approval-checking shouldn't need to follow bitfields and disputes.

With that in mind, we should experiment with extracting network-bridge into a generic utility and embedding it into the subsystems directly, with each subsystem having its own subprotocol. If it doesn't negatively affect performance overall, we should create an upgrade path to support old peers on the standalone network bridge as well as new peers on the divided subprotocols. After that is deployed and most nodes have upgraded, we can remove the standalone network-bridge entirely.

As an added bonus, we can release our generic bridge as a crate alongside or as part of orchestra, which will make writing networked parachains even easier.
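
To make the idea concrete, here is a rough sketch of what a per-subsystem bridge could look like. Everything here is hypothetical: `SubsystemBridge`, `ProtocolName`, and the protocol strings are illustrative names, not existing polkadot-sdk APIs (real notification protocol names are derived differently). The point is only that each subsystem owns its own bridge bound to its own subprotocol, so a peer's capabilities follow from which subprotocols it opens rather than from a role.

```rust
// Hypothetical sketch only; none of these types exist in polkadot-sdk.
use std::collections::HashMap;

/// Identifier of a notifications subprotocol (illustrative string).
#[derive(Clone, Hash, PartialEq, Eq, Debug)]
struct ProtocolName(&'static str);

/// Stand-in for the real `PeerId`.
#[derive(Clone, Hash, PartialEq, Eq, Debug)]
struct PeerId(u64);

/// A generic, per-subsystem bridge: instead of one central network-bridge
/// multiplexing every message, each subsystem owns one of these, bound to
/// its own subprotocol.
struct SubsystemBridge<M> {
    protocol: ProtocolName,
    peers: HashMap<PeerId, Vec<M>>, // outbound queues, one per connected peer
}

impl<M: Clone> SubsystemBridge<M> {
    fn new(protocol: ProtocolName) -> Self {
        Self { protocol, peers: HashMap::new() }
    }

    /// Called when a peer opens this specific subprotocol.
    fn on_peer_connected(&mut self, peer: PeerId) {
        self.peers.entry(peer).or_default();
    }

    /// Queue a message to every peer that speaks this subprotocol.
    fn send_to_all(&mut self, msg: M) {
        for queue in self.peers.values_mut() {
            queue.push(msg.clone());
        }
    }
}

fn main() {
    // Each subsystem instantiates its own bridge with its own subprotocol,
    // so e.g. a collator can open only the statement protocol and never
    // receive bitfield traffic at all.
    let mut statements = SubsystemBridge::new(ProtocolName("/polkadot/statement-distribution/1"));
    let mut bitfields = SubsystemBridge::new(ProtocolName("/polkadot/bitfield-distribution/1"));

    let collator = PeerId(1);
    statements.on_peer_connected(collator);

    statements.send_to_all("statement gossip payload");
    bitfields.send_to_all("bitfield gossip payload");

    println!("{:?} peers: {}", statements.protocol, statements.peers.len());
    println!("{:?} peers: {}", bitfields.protocol, bitfields.peers.len());
}
```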

@eskimor (Member) commented Jun 8, 2022

Yes, although we need to be careful with back pressure. I realized recently that our current architecture has some nice properties by accident; in particular, our combined/non-specific back pressure is actually a good thing. Consider the case where approval-voting can't keep up. Right now this leads to high ToFs at approval-voting and, by extension, at approval-distribution, and more importantly fills up the incoming channel, which blocks and slows down the network-bridge, which in turn slows down the backing pipeline. Therefore less work gets successfully backed, approval-voting can recover (as it has less work to do), and finality can proceed.

Now consider a world where approval-distribution only applies back pressure on receiving approval votes. Depending on low-level details (in the end everything is a single TCP connection), this could lead to approval-distribution being constantly overwhelmed while backing continues operating at full speed. Especially with escalation strategies (but even without them), things might easily go south in that scenario, resulting in an ever-increasing finality lag.
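
A minimal standalone sketch of the "accidental" combined back pressure described above, not polkadot code: one bounded channel carries both backing and approval traffic, so when the approval consumer stalls, the channel fills and the single sender (the network-bridge stand-in) blocks, slowing backing as well. The message enum and constants are made up for illustration.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

// Hypothetical message type; both kinds of traffic share one channel.
enum SubsystemMsg {
    Backing(u32),
    Approval(u32),
}

fn main() {
    // Small bound so the back pressure kicks in quickly.
    let (tx, rx) = sync_channel::<SubsystemMsg>(4);

    // The "network-bridge": forwards both kinds of traffic through one channel.
    let bridge = thread::spawn(move || {
        for i in 0..20 {
            // `send` blocks once the channel is full; this is the shared
            // back pressure: backing messages are delayed along with approvals.
            tx.send(SubsystemMsg::Backing(i)).unwrap();
            tx.send(SubsystemMsg::Approval(i)).unwrap();
            println!("bridge forwarded batch {i}");
        }
    });

    for msg in rx {
        match msg {
            // Approval work is slow, simulating approval-voting not keeping up.
            SubsystemMsg::Approval(n) => {
                thread::sleep(Duration::from_millis(50));
                println!("consumer finished approval work {n}");
            }
            SubsystemMsg::Backing(n) => println!("consumer handled backing message {n}"),
        }
    }

    bridge.join().unwrap();
}
```

With separate per-subsystem channels, the bridge would keep pumping backing messages at full speed even while the approval queue overflows, which is exactly the failure mode described above.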

@rphmeier (Contributor, Author) commented
Maybe we should have the backpressure come from candidate validation instead, with some priority levels: approval gets top priority, and then if we can't validate any more candidates we'd get backpressure on backing that way. There's always the risk that some node can't keep up with the network, though. We could also have some logic in backing that slows it down according to some monotonically increasing f(finality_lag).
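
A minimal sketch of the f(finality_lag) idea. The function name, curve, and constants are invented for illustration and are not existing polkadot code; the only property that matters is that the delay is zero while finality keeps up and grows monotonically with the lag.

```rust
use std::time::Duration;

/// Hypothetical throttle: extra delay injected before backing each candidate,
/// as a monotonically increasing function of how many blocks finality lags.
fn backing_throttle(finality_lag: u32) -> Duration {
    // No throttling while finality keeps up; beyond that the delay grows
    // quadratically, so a large lag slows backing hard and lets
    // approval-voting and finality recover.
    const FREE_LAG: u32 = 2; // illustrative threshold
    let excess = finality_lag.saturating_sub(FREE_LAG) as u64;
    Duration::from_millis(10 * excess * excess)
}

fn main() {
    for lag in [0, 2, 4, 8, 16, 32] {
        println!("finality lag {lag:>2} blocks -> backing delay {:?}", backing_throttle(lag));
    }
}
```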

@paritytech paritytech deleted a comment Jun 13, 2022
@mrcnski mrcnski self-assigned this Jan 26, 2023
@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@the-right-joyce added the I4-refactor, I9-optimisation, and T8-parachains_engineering labels and removed the I8-refactor label Aug 25, 2023