Add fake out requests system in peers.rs #2369

Merged: 3 commits into paritytech:main, Jun 14, 2022

Conversation

@tomaka (Contributor) commented Jun 10, 2022

Fix #2361

This PR solves a tricky problem in peers.rs.

A connection can be in three phases: handshaking, established, or shutting down. (A connection that is completely shut down doesn't count as a fourth phase, because once a connection is shut down it no longer exists.)
Once a connection starts its shutdown phase, it is forbidden to start new requests on it, but incoming notifications can still arrive.
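
To make this concrete, here is a minimal Rust sketch of these rules; the enum and method names are hypothetical, not the actual connection.rs types:

```rust
// Minimal sketch of the connection phases; names are hypothetical.
enum ConnectionPhase {
    Handshaking,
    Established,
    ShuttingDown,
    // A completely shut-down connection is removed entirely,
    // so it doesn't appear as a fourth phase.
}

impl ConnectionPhase {
    /// New outgoing requests are only allowed on established connections.
    fn can_start_request(&self) -> bool {
        matches!(self, ConnectionPhase::Established)
    }

    /// Incoming notifications can still arrive while shutting down.
    fn can_receive_notification(&self) -> bool {
        matches!(self, ConnectionPhase::Established | ConnectionPhase::ShuttingDown)
    }
}
```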

This is not a problem in connection.rs, because we simply indicate when a connection starts shutting down, and the upper layers of the code are aware that they shouldn't start any new requests but can still receive incoming notifications.

But in peers.rs this is problematic, because peers.rs groups together multiple connections belonging to the same peer and exposes them as just one. It is for example possible to have one connection to a peer shutting down while another connection to the same peer is still established. For this reason, peers.rs doesn't report shutdowns, and instead simply says "we are connected to a peer" or "we are no longer connected to a peer".

But this causes another problem: if you have only one connection left and it is shutting down, should you still tell the upper API layers that you're connected? The answer is yes; otherwise it would be weird to potentially receive incoming notifications from peers you are not connected to.
But if the upper API layers think that you are connected to a peer despite having only one connection that is shutting down, what happens if these upper API layers start new requests? This is the problem that this PR solves.

This PR introduces a fake requests system in peers.rs. When a request is started and all the connections to that peer are shutting down, we create a fake request and later report an event indicating that the request has failed.
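
As an illustration, here is a heavily simplified sketch of that fallback; `Peers`, `pending_fake_failures`, and the other names below are invented for the example and are not the actual peers.rs API:

```rust
type PeerId = u64;
type RequestId = u64;

enum Event {
    RequestFailed { id: RequestId },
}

#[derive(Default)]
struct Peers {
    next_request_id: RequestId,
    // Requests that were "started" while every connection to the target
    // peer was shutting down; they never touch a real connection.
    pending_fake_failures: Vec<RequestId>,
}

impl Peers {
    fn all_connections_shutting_down(&self, _peer: &PeerId) -> bool {
        // The real code inspects the state of every connection belonging
        // to the peer; hard-coded here to keep the sketch short.
        true
    }

    fn start_request(&mut self, peer: PeerId) -> RequestId {
        let id = self.next_request_id;
        self.next_request_id += 1;
        if self.all_connections_shutting_down(&peer) {
            // Every connection is shutting down: no real request can be
            // started, so register a fake one whose failure will be
            // reported by the next call to `next_event`.
            self.pending_fake_failures.push(id);
        }
        // (Otherwise the request would be dispatched on an established
        // connection; omitted here.)
        id
    }

    fn next_event(&mut self) -> Option<Event> {
        // Fake requests fail exactly as if a real request had been started
        // and its connection had then shut down, so the upper layers see a
        // uniform API and never observe the shutdown directly.
        self.pending_fake_failures
            .pop()
            .map(|id| Event::RequestFailed { id })
    }
}
```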

@mergify (bot) left a comment:

Automatically approving tomaka's pull requests. This auto-approval will be removed once more maintainers are active.

@tomaka (Contributor, Author) commented Jun 10, 2022

I've also opened #2370 which is an alternative approach, but #2370 would be longer and more difficult to do, and I might run into a wall.

@github-actions (bot) commented Jun 10, 2022

twiggy diff report

Difference in .wasm size before and after this pull request.


 Delta Bytes │ Item
─────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────
       +4367 ┊ smoldot::libp2p::peers::Peers<TConn,TNow>::next_event::hb6f87d091bf808c8
       -4106 ┊ smoldot::libp2p::peers::Peers<TConn,TNow>::next_event::h72e9646e8873adec
       +3098 ┊ smoldot::libp2p::collection::Network<TConn,TNow>::next_event::h63f1dd0af6eee90f
       -3071 ┊ smoldot::libp2p::collection::Network<TConn,TNow>::next_event::h51253087aaabcc13
       -1071 ┊ hashbrown::raw::RawTable<T,A>::reserve_rehash::h4d4e14f2c41649db
        +974 ┊ hashbrown::raw::RawTable<T,A>::reserve_rehash::h5c506daf47dd8686
        -848 ┊ smoldot::libp2p::collection::Network<TConn,TNow>::new::h444a2d3945d47dc7
        +848 ┊ smoldot::libp2p::collection::Network<TConn,TNow>::new::hec02c7b882f29333
        -761 ┊ smoldot::libp2p::peers::Peers<TConn,TNow>::unfulfilled_desired_peers::h3c89363d5dbffd79
        +761 ┊ smoldot::libp2p::peers::Peers<TConn,TNow>::unfulfilled_desired_peers::hf1b74d0d7f9f5766
        +685 ┊ smoldot::libp2p::collection::Network<TConn,TNow>::insert::h772cdc29f66164b7
        -666 ┊ smoldot::libp2p::collection::Network<TConn,TNow>::insert::h523c30ea0f7bf119
        +618 ┊ smoldot::libp2p::peers::Peers<TConn,TNow>::new::hdc36e2f18b61d0f3
        -596 ┊ smoldot::libp2p::peers::Peers<TConn,TNow>::new::hfd6825ad9d124fbf
        -582 ┊ smoldot::libp2p::peers::Peers<TConn,TNow>::set_peer_notifications_out_desired::h82af476ca3bef8e3
        +579 ┊ smoldot::libp2p::peers::Peers<TConn,TNow>::set_peer_notifications_out_desired::hddcf377b4705d8cf
        -504 ┊ <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold::hb586dd1cf9b55e10
        +504 ┊ <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold::hdd97c756676df40e
        +489 ┊ smoldot::libp2p::peers::Peers<TConn,TNow>::try_clean_up_peer::h38158f98391a7909
        -489 ┊ smoldot::libp2p::peers::Peers<TConn,TNow>::try_clean_up_peer::h9991d477cd3a02d3
       +4114 ┊ ... and 298 more.
       +6920 ┊ Σ [318 Total Rows]

@@ -11,6 +11,7 @@
- Fix another panic in case of a carefully-crafted LEB128 length. ([#2337](https://github.com/paritytech/smoldot/pull/2337))
- Fix a panic when decoding a block header containing a large number of Aura authorities. ([#2338](https://github.com/paritytech/smoldot/pull/2338))
- Fix multiple panics when decoding network messages in case where these messages were truncated. ([#2340](https://github.com/paritytech/smoldot/pull/2340), [#2355](https://github.com/paritytech/smoldot/pull/2355))
- Fix panic when the Kademlia random discovery process initiates a request on a connection that has just started shutting down. ([#2369](https://github.com/paritytech/smoldot/pull/2369))
@tomaka (Contributor, Author) commented on this diff:

I'm mentioning Kademlia because I've only ever seen this problem happen with the Kademlia discovery. In theory it can happen with other protocols, but in practice it doesn't seem to, probably because, conveniently, we happen to never start any other kind of request during shutdown.

@melekes (Contributor) commented Jun 14, 2022

> if you have only one connection left and it is shutting down, should you still tell the upper API layers that you're connected? The answer is yes; otherwise it would be weird to potentially receive incoming notifications from peers you are not connected to.

It's not immediately obvious to me why the answer is yes. If we've reported that the connection is shutting down in peers.rs and upper layers are aware, then it's okay for them to receive notifications (same as in connection.rs).

> But in peers.rs this is problematic, because peers.rs groups together multiple connections belonging to the same peer and exposes them as just one.

I think we've talked a little about this before, but this bug is again a "reminder" that the underlying design is too complex and buggy; it would be far simpler to have a single connection to a peer instead of peers.rs pretending to have a single connection when in fact there are many.

@melekes (Contributor) left a comment:

👍

@tomaka (Contributor, Author) commented Jun 14, 2022

> If we've reported that the connection is shutting down in peers.rs and upper layers are aware, then it's okay for them to receive notifications (same as in connection.rs).

Well, that's #2370

> it would be far simpler to have a single connection to a peer

This opens us up to many issues and even an attack:

  • Node A and B are connected to each other.
  • Through the random discovery process, node C tells node A that there exists a node D with a certain IP address.
  • Node A tries to connect to node D. It turns out that this IP address connects to node B, and this hypothetical node D is actually B. Node A finds itself with two connections to node B.

So C managed to force A to connect twice to B.

If we add some code that closes one connection when there are two, the question is which connection do you close? The older one or the newer one? Node A knows that the newer connection is "wrong", but node B doesn't know.
Closing the older one opens the door for an attack where C can force-disconnect A and B.
Closing the newer one means that if node A has a sudden disconnect and changes its IP address, then until node B realizes that the connection is unresponsive (which will take ~40 seconds) node A will be unable to reconnect.

The only good solution to that problem is to allow 2 simultaneous connections per peer.
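
For illustration, a hedged sketch of that policy; `MAX_CONNECTIONS_PER_PEER`, `ConnectionSet`, and the surrounding names are hypothetical, not smoldot's actual code:

```rust
use std::collections::HashMap;

// Hypothetical cap implementing the "allow 2 simultaneous connections
// per peer" policy described above.
const MAX_CONNECTIONS_PER_PEER: usize = 2;

type PeerId = u64;
type ConnectionId = u64;

#[derive(Default)]
struct ConnectionSet {
    by_peer: HashMap<PeerId, Vec<ConnectionId>>,
}

impl ConnectionSet {
    /// Accepts a newly-established connection unless the peer already has
    /// two connections. Neither the older nor the newer existing connection
    /// is evicted, which sidesteps both attack scenarios above.
    fn try_insert(&mut self, peer: PeerId, conn: ConnectionId) -> Result<(), ConnectionId> {
        let conns = self.by_peer.entry(peer).or_default();
        if conns.len() >= MAX_CONNECTIONS_PER_PEER {
            // Refuse the extra connection instead of closing an old one.
            return Err(conn);
        }
        conns.push(conn);
        Ok(())
    }
}
```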

See also paritytech/substrate#4272 (comment)

@melekes (Contributor) commented Jun 14, 2022

> It turns out that this IP address connects to node B, and this hypothetical node D is actually B. Node A finds itself with two connections to node B.

Can't A drop the second connection once it finds out that the IP provided by C == B's IP?

@melekes (Contributor) commented Jun 14, 2022

> See also paritytech/substrate#4272 (comment)

Thanks for the link! I didn't know there were issues with unique connections before. It may be that multiple connections are the lesser evil; I was just pointing out that when bugs like #2361 start to appear, it could be a sign of improper design (i.e. bugs that force you to lay additional abstractions like fake requests on top so the code works).

@tomaka (Contributor, Author) commented Jun 14, 2022

> Can't A drop the second connection once it finds out that the IP provided by C == B's IP?

C could also advertise an IP that it controls, and then, when A connects, tunnel the traffic between A and this IP to B, thus making A believe that this IP is actually B. Comparing IPs is not a good way of ensuring that you don't connect to the same peer.

In addition to this, there's also the fact that libp2p has many different ways of connecting to a peer, such as using a relay. Sometimes you don't have an IP address available.

> I was just pointing out that when bugs like #2361 start to appear, it could be a sign of improper design (i.e. bugs that force you to lay additional abstractions like fake requests on top so the code works)

Well, the design is IMO not wrong by itself. The problem here is more that the abstraction of collection.rs and the abstraction of peers.rs are different, which adds complexity, and that I was too lazy to properly "bridge" the two.

This PR right now is about fixing a panic.
After #2352 is merged, I want to clean up the networking code from the ground up and move it towards its stable version.

@tomaka tomaka added the automerge Automatically merge pull request as soon as possible label Jun 14, 2022
@mergify mergify bot merged commit 4472dc1 into paritytech:main Jun 14, 2022
@tomaka tomaka deleted the fix-opened-state-mismatch branch June 14, 2022 09:40
@melekes (Contributor) commented Jun 14, 2022

> C could also advertise an IP that it controls, and then, when A connects, tunnel the traffic between A and this IP to B, thus making A believe that this IP is actually B. Comparing IPs is not a good way of ensuring that you don't connect to the same peer.

That's what peer IDs are for, correct? And the same process (C relaying data) can't be performed with peer IDs because B's certificate contains B's IP, right?

> The problem here is more that the abstraction of collection.rs and the abstraction of peers.rs are different, which adds complexity, and that I was too lazy to properly "bridge" the two.

That makes sense, thanks 🙏

@tomaka (Contributor, Author) commented Jun 14, 2022

> That's what peer IDs are for, correct? And the same process (C relaying data) can't be performed with peer IDs because B's certificate contains B's IP, right?

I don't know which certificates you're mentioning.

The original problem is that C can force A to connect to B. If A was already connected to B, then C will have forced A to open a second connection.
There's also no way for A to guarantee in advance that it will not connect to a peer it is already connected to.

Successfully merging this pull request may close these issues: thread 'tasks-pool-3' panicked at 'called Option::unwrap() on a None value'