Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Temporarily stop using V1Lazy #14691

Closed
wants to merge 2 commits into from
Closed

Temporarily stop using V1Lazy #14691

wants to merge 2 commits into from

Conversation

altonen
Copy link
Contributor

@altonen altonen commented Jul 31, 2023

Parachain collators support the /block-announces/1 notification protocol but do not support the /sync/2 request-response protocol. These peers are right now treated as any other full node and they can be selected for a block request. Normally the block request would be sent and since the peer doesn't support the request-response protocol, libp2p would fail to negotiate the protocol and this failure would be propagated to SyncingEngine, causing it to remove the peer and send the block request to some other peer.

A recent change in libp2p has modified how negotiation works for streams using V1Lazy and for Polkadot it means that the negotiation failure is not detected and SyncingEngine is not notified.

If a parachain collator is selected for a block request, this will cause syncing to stall because after MAX_AHEAD_BLOCKS have been downloaded, the syncing will stop itself from downloading more blocks until all in-flight requests have been completed and since the collator cannot answer block requests and SyncingEngine is not notified of the negotiation failure, syncing halts at 0.0 bps and the node has to be restarted.

Fixes #14683


I'll check if I can find another way around this problem without disabling V1Lazy but in case I don't, we definitely want to include include this fix even if it disables an optimization.

Parachain collators support the `/block-announces/1` notification
protocol but do not support the `/sync/2` request-response protocol.
These peers are right now treated as any other full node and they can
be selected for a block request. Normally the block request would be
sent and since the peer doesn't support the request-response protocol,
libp2p would fail to negotiate the protocol and this failure would be
propagated to `SyncingEngine`, causing it to remove the peer and send
the block request to some other peer.

A recent change in libp2p has modified how negotiation works for streams
using `V1Lazy` and for Polkadot it means that the negotiation failure
is not detected and `SyncingEngine` is not notified.

If a parachain collator is selected for a block request, this will
cause syncing to stall because after `MAX_AHEAD_BLOCKS` have been
downloaded, the syncing will stop itself from downloading more blocks
until all in-flight requests have been completed and since the collator
cannot answer block requests and `SyncingEngine` is not notified of the
negotiation failure, syncing halts at 0.0 bps and the node has to be
restarted.
@altonen altonen added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit labels Jul 31, 2023
@altonen altonen requested a review from a team July 31, 2023 15:11
@dmitry-markin dmitry-markin requested a review from a team July 31, 2023 17:47
@altonen
Copy link
Contributor Author

altonen commented Aug 1, 2023

Comparing our request-response code with the multistream-select implementation, this looks to be the offending line:

I don't know why write_request() would attempt to close the substream since the same substream is used to read the response but that line coupled with the latest V1Lazy changes causes the request to enter into a state where it doesn't emit an error and the request gets stuck in syncing.

All that said, I tried running a node with that line commented out and while it worked better than without anything, it still got stuck after a few hours whereas disabling V1Lazy has been running issues much longer.

@altonen
Copy link
Contributor Author

altonen commented Aug 4, 2023

bot rebase

@paritytech-processbot
Copy link

Rebased

@mxinden
Copy link
Contributor

mxinden commented Aug 12, 2023

A recent change in libp2p has modified how negotiation works for streams using V1Lazy

I am assuming that you are referring to libp2p/rust-libp2p#4019, correct?

and for Polkadot it means that the negotiation failure is not detected and SyncingEngine is not notified.

Negotiation failure will not be detected on poll_close. Though given that it is a request-response protocol, it will detect the negotiation failure when expecting the response. You should receive a OutboundFailure. Is that not the case @altonen?

@altonen
Copy link
Contributor Author

altonen commented Aug 12, 2023

@mxinden

You should receive a OutboundFailure. Is that not the case @altonen?

That seems to be the case, yes. This message is printed:
https://github.com/libp2p/rust-libp2p/blob/01e7353d047e82a6b036609ad65ec3b5b9ce0163/misc/multistream-select/src/negotiated.rs#L320

but otherwise we don't get any signal back to the syncing subsystem and syncing stalls.

FWIW, we shouldn't add parachain collators as syncing peers in the first place since they don't support the request-response protocol but this change seems to cause trouble for us whereas before this release it worked fine. With the latest libp2p update we're also experienced issues with Kademlia so we've decided to revert the upgrade temporarily and look into the issues in closer detail when time permits.

@altonen
Copy link
Contributor Author

altonen commented Aug 16, 2023

Superseded by #14722

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Block request timeout not working
5 participants