Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(identify): don't close stream in protocol::recv #3344

Merged
merged 7 commits into from
Feb 19, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions protocols/identify/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,10 @@

- Update to `libp2p-swarm` `v0.42.0`.

- Flush stream in `protocol::recv` instead of closing it. See [PR 3344].
elenaf9 marked this conversation as resolved.
Show resolved Hide resolved

[PR 3208]: https://github.com/libp2p/rust-libp2p/pull/3208
[PR 3344]: https://github.com/libp2p/rust-libp2p/pull/3344

# 0.41.1

Expand Down
2 changes: 1 addition & 1 deletion protocols/identify/src/protocol.rs
Original file line number Diff line number Diff line change
Expand Up @@ -193,7 +193,7 @@ async fn recv<T>(mut socket: T) -> Result<Info, UpgradeError>
where
T: AsyncRead + AsyncWrite + Unpin,
{
socket.close().await?;
socket.flush().await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is flushing needed here? Would removing the whole line be enough?

See #3298 (comment) for more details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work if we ignore the error on the close? Would that be a viable hot-fix too?

Plus, I'd like us to have a comment here linking to an issue on GitHub tracking the overall problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work if we ignore the error on the close? Would that be a viable hot-fix too?

Preference for removing the close as it is not needed in the first place. Ignoring an error might hinder debugging in the future.

Plus, I'd like us to have a comment here linking to an issue on GitHub tracking the overall problem.

Yes please. Good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elenaf9 friendly ping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking into this a bit more I think closing a stream here is / will remain problematic, independently of #3343.
The reason is that we also use recv in the inbound upgrade on Push. If this is combined with using upgrade Version::V1Lazy I think the following can happen in theory:

  1. The dialer opens a new outbound identify-push stream
  2. With the V1Lazy it directly flushes the identify payload together with the protocol-id
  3. It drops the stream once the write is done
  4. The listener reads the protocol-id and echoes it back before starting the inbound upgrade
  5. the remote never reads the echoed-back protocol-id, thus closing the stream fails

In go-libp2p this issue is solved by simply ignoring the error in both, the multi-stream-select upgrade1 and when reading the identify2 response.

I am not too familiar with multi-stream select in rust-libp2p so it could be that in the above case the dialer would not drop the stream until it received the echoed-back protocol-id. That said, even if it's implemented like that in rust-libp2p, I assume from the linked go-libp2p code that this is how they are doing it. So I think we also have to handle that case. // CC @MarcoPolo since we talked about this out-of-band - did I understand it correctly?

Am I missing something?

Footnotes

  1. https://github.com/multiformats/go-multistream/blob/5555a5c43be95c669ce5637723ccf690a6a30e23/multistream.go#L203-L205

  2. https://github.com/libp2p/go-libp2p/blob/6b9c11680e0bb412ef7515f266531c23245015dc/p2p/protocol/identify/id.go#L435

Copy link
Member

@mxinden mxinden Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the remote never reads the echoed-back protocol-id, thus closing the stream fails

I am not sure I understand the last step here. Note that closing a stream is a local operation only in TCP + Yamux. It only flushes the stream buffers to the local connection buffers. It does not ensure that the closing of the stream is communicated to the remote. Communicating the closing to the remote is asynchronous to calling close. The connection task will make sure the closing of the stream is communicated to the remote.

https://github.com/libp2p/rust-yamux/blob/72ccd5734f71971fc1f02ac4a5f43b8eb4dff660/yamux/src/connection/stream.rs#L365-L387

I am not too familiar with multi-stream select in rust-libp2p so it could be that in the above case the dialer would not drop the stream until it received the echoed-back protocol-id.

It should await receiving the protocol-id-echo.

// Ensure all data has been flushed and expected negotiation messages
// have been received.
ready!(self.as_mut().poll(cx).map_err(Into::<io::Error>::into)?);

Never the less, I am advocating for removing the close in recv. Any objections @elenaf9?

Copy link
Contributor Author

@elenaf9 elenaf9 Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the remote never reads the echoed-back protocol-id, thus closing the stream fails

I am not sure I understand the last step here. Note that closing a stream is a local operation only in TCP + Yamux. It only flushes the stream buffers to the local connection buffers. It does not ensure that the closing of the stream is communicated to the remote. Communicating the closing to the remote is asynchronous to calling close. The connection task will make sure the closing of the stream is communicated to the remote.

Sorry, I was probably not clear with what I meant; I was referring to @thomaseizinger suggestion to add a comment here that links to #3343. I was arguing that because in QUIC (contrary to TCP+Yamux) the close is not local only but instead fails if not all data gets acknowledged, even with #3343 resolved close could cause issues in the described scenario.

I was mostly wondering if my understanding was correct. If it is, I would add a comment that describes that.

Never the less, I am advocating for removing the close in recv. Any objections @elenaf9?

No objections; I will remove it.

Edit: What do you think of cf21edb? (sorry for the broken commit message - forgot to escape the backticks in the message)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: What do you think of cf21edb?

Friendly ping @mxinden @thomaseizinger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with the comment. At the end of the day, this is a temporary situation until we fix quinn (+ quinn-proto), right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. That's what I meant above: even with the fix, it could cause errors in case of identify Push + V1Lazy if the remote doesn't read the echo-ed back protocol-id. Not an issue if both peers are rust-libp2p, but I am not sure how go-libp2p implements it.


let info = FramedRead::new(
socket,
Expand Down