Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(identify): don't close stream in protocol::recv #3344

Merged
merged 7 commits into from
Feb 19, 2023

Conversation

elenaf9
Copy link
Contributor

@elenaf9 elenaf9 commented Jan 17, 2023

Description

Don't close the stream protocol::recv.

This is a short-term fix for #3298.

The issue behind this is a general one on the QUIC transport when closing streams, as described in #3343. This PR only circumvents the issue for identify. A proper solution for our QUIC transport still needs more thought.

Links to any relevant issues

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. Generally in favor even though it is only a hot-fix for an issue rooted somewhere else.

@@ -193,7 +193,7 @@ async fn recv<T>(mut socket: T) -> Result<Info, UpgradeError>
where
T: AsyncRead + AsyncWrite + Unpin,
{
socket.close().await?;
socket.flush().await?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is flushing needed here? Would removing the whole line be enough?

See #3298 (comment) for more details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work if we ignore the error on the close? Would that be a viable hot-fix too?

Plus, I'd like us to have a comment here linking to an issue on GitHub tracking the overall problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work if we ignore the error on the close? Would that be a viable hot-fix too?

Preference for removing the close as it is not needed in the first place. Ignoring an error might hinder debugging in the future.

Plus, I'd like us to have a comment here linking to an issue on GitHub tracking the overall problem.

Yes please. Good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elenaf9 friendly ping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking into this a bit more I think closing a stream here is / will remain problematic, independently of #3343.
The reason is that we also use recv in the inbound upgrade on Push. If this is combined with using upgrade Version::V1Lazy I think the following can happen in theory:

  1. The dialer opens a new outbound identify-push stream
  2. With the V1Lazy it directly flushes the identify payload together with the protocol-id
  3. It drops the stream once the write is done
  4. The listener reads the protocol-id and echoes it back before starting the inbound upgrade
  5. the remote never reads the echoed-back protocol-id, thus closing the stream fails

In go-libp2p this issue is solved by simply ignoring the error in both, the multi-stream-select upgrade1 and when reading the identify2 response.

I am not too familiar with multi-stream select in rust-libp2p so it could be that in the above case the dialer would not drop the stream until it received the echoed-back protocol-id. That said, even if it's implemented like that in rust-libp2p, I assume from the linked go-libp2p code that this is how they are doing it. So I think we also have to handle that case. // CC @MarcoPolo since we talked about this out-of-band - did I understand it correctly?

Am I missing something?

Footnotes

  1. https://github.com/multiformats/go-multistream/blob/5555a5c43be95c669ce5637723ccf690a6a30e23/multistream.go#L203-L205

  2. https://github.com/libp2p/go-libp2p/blob/6b9c11680e0bb412ef7515f266531c23245015dc/p2p/protocol/identify/id.go#L435

Copy link
Member

@mxinden mxinden Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the remote never reads the echoed-back protocol-id, thus closing the stream fails

I am not sure I understand the last step here. Note that closing a stream is a local operation only in TCP + Yamux. It only flushes the stream buffers to the local connection buffers. It does not ensure that the closing of the stream is communicated to the remote. Communicating the closing to the remote is asynchronous to calling close. The connection task will make sure the closing of the stream is communicated to the remote.

https://github.com/libp2p/rust-yamux/blob/72ccd5734f71971fc1f02ac4a5f43b8eb4dff660/yamux/src/connection/stream.rs#L365-L387

I am not too familiar with multi-stream select in rust-libp2p so it could be that in the above case the dialer would not drop the stream until it received the echoed-back protocol-id.

It should await receiving the protocol-id-echo.

// Ensure all data has been flushed and expected negotiation messages
// have been received.
ready!(self.as_mut().poll(cx).map_err(Into::<io::Error>::into)?);

Never the less, I am advocating for removing the close in recv. Any objections @elenaf9?

Copy link
Contributor Author

@elenaf9 elenaf9 Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the remote never reads the echoed-back protocol-id, thus closing the stream fails

I am not sure I understand the last step here. Note that closing a stream is a local operation only in TCP + Yamux. It only flushes the stream buffers to the local connection buffers. It does not ensure that the closing of the stream is communicated to the remote. Communicating the closing to the remote is asynchronous to calling close. The connection task will make sure the closing of the stream is communicated to the remote.

Sorry, I was probably not clear with what I meant; I was referring to @thomaseizinger suggestion to add a comment here that links to #3343. I was arguing that because in QUIC (contrary to TCP+Yamux) the close is not local only but instead fails if not all data gets acknowledged, even with #3343 resolved close could cause issues in the described scenario.

I was mostly wondering if my understanding was correct. If it is, I would add a comment that describes that.

Never the less, I am advocating for removing the close in recv. Any objections @elenaf9?

No objections; I will remove it.

Edit: What do you think of cf21edb? (sorry for the broken commit message - forgot to escape the backticks in the message)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: What do you think of cf21edb?

Friendly ping @mxinden @thomaseizinger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with the comment. At the end of the day, this is a temporary situation until we fix quinn (+ quinn-proto), right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. That's what I meant above: even with the fix, it could cause errors in case of identify Push + V1Lazy if the remote doesn't read the echo-ed back protocol-id. Not an issue if both peers are rust-libp2p, but I am not sure how go-libp2p implements it.

@elenaf9 elenaf9 changed the title fix(identify): flush stream in protocol::recv fix(identify): don't close stream in protocol::recv Jan 27, 2023
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the latest changes, this looks good to me.

Thanks for thinking this through @elenaf9. Thanks for documenting further steps with #3343.

@mxinden
Copy link
Member

mxinden commented Feb 10, 2023

@thomaseizinger I am assuming that you are not opposed to this patch. In case you are, please comment and I will revert.

@mxinden
Copy link
Member

mxinden commented Feb 10, 2023

Interoperability tests failing due to libp2p/test-plans#121 (review).

@mergify mergify bot merged commit 79b7cef into libp2p:master Feb 19, 2023
@elenaf9 elenaf9 deleted the identify-recv branch February 23, 2023 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants