-
Notifications
You must be signed in to change notification settings - Fork 993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(identify): don't close stream in protocol::recv
#3344
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. Generally in favor even though it is only a hot-fix for an issue rooted somewhere else.
protocols/identify/src/protocol.rs
Outdated
@@ -193,7 +193,7 @@ async fn recv<T>(mut socket: T) -> Result<Info, UpgradeError> | |||
where | |||
T: AsyncRead + AsyncWrite + Unpin, | |||
{ | |||
socket.close().await?; | |||
socket.flush().await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is flushing needed here? Would removing the whole line be enough?
See #3298 (comment) for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work if we ignore the error on the close? Would that be a viable hot-fix too?
Plus, I'd like us to have a comment here linking to an issue on GitHub tracking the overall problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work if we ignore the error on the close? Would that be a viable hot-fix too?
Preference for removing the close as it is not needed in the first place. Ignoring an error might hinder debugging in the future.
Plus, I'd like us to have a comment here linking to an issue on GitHub tracking the overall problem.
Yes please. Good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elenaf9 friendly ping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking into this a bit more I think closing a stream here is / will remain problematic, independently of #3343.
The reason is that we also use recv
in the inbound upgrade on Push
. If this is combined with using upgrade Version::V1Lazy
I think the following can happen in theory:
- The dialer opens a new outbound identify-push stream
- With the
V1Lazy
it directly flushes the identify payload together with the protocol-id - It drops the stream once the write is done
- The listener reads the protocol-id and echoes it back before starting the inbound upgrade
- the remote never reads the echoed-back protocol-id, thus closing the stream fails
In go-libp2p this issue is solved by simply ignoring the error in both, the multi-stream-select upgrade1 and when reading the identify2 response.
I am not too familiar with multi-stream select in rust-libp2p so it could be that in the above case the dialer would not drop the stream until it received the echoed-back protocol-id. That said, even if it's implemented like that in rust-libp2p, I assume from the linked go-libp2p code that this is how they are doing it. So I think we also have to handle that case. // CC @MarcoPolo since we talked about this out-of-band - did I understand it correctly?
Am I missing something?
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the remote never reads the echoed-back protocol-id, thus closing the stream fails
I am not sure I understand the last step here. Note that closing a stream is a local operation only in TCP + Yamux. It only flushes the stream buffers to the local connection buffers. It does not ensure that the closing of the stream is communicated to the remote. Communicating the closing to the remote is asynchronous to calling close
. The connection task will make sure the closing of the stream is communicated to the remote.
I am not too familiar with multi-stream select in rust-libp2p so it could be that in the above case the dialer would not drop the stream until it received the echoed-back protocol-id.
It should await receiving the protocol-id-echo.
rust-libp2p/misc/multistream-select/src/negotiated.rs
Lines 308 to 310 in ab59af4
// Ensure all data has been flushed and expected negotiation messages | |
// have been received. | |
ready!(self.as_mut().poll(cx).map_err(Into::<io::Error>::into)?); |
Never the less, I am advocating for removing the close
in recv
. Any objections @elenaf9?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the remote never reads the echoed-back protocol-id, thus closing the stream fails
I am not sure I understand the last step here. Note that closing a stream is a local operation only in TCP + Yamux. It only flushes the stream buffers to the local connection buffers. It does not ensure that the closing of the stream is communicated to the remote. Communicating the closing to the remote is asynchronous to calling close. The connection task will make sure the closing of the stream is communicated to the remote.
Sorry, I was probably not clear with what I meant; I was referring to @thomaseizinger suggestion to add a comment here that links to #3343. I was arguing that because in QUIC (contrary to TCP+Yamux) the close is not local only but instead fails if not all data gets acknowledged, even with #3343 resolved close
could cause issues in the described scenario.
I was mostly wondering if my understanding was correct. If it is, I would add a comment that describes that.
Never the less, I am advocating for removing the
close
inrecv
. Any objections @elenaf9?
No objections; I will remove it.
Edit: What do you think of cf21edb? (sorry for the broken commit message - forgot to escape the backticks in the message)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: What do you think of cf21edb?
Friendly ping @mxinden @thomaseizinger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with the comment. At the end of the day, this is a temporary situation until we fix quinn
(+ quinn-proto
), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. That's what I meant above: even with the fix, it could cause errors in case of identify Push
+ V1Lazy
if the remote doesn't read the echo-ed back protocol-id. Not an issue if both peers are rust-libp2p, but I am not sure how go-libp2p implements it.
protocol::recv
protocol::recv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomaseizinger I am assuming that you are not opposed to this patch. In case you are, please comment and I will revert. |
Interoperability tests failing due to libp2p/test-plans#121 (review). |
Description
Don't close the stream
protocol::recv
.This is a short-term fix for #3298.
The issue behind this is a general one on the QUIC transport when closing streams, as described in #3343. This PR only circumvents the issue for identify. A proper solution for our QUIC transport still needs more thought.
Links to any relevant issues
Change checklist