
fix: avoid race condition between pending frames and closing stream #156

Merged · 15 commits · May 24, 2023

Conversation

@thomaseizinger (Contributor) commented May 19, 2023

Currently, we have a garbage_collect function that checks whether any of our streams have been dropped. This is racy: the channel between a Stream and the Connection may still hold pending frames for a stream when the drop is detected, causing us to send a FIN flag for that stream ahead of its pending frames.

We fix this by maintaining a dedicated channel for each stream. When a stream gets dropped, its Receiver becomes disconnected. We use this signal to queue the correct closing frame (FIN vs RST) into the buffer. At that point, all of the stream's previous frames have already been processed, so the race condition is gone.
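As a sketch of the idea (the State, Frame, and on_stream_dropped names below are made up for illustration and the state-to-frame mapping is simplified, so this is not the actual yamux code):

```rust
// Illustrative sketch, not the actual yamux implementation.
enum State {
    Open,       // the stream was dropped without an explicit close
    SendClosed, // the user closed the stream before dropping it
}

enum Frame {
    Fin, // clean end of stream
    Rst, // abrupt reset
}

/// Called once the stream's `Receiver` returns `Poll::Ready(None)`,
/// i.e. the stream handle holding the `Sender` was dropped. Every frame
/// the stream sent before being dropped was received before this `None`,
/// so the closing frame queued here cannot overtake pending frames.
fn on_stream_dropped(state: State, pending_frames: &mut Vec<Frame>) {
    match state {
        State::SendClosed => pending_frames.push(Frame::Fin),
        State::Open => pending_frames.push(Frame::Rst),
    }
}
```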

Additionally, this allows us to implement Stream::poll_flush by forwarding to the underlying Sender. Note that at present, this only checks whether there is space in the channel, not whether the items have been taken out by the Receiver.

We have a PR upstream that might fix this: rust-lang/futures-rs#2746

Fixes: #117.

@mxinden (Member) commented May 22, 2023

Would the following alternative approach be an option as well?

  • Currently we have a single channel for all streams to send StreamCommand to the connection. How about a unique channel between each stream and its connection?
  • Flushing a stream is simply a matter of calling Sink::poll_flush on the mpsc::Sender.
  • Garbage collection can use the Poll::Ready(None) returned by a stream's mpsc::Receiver as the signal instead of Arc::strong_count (see the sketch below).

Given that the above would not require a new command variant and would not require sending Wakers over mpsc::channels, I consider it to be simpler.
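To illustrate the last bullet, here is a minimal sketch (assuming the futures 0.3 channel types) of how a dropped stream handle surfaces on the connection side without any reference counting:

```rust
use futures::channel::mpsc;
use futures::StreamExt;

fn main() {
    futures::executor::block_on(async {
        // One channel per stream: the `Sender` lives inside the stream
        // handle, the `Receiver` inside the connection.
        let (tx, mut rx) = mpsc::channel::<u32>(0);

        // The user drops the stream handle, and with it the `Sender`.
        drop(tx);

        // The connection observes the drop as the channel terminating,
        // without any `Arc::strong_count` bookkeeping.
        assert_eq!(rx.next().await, None);
    });
}
```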

@thomaseizinger what do you think? Am I missing something?

@thomaseizinger (Contributor, Author):

I like that idea! Much simpler. Will implement! :)

@thomaseizinger (Contributor, Author) commented May 22, 2023

> I like that idea! Much simpler. Will implement! :)

There is one caveat. poll_flush on Sender only checks whether we can send more items into the channel, not whether they have been taken out on the Receiver end. Thus, we need to set the capacity to 0 for this to work.

See rust-lang/futures-rs#2504.
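A small self-contained demonstration of that caveat (assuming futures 0.3; it uses poll_ready directly, which is exactly the capacity check described above):

```rust
use std::task::Poll;

use futures::channel::mpsc;
use futures::future::poll_fn;
use futures::{SinkExt, StreamExt};

fn main() {
    futures::executor::block_on(async {
        // A buffer of 0 still grants each `Sender` one guaranteed slot.
        let (mut tx, mut rx) = mpsc::channel::<u32>(0);

        // `feed` sends without flushing; the sender's slot is now taken.
        tx.feed(1).await.unwrap();

        // Until the receiver takes the item out, the sender is not ready
        // for more items, so a capacity-based flush would park the task.
        let readiness = poll_fn(|cx| Poll::Ready(tx.poll_ready(cx))).await;
        assert!(readiness.is_pending());

        assert_eq!(rx.next().await, Some(1));

        // The item left the channel; the sender counts as flushed again.
        poll_fn(|cx| tx.poll_ready(cx)).await.unwrap();
    });
}
```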

@mxinden (Member) commented May 22, 2023

> I like that idea! Much simpler. Will implement! :)
>
> There is one caveat. poll_flush on Sender only checks whether we can send more items into the channel, not whether they have been taken out on the Receiver end. Thus, we need to set the capacity to 0 for this to work.
>
> See rust-lang/futures-rs#2504.

As far as I can tell, this should be fine. We could even skip calling poll_flush entirely: the receiver will still receive all messages and eventually see Poll::Ready(None). See https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=def93d1497874d26f93dabf4a74b3feb.
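For reference, a self-contained sketch of that behavior (assuming futures 0.3), along the lines of the linked playground:

```rust
use futures::channel::mpsc;
use futures::{SinkExt, StreamExt};

fn main() {
    futures::executor::block_on(async {
        let (mut tx, mut rx) = mpsc::channel::<u32>(2);
        tx.send(1).await.unwrap();
        tx.send(2).await.unwrap();

        // Dropping the sender does not discard buffered items.
        drop(tx);

        // The receiver first drains everything that was sent ...
        assert_eq!(rx.next().await, Some(1));
        assert_eq!(rx.next().await, Some(2));

        // ... and only then observes the disconnect.
        assert_eq!(rx.next().await, None);
    });
}
```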

@thomaseizinger (Contributor, Author):

>> I like that idea! Much simpler. Will implement! :)
>>
>> There is one caveat. poll_flush on Sender only checks whether we can send more items into the channel, not whether they have been taken out on the Receiver end. Thus, we need to set the capacity to 0 for this to work.
>>
>> See rust-lang/futures-rs#2504.
>
> As far as I can tell, this should be fine. We could even skip calling poll_flush entirely: the receiver will still receive all messages and eventually see Poll::Ready(None). See https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=def93d1497874d26f93dabf4a74b3feb.

I am not sure what you are trying to say. The point of this PR is to have poll_flush park the task if the channel is not yet empty. How can we achieve that if we don't forward to the poll_flush call on the Sender?

FWIW: I attempted to fix this upstream: rust-lang/futures-rs#2746
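To make the intended forwarding concrete, a rough sketch (Stream and Frame here are simplified stand-ins for the real types, and the error mapping is invented for the example):

```rust
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::channel::mpsc;

// Simplified stand-ins for the real yamux types.
struct Frame;

struct Stream {
    // Created with a capacity of 0, so readiness implies the
    // connection has taken our frames out of the channel.
    sender: mpsc::Sender<Frame>,
}

impl Stream {
    // Sketch of an `AsyncWrite::poll_flush`-style method that forwards
    // to the underlying `Sender`.
    fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        // With a capacity of 0, `poll_ready` only succeeds once the
        // connection task has taken our previously sent frames out of
        // the channel, which approximates "everything was flushed".
        self.sender
            .poll_ready(cx)
            .map_err(|_| io::ErrorKind::BrokenPipe.into())
    }
}
```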

thomaseizinger changed the title from "fix: implement poll_flush properly" to "fix: implement Stream::poll_flush properly" on May 22, 2023
@thomaseizinger (Contributor, Author) commented May 22, 2023

> Garbage collection can use the Poll::Ready(None) returned by a stream's mpsc::Receiver as the signal instead of Arc::strong_count.

Ah, I forgot about this. It is a bit trickier to implement because I currently use SelectAll, which hides this detail from me.

thomaseizinger changed the title from "fix: implement Stream::poll_flush properly" to "fix: avoid race condition between pending frames and closing stream" on May 22, 2023
@thomaseizinger (Contributor, Author):

@mxinden This is now available for review again. I implemented your suggestion. It is a much better design :)

@mxinden (Member) left a comment

This looks good to me overall, though I have one suggestion around using a SelectAll.

Comment on lines 505 to 506:

```rust
for i in (0..self.stream_receivers.len()).rev() {
    let (id, mut receiver) = self.stream_receivers.swap_remove(i);
```

@mxinden (Member):

Polling each stream Receiver even though only one might be Poll::Ready seems wasteful. Would SelectAll not be an option? One would wrap each Receiver such that it returns the StreamId when the Receiver returns Poll::Ready(None). With that StreamId we can then call self.on_drop_stream(id).

(On consecutive polls the wrapper can return Poll::Ready(None) and thus it would be cleaned up by the SelectAll.)

@thomaseizinger (Contributor, Author):

I had a SelectAll-based design before but dropped it because I couldn't detect the None from the outside.

Detecting that requires another data structure for translating between the stream commands and the actual frames in the connection (the wrapping you mentioned).

This also needs to work for the Cleanup and Close cases.

I tried hiding it in a custom collection object, but we need access to the stream IntMap upon drop, so that needs even more refactoring. It would be a clean design, though.

I can invest the time if you want, but I don't think it is a quick 30-minute refactoring.

@thomaseizinger (Contributor, Author):

I was wrong: it was actually a super quick refactoring, now that I've seen how tokio's StreamMap works. It recommends an additional wrapper around each Stream. The trick is to wrap the output of the Stream such that you get a tuple of (key, Option<Item>), which allows you to detect the closing of the stream.
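A sketch of that wrapper (made-up names; the real implementation in this PR may differ in details). The inner stream's end is surfaced once as an item (key, None), and only the next poll terminates the wrapper:

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::Stream;

/// Tags every item of the inner stream with a key and yields one final
/// `(key, None)` when the inner stream finishes, so the consumer can
/// observe *which* stream closed.
struct Tagged<K, S> {
    key: K,
    inner: S,
    exhausted: bool,
}

impl<K: Copy + Unpin, S: Stream + Unpin> Stream for Tagged<K, S> {
    type Item = (K, Option<S::Item>);

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        if self.exhausted {
            // Terminate for real; a `SelectAll` removes the entry now.
            return Poll::Ready(None);
        }

        match Pin::new(&mut self.inner).poll_next(cx) {
            Poll::Ready(Some(item)) => Poll::Ready(Some((self.key, Some(item)))),
            Poll::Ready(None) => {
                // Surface the close as a regular item first.
                self.exhausted = true;
                Poll::Ready(Some((self.key, None)))
            }
            Poll::Pending => Poll::Pending,
        }
    }
}
```

With this, a SelectAll over wrapped receivers yields (id, None) exactly once when a stream's channel closes, at which point the connection can call self.on_drop_stream(id).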

(Further review threads on yamux/src/connection.rs and yamux/src/connection/stream.rs were resolved.)
@mxinden (Member) left a comment

Thanks for the follow-up. Also great to be aware of StreamMap.

Comment on the Cargo.toml diff:

```diff
@@ -16,6 +16,7 @@ nohash-hasher = "0.2"
 parking_lot = "0.12"
 rand = "0.8.3"
 static_assertions = "1"
+pin-project = "1.1.0"
```
@mxinden (Member):

Instead of using pin-project, we could just as well implement Unpin for Stream manually:

```rust
impl Unpin for Stream {}
```

That said, I doubt there is any async Rust code-base without pin-project in its dependency tree already, so it is fine to include it here as well.

mxinden merged commit 52c725b into master on May 24, 2023.
mxinden mentioned this pull request on May 25, 2023.