Writing from browser to streams opened on webtransport connection sometimes raises StopSending #3038
I found this repo with some other mention of …
Hi @zvolin, can you share what error code is attached to the STOP_SENDING frame? If you share a repro, I can take a closer look as well. Thanks!
I tried summarizing the steps to reproduce this here; check the readme. If something doesn't work, let me know.
The rust node is patched to make a single bitswap request every ~2s, so you shouldn't need to wait long for it to reproduce.
Also, this way of reproducing gives less noise and fewer extra logs, since there are fewer messages going around. It also reproduces with a regular Celestia node and browser lumina, so for the go side you could just follow our readme on main, but there will be much more traffic.
So I commented out everything regarding the preferred protocol in basic_host to force it to always negotiate the protocol. This effectively fixes the issue for streams opened by the go side, and bitswap runs fine for some time. After a while the go side stops responding and we never even get a new incoming stream in the browser; it may be #3025. The go node still thinks everything is working: I see logs from it writing to those streams we never even saw on the rust side. Occasionally I still see the StopSending popping up on ping, where the browser initiates the stream. Definitely not a real fix, but it may help in navigating the bug.
diff --git a/p2p/host/basic/basic_host.go b/p2p/host/basic/basic_host.go
index 761a39b6..ddae45bc 100644
--- a/p2p/host/basic/basic_host.go
+++ b/p2p/host/basic/basic_host.go
@@ -722,21 +722,22 @@ func (h *BasicHost) NewStream(ctx context.Context, p peer.ID, pids ...protocol.I
return nil, fmt.Errorf("identify failed to complete: %w", ctx.Err())
}
- pref, err := h.preferredProtocol(p, pids)
+ _, err = h.preferredProtocol(p, pids)
if err != nil {
return nil, err
}
- if pref != "" {
- if err := s.SetProtocol(pref); err != nil {
- return nil, err
- }
- lzcon := msmux.NewMSSelect(s, pref)
- return &streamWrapper{
- Stream: s,
- rw: lzcon,
- }, nil
- }
+ // if pref != "" {
+ // if err := s.SetProtocol(pref); err != nil {
+ // return nil, err
+ // }
+ // log.Warnln("new stream 6")
+ // lzcon := msmux.NewMSSelect(s, pref)
+ // return &streamWrapper{
+ // Stream: s,
+ // rw: lzcon,
+ // }, nil
+ // }
// Negotiate the protocol in the background, obeying the context.
var selected protocol.ID
Stop Sending is sent when we close the read side of a stream. It tells the other end we are not reading anymore. This happens on both stream.Close and stream.Reset (on the Go side). For ping at least, this is a red herring that is due to how Rust expects to reuse the stream indefinitely and Go wants to close it after some time. See:
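To make that concrete, here is a minimal Go sketch of that pattern, assuming the go-libp2p network.Stream interface (handleOnce is just an illustrative name, not the real ping handler):

```go
package example

import (
	"io"

	"github.com/libp2p/go-libp2p/core/network"
)

// handleOnce is a hypothetical handler illustrating the Go-side pattern:
// read what is needed, then close. Both Close and Reset stop reading, so the
// remote writer can see STOP_SENDING on its next write to this stream.
func handleOnce(s network.Stream) {
	buf := make([]byte, 32)
	if _, err := io.ReadFull(s, buf); err != nil {
		// Reset aborts both halves; the peer receives STOP_SENDING (and
		// RESET_STREAM) and its pending writes fail.
		_ = s.Reset()
		return
	}
	// Close finishes our write half and also gives up the read half. A peer
	// that expects to keep writing to this stream indefinitely (as the Rust
	// ping implementation does) will hit STOP_SENDING here.
	_ = s.Close()
}
```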
Small debugging note: in the browser, the network UI dropdown persists on refresh, but it doesn't seem to update the state in the wasm code. You have to pick another network and then go back.
I've found the issue around stop sending and I'll have a fix tomorrow morning. Thanks for the repro ❤️
I'm still trying to repro this part. How long do you have to wait to hit this state?
@MarcoPolo, @sukunrt, while investigating #3025, I completely forgot about … The question is, can this be a reason behind #3025? It could, but @zvolin's diff above avoids any negotiation on the Go side, and it is still getting stuck in the end. @zvolin, did you disable negotiation on the rust side as well?
I'll try to re-reproduce it shortly; my comment above was based on a single run. Just wanted to mention that the diff above doesn't disable negotiation. It does the opposite: it always negotiates instead of choosing the protocol right away. I didn't make any patches to rust-libp2p except logging.
I mean lazy negotiation. The code you've commented out also negotiates, but lazily, avoiding an additional roundtrip.
A disparity between those may explain things getting stuck.
It happens after exactly 20 seconds, so after 10 bitswap requests 🤔 It always reproduces for me, regardless of whether I use my patch, your multistream fix, or both.
I don't think there's any disparity. Moreover, we just stop getting new streams from … Here are logs from the go node; there are no interesting logs on the rust side (as it never saw the new stream).
The strange thing is that the errors aren't always shown. Sometimes there's an error, and sometimes the logs in the go node look completely normal: bitswap gets the stream, writes to it, and closes it.
I think it can be an issue on the Rust side. It always happens right after:
@zvolin, see this comment: multiformats/go-multistream#115 (comment)
This doesn't reproduce in chrome-based browsers. Could that indicate some bug in firefox? 🤔
The ping errors are red herrings, I think.
@zvolin do you think you could update the rust code to print out the QUIC Stream ID on errors? That would help debugging as you can then follow the stream data in Wireshark.
I'm trying Firefox, and I'm not getting stuck. The height keeps increasing. I'm also not seeing the three other errors you mention. Since you can repro, could you try running the bridge with this go-libp2p patch and sharing the logs: https://gist.github.com/MarcoPolo/ba1eaeb571795bdfad86ba536c7475a4
That patch does 3 things:
If you can see which stream id hits the issue, you can then use Wireshark to inspect the stream contents with the query:
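A display filter along these lines, with 42 standing in for the actual stream id, should select that stream's frames once the capture is decrypted:

```
quic.stream.stream_id == 42
```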
I'm running the bridge (using git tag v0.18.3-mocha) with the following params:
If you repro it, could you share logs, pcapng (wireshark capture), and the sslkeylog file? That should make it clear what's going on.
This was closed as I merged the multistream fix, but I'm re-opening it for the follow-up issues.
Thanks. I just had to confirm it for myself because I couldn't stop thinking it was that; those 20s are so consistent. So I tried to reproduce stop sending on a stream from the browser and then accepting a few incoming streams with just pure webtransport-go + javascript, and of course it works perfectly fine and I proved myself wrong 😛 I'll try with wireshark.
I'm afraid there's no API exposed by the browser to do that.
Here are all the logs and the pcapng. I've also added 2 additional patches for rust and go libp2p, to print opening and accepting of streams. Wireshark may not help; the logs show that indeed no streams are opened on the go side. Here is the last stream that the rust side saw:
and the corresponding go-side log:
After that, in the go logs you can see no new streams being opened (but no errors), even though it still receives bitswap requests from lumina. Things stay like that until I stop the rust node, after which the call stacks start appearing on the go node.
I was wondering if it could be some issue with my setup, but it seems to reproduce consistently for 2 other guys from my team and one from Prism. Can you share what firefox version and OS you are using?
At first glance that looks very similar to what @Wondertan reported in quic-go, except it happens after 20s instead of minutes when the client is firefox. But the client can reach the server, while the server can't open a new stream to the client.
I think I'm reproducing a similar issue where eventually the server stops being able to open new streams to the client, but in my case the Go server keeps trying; it just fails to open a new stream due to the timeout being hit on stream open. The stream fails to open because firefox has not given us any more stream credits (it hasn't sent a new MAX_STREAMS frame). You can see we are blocked when we send a STREAMS_BLOCKED frame. Sorry for the trouble, but could you do another run and export the qlogs and share these? Do this by setting the env var …
My env:
OS: macOS 15.0.1
The flow in the qlog is exactly like you said. I've patched rust-libp2p to count and report open inbound and outbound streams, and they are quite stable at around 2 outbound and 1 inbound. This leads me to think it could be some issue with stream closing. Either firefox holds the streams open after our calls to close, or go/quic doesn't recognize that some of them were closed (although it calls close itself 🤔).
https://datatracker.ietf.org/doc/html/rfc9000#section-19.11-8
Yeah, so my understanding was wrong.
I'm closing this as the original issue was resolved. I'll investigate firefox not sending max_streams further, but I'm not sure yet what the right place to open an issue about it is. I've created a minimal reproduction with pure javascript + webtransport-go, and there firefox sends a max_streams frame every few writes. Thanks for your support!
We can keep this open, or if you want you can create another issue with just the specific error where the server is unable to open new streams. Are you sure you're reading all the data from the stream and closing it (in the browser)? I ask because that's the only way I can reproduce the scenario with this non-libp2p repro: https://github.com/zvolin/webtransport-go-js-interop. I guess you are, because it doesn't reproduce in chrome-based browsers. It does look like a firefox bug though. In the qlog you've shared, I see the firefox side resetting its write half of the stream on receiving STOP_SENDING, as it should. And it has also ACKed the FIN for the read end of the stream. After this point, the only reason not to send a MAX_STREAMS frame is if there's unread stream data in the receive buffer.
Thanks for checking.
I believe the bug is unrelated to go-libp2p. I've set up a local dev environment for firefox and neqo and will check from that side now.
From the side of the javascript api, yes. But I'm not sure if everything is read under the hood. Recently I've been thinking the root cause could be related to these logs:
Maybe after it happens, the underlying http3 impl gets into some invalid state, or the glue between it and js doesn't handle it properly.
This isn't a reproduction, rather a sanity check for myself. This code didn't reproduce the issue for me either. I also have a similar setup which uses the webtransport api from within wasm, and I sometimes get a STREAMS_BLOCKED frame, but it looks different: the browser never sees any incoming stream from the beginning.
This may also be related to https://bugzilla.mozilla.org/show_bug.cgi?id=1899812. However, in my case the amount of data transmitted is nowhere near the one in the repro there. But it's the only other case where I've seen the …
Just for a note:
I think the difference is that I used lumina's docker devnet of celestia-nodes, and Marco ran just a stock celestia-node. We had the devnet configured to produce blocks every second instead of every 12s, so reproduction was much quicker.
I have a setup with a rust-libp2p node running in a browser and using webtransport-websys. It's connected to a go-libp2p node using webtransport and tries to exchange messages using bitswap. Both bitswap implementations open a fresh new stream for each message that needs to be sent. Writes from the browser to a new stream opened from the go node sometimes return a
WebTransportStream StopSending
error. This happens during the stream upgrade, when sending the multistream-select header (maybe the first message ever sent?), so it's much too early to respect a request to stop sending. The WT spec suggests it can be because the receiving side of the stream was dropped on the go side. I've been debugging it only from the rust side so far, so I don't have any useful logs from the go node yet. Here is the rust side:
Running in firefox results in StopSending; chromium doesn't show any error, but there are no further logs after flushing the header message, so I suspect it hangs instead of erroring there. Also, a few times I saw it when writing to streams opened from the rust side during ping, but far less often than on incoming streams.
I have a stable way to reproduce it, but the logs need some local patches. If you want to run it, let me know; I'll push the stuff and provide steps.
Version Information