Fix Chrome datachannel stuck at closing #238
Codecov Report
Base: 80.86% // Head: 81.04% // Increases project coverage by +0.17%
Additional details and impacted files
@@ Coverage Diff @@
## master #238 +/- ##
==========================================
+ Coverage 80.86% 81.04% +0.17%
==========================================
Files 48 48
Lines 3962 3994 +32
==========================================
+ Hits 3204 3237 +33
+ Misses 619 618 -1
Partials 139 139
@jerry-tao Thanks for this PR. You are right about the missing implementation of the outgoing reset on receipt of the same from the remote (thanks for the RFC reference!). I left one comment and am hoping you could address it. Thank you!
@enobufs I'm not sure how to fix the test: after the change there will be one more pending reconfig chunk (outgoingReset).
But this is the datachannel spec, not SCTP; I'm not clear on how an SCTP stream should close. |
@jerry-tao Thanks for following up on the error. It's not immediately obvious to me how to resolve it. Let me see what I can find this weekend. |
@enobufs While testing the fix, I found a problem: if the client creates and closes a datachannel quickly, the server won't handle it and throws the error:
It occurs in both the fixed and unfixed versions, but in the unfixed version only the closed datachannel has the error. Simple Client Code
In the unfixed version, the first 13579 stream will have:
But in the fixed version, the last dc won't get the response, just:
I think it's because closing the datachannel just resets the stream rather than removing it. |
@jerry-tao
const test1 = (ep) => {
let sess = new Session(ep);
sess.on('dcopen', (dc) => {
console.log(`new datachannel ${dc.label}`)
dc.send(`Hello from ${dc.label}`)
setTimeout(() => {
console.log(`closing datachannel ${dc.label}`)
dc.close()
}, 1000)
})
sess.open(6); // open 3 data channels
};
My app initiates dc.close() from the browser (just like your example); it is never initiated by the peer using pion. Looking at the pion/datachannel implementation, I noticed that what you pointed out in #187 appears to have been implemented there. When the Chrome side calls dc.close(), this sends a Reconfig to the peer, which causes stream.ReadSCTP to return io.EOF. The pion/datachannel code then calls stream.Close(), which conforms to what the RFC says in section 6.7 that you pointed out. Now, the question is why you were not seeing the 'close' event on Chrome. Could you please review your test code? |
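For readers following the thread, the close-on-EOF behavior described in this comment can be sketched roughly as below. This is not the pion/datachannel source: `fakeStream`, `Read`, and `readLoop` are hypothetical stand-ins, and the only assumption carried over from the discussion is that a read returns `io.EOF` once the remote side has reset its outgoing stream.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// fakeStream is a hypothetical stand-in for an SCTP stream. Read returns
// io.EOF once every queued message has been delivered, which models the
// remote peer having reset its outgoing stream.
type fakeStream struct {
	incoming [][]byte // messages still to be delivered to the reader
	closed   bool     // set once Close has been called locally
}

func (s *fakeStream) Read() ([]byte, error) {
	if len(s.incoming) == 0 {
		return nil, io.EOF
	}
	msg := s.incoming[0]
	s.incoming = s.incoming[1:]
	return msg, nil
}

func (s *fakeStream) Close() error {
	s.closed = true
	return nil
}

// readLoop hands messages to onMessage until the remote reset shows up as
// io.EOF, then closes the local (outgoing) side in response.
func readLoop(s *fakeStream, onMessage func([]byte)) error {
	for {
		msg, err := s.Read()
		if errors.Is(err, io.EOF) {
			return s.Close()
		}
		if err != nil {
			return err
		}
		onMessage(msg)
	}
}

func main() {
	s := &fakeStream{incoming: [][]byte{[]byte("Hello from dc-0")}}
	var got []string
	err := readLoop(s, func(b []byte) { got = append(got, string(b)) })
	fmt.Println(got, s.closed, err)
}
```

The point of the sketch is only the ordering: the local close happens as a reaction to the remote reset, not before it.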
Here's my test code in case it helps. |
I may have fixed it the wrong way. |
Here is my test code to reproduce the closing state: ice-tcp.zip, |
I ran your test code; chrome://webrtc-internals/ shows all datachannels stuck at closing too. |
You are right. I read my console logs wrong. The 'close' event never fired before the fix. Digging deeper now. |
I found the problem with
With or without this PR, problem 1 is the same. |
Thanks @jerry-tao. I have not seen that error log yet. I will try to reproduce it on my end as well. |
You could put a
Send, then close; that will produce the error. |
@jerry-tao Ok I will try that. Thank you! I think I know what is going on. I believe we are removing a stream from the map |
Hi @jerry-tao, Chrome does not seem to work as we expect, either. I wrote a script to do the same between the browsers (please find the attachment). Situation is even worse than pion because server side readyState goes to
Firefox is even worse... The server-side datachannel is left open (it never transitions to 'closing' or 'closed' as far as I can see). Could you please review my browser script, and let me know what you think? So, I believe the reason why data appears to be dropped at the server (pion) side is not due to |
For the sending on closing state, I think it should happen at someone trying to call
|
@jerry-tao
As you said, It's a good idea to return an error from WriteSCTP when the stream state is not open. (the PR to your fork
This is not entirely true. The SCTP association is still there to handle those chunks. It's just that RFC 8831 does not make it clear whether it intends data received during the closing state to be delivered to the app level. (Chrome does not seem to receive them, but that does not mean Chrome implements it correctly.) The data channel spec (RFC 8831) makes use of RFC 6525 (Reconfig) as a way to close a stream. OutgoingResetRequest, as its name implies, is similar to shutdown(SHUT_WR), TCP's "graceful shutdown", where you can still receive data from the socket while the outgoing direction is closed, until read() returns EOF... |
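The TCP analogy in this comment can be tried out directly. The sketch below uses only the Go standard library over a loopback connection; it illustrates half-close semantics in TCP and says nothing about how pion/sctp is implemented. The function name `halfCloseEcho` and the "echo: " prefix are made up for the example.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// halfCloseEcho sends msg over a loopback TCP connection, shuts down only the
// write side (the equivalent of shutdown(SHUT_WR)), and then keeps reading
// until the server has replied and closed, i.e. until EOF.
func halfCloseEcho(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		// The server reads until the client's half-close shows up as EOF,
		// and only then writes its reply.
		data, _ := io.ReadAll(conn)
		conn.Write(append([]byte("echo: "), data...))
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()

	tcp, ok := conn.(*net.TCPConn)
	if !ok {
		return "", fmt.Errorf("unexpected connection type %T", conn)
	}
	if _, err := tcp.Write([]byte(msg)); err != nil {
		return "", err
	}
	// Close the outgoing direction only; the incoming direction stays usable.
	if err := tcp.CloseWrite(); err != nil {
		return "", err
	}
	reply, err := io.ReadAll(tcp)
	return string(reply), err
}

func main() {
	reply, err := halfCloseEcho("hello")
	fmt.Println(reply, err)
}
```

The reply arrives after the client has already closed its write side, which is the "graceful shutdown" behavior the comment compares OutgoingResetRequest to.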
Yes, I agree with you that there are two problems. But I don't think we can solve them separately. |
Ok. Let's see what we can find by next week.
Interesting. How do you cause this "reuse" of stream ID by Chrome? (how can I repro it?) |
If we try to send data after the channel was closed by Chrome, Chrome will throw the error:
You could use my test code and log the dc object at the open event; you will find they have the same id:
I have successfully fixed it with a workaround; you can check it at https://github.com/jerry-tao/sctp/tree/wip. It works as expected but may not fulfill the RFC. |
The wip branch is just my dummy workaround; it worked as expected in my test. Maybe there is another way to do it. |
Hi @jerry-tao I have invited you to the pion (GitHub) organization as a member. Once you accept it, you should be able to directly create a branch for future pull requests. The GitHub action for auto test should kick in without an approval! ;)
FWIW, here's my analysis of logs on a spreadsheet from pion (with my modified pion-server), which sends 10 messages on a datachannel, then calls dc.close() immediately (with a 5 msec delay). I see a couple of 'stream 1 not found' errors, but all 10 messages were echoed back to Chrome, as you can see from the SACKs (selective acks). The Chrome-side script, though, only received the first 3 messages (7 were dropped, or ignored by Chrome for some reason). Also, I did not see any new channels created by Chrome. I used
I will try your test code next time. |
I tried testing it in reverse; I think Chrome stops sending buffered data once it has received a reconfig.
When I received a message, I closed the channel:
Chrome will stop sending and move the channel to the closed state. |
And I found this:
|
@jerry-tao First of all, I think we should let the datachannel (DCEP) layer close the outgoing stream on EOF. RFC 8831 is not about the SCTP layer. This has been fixed, along with a unit test, in this PR: jerry-tao#2 I don't think removing data chunks in the pending queue is necessary. RFC 6525 indicates all data written before OutgoingResetRequest should be delivered. The method a.sendResetRequest() pushes an empty DATA chunk as an indicator of EOS (End of Stream) to ensure that (the method does not immediately create an OutgoingResetRequest). Also...
I still don't know exactly what the problem here is. I followed your code, which creates a new data channel right after closing the previous one, repeating 5 times. I can see the stream ID is reused. I don't see any problem with it. Each iteration (as soon as the new data channel becomes available) sends one message, which is successfully echoed (by pion) back to Chrome with no problem. I can see all data channels in the 'closed' state at the end. I tried both 03f5520 and pion:issue-187-rev2 . I think we should step back a little bit and first try to come up with code with which we both can see the same problem. If you could share complete code that demonstrates the problem you are seeing, that would be a great start. I highly recommend we both base on pion:issue-187-rev2. |
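The end-of-stream marker idea mentioned earlier in this comment (queue an empty marker chunk instead of emitting the reset request right away) can be illustrated with a small queue model. This is a sketch under stated assumptions, not the pion/sctp code: `chunk`, `sender`, `requestReset`, and `flush` are hypothetical names, and the "wire" is just a slice of strings.

```go
package main

import "fmt"

// chunk is a hypothetical queued item. An item with eos set carries no
// payload and only marks where the stream ends.
type chunk struct {
	streamID uint16
	payload  []byte
	eos      bool
}

// sender models a pending queue that is drained strictly in order.
type sender struct {
	pending []chunk
}

func (s *sender) write(streamID uint16, p []byte) {
	s.pending = append(s.pending, chunk{streamID: streamID, payload: p})
}

// requestReset does not emit a reset request immediately. It queues an
// end-of-stream marker behind whatever data was written before it.
func (s *sender) requestReset(streamID uint16) {
	s.pending = append(s.pending, chunk{streamID: streamID, eos: true})
}

// flush drains the queue and returns a trace of what would go on the wire.
// The reset request appears only once the marker is reached, so all data
// written earlier is sent first.
func (s *sender) flush() []string {
	var wire []string
	for _, c := range s.pending {
		if c.eos {
			wire = append(wire, fmt.Sprintf("RECONFIG(outgoing reset, stream %d)", c.streamID))
			continue
		}
		wire = append(wire, fmt.Sprintf("DATA(stream %d, %q)", c.streamID, c.payload))
	}
	s.pending = nil
	return wire
}

func main() {
	s := &sender{}
	s.write(1, []byte("first"))
	s.write(1, []byte("second"))
	s.requestReset(1)
	for _, line := range s.flush() {
		fmt.Println(line)
	}
}
```

Because the marker sits in the same ordered queue as the data, no extra bookkeeping is needed to guarantee that data written before the reset goes out first.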
In our use case, the problem is that we send a huge amount of data through the data channel, and it all gets put in pendingQueue. |
@enobufs I have reviewed your PR at jerry-tao#2, and will work on a new PR about clearing the buffer. |
@jerry-tao Good finding! That's exactly how Chrome appears to behave from what I saw. FYI. I asked questions to a co-author of RFC 8831 (and 6525) about section 6.7. He agrees that from RFC 8831 (based on RFC 6525), it is natural for us to assume that what section 6.7 offers is a "graceful shutdown" like we do with shutdown(socket, SHUT_WR) while the socket is still able to read data from the remote. But, looking at W3C's API, he said it felt like the spec followed a different path, and RTCDataChannel.close() is, like close(socket) in TCP, which closes both outbound and inbound... W3C API doc, though, has this text in 6.2.4 Closing procedure:
But then, the quote in your finding:
This may be true, if W3C's RTCDataChannel.close() is meant to close both directions. That is, once you have received 'OutgoingResetRequest' from the peer, you know that the peer no longer wants to receive anything, so we could discard the pending data (not in flight yet) when we send our own 'OutgoingResetRequest'. (NOTE: we would discard the pending data only when we have already received an OutgoingResetRequest.) If you'd like to follow through on this work, I am ok with this direction, and I'd be happy to review your code. (I have added a couple of comments to pion/sctp: State Transition) Please let me know your thoughts. |
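The rule proposed in this comment (drop not-yet-sent data only if the peer's reset has already arrived) fits in a few lines. The sketch below is illustrative only; `stream`, `remoteReset`, and `closeOutgoing` are hypothetical names and do not come from pion/sctp.

```go
package main

import "fmt"

// stream is a hypothetical model of one direction-aware stream.
type stream struct {
	remoteReset bool     // the peer's OutgoingResetRequest has been received
	pending     [][]byte // written by the app but not yet in flight
}

// closeOutgoing returns the messages that would still be sent ahead of our
// own reset request. Pending data is discarded only when the peer has
// already reset its side, meaning it no longer wants to receive anything.
func (s *stream) closeOutgoing() [][]byte {
	toSend := s.pending
	s.pending = nil
	if s.remoteReset {
		return nil // peer closed first: drop what was never sent
	}
	return toSend // we closed first: deliver everything written so far
}

func main() {
	local := &stream{pending: [][]byte{[]byte("a"), []byte("b")}}
	fmt.Println(len(local.closeOutgoing())) // local close: pending data is kept

	remote := &stream{remoteReset: true, pending: [][]byte{[]byte("a"), []byte("b")}}
	fmt.Println(len(remote.closeOutgoing())) // remote closed first: pending data is dropped
}
```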
Now that we know they are unrelated issues, I think we could finish this PR first (still working on the lint). I opened a new issue, #239, to discuss the pendingBuffer. |
@enobufs BTW, do you prefer that I squash this PR into a single commit or keep the commit history? |
Sure > squash |
Thanks @jerry-tao for your great work. Thanks for adding state checking in WriteSCTP() method. It looks good too.
Description
When a datachannel is closed from Chrome, sctp should respond with an outgoingResetRequest; otherwise Chrome won't fully close the datachannel.
Reference issue
Fixes #187