-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[QUIC] Fixed disabled receives if buffer gets drained before RECEIVE returns #72439
Conversation
Tagging subscribers to this area: @dotnet/ncl Issue DetailsRoot cause: we're reading in a tight loop, so when RECEIVE event arrived, we'd copy some data (based on buffer capacity) and return SUCCESS with partial receive. But the reading loop thread was faster and it would drain the buffer before the event handler returned to MsQuic. Now we have The fix: check if the receive buffer has a capacity (i.e.: someone read from it in the mean time) and if so immediately re-enable receives with Interestingly, this manifested only on Windows, so I ran the offending test in a tight loop for ~10 minutes without any failures (normally it'd manifest in less than 1 minute). Fixes #72196
|
@@ -509,7 +509,7 @@ private unsafe int HandleEventReceive(ref RECEIVE data) | |||
_receiveTcs.TrySetResult(); | |||
|
|||
data.TotalBufferLength = totalCopied; | |||
return QUIC_STATUS_SUCCESS; | |||
return _receiveBuffers.HasCapacity() ? QUIC_STATUS_CONTINUE : QUIC_STATUS_SUCCESS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if the buffers are consumed from a read thread just a moment after you check HasCapasity? Will we still have - however small but still - a probability of a deadlock (if I understood your description correctly)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The read thread will see _receivedNeedsEnable
is true and call StreamReceiveSetEnabled
. It'll get queued in msquic and processed after the RECEIVE event finished, re-enabling the reads.
runtime/src/libraries/System.Net.Quic/src/System/Net/Quic/QuicStream.cs
Lines 310 to 318 in 0b90ae4
if (totalCopied > 0 && Interlocked.CompareExchange(ref _receivedNeedsEnable, 0, 1) == 1) | |
{ | |
unsafe | |
{ | |
ThrowHelper.ThrowIfMsQuicError(MsQuicApi.Api.ApiTable->StreamReceiveSetEnabled( | |
_handle.QuicHandle, | |
1), | |
"StreamReceivedSetEnabled failed"); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The (initial) problem happens when this line on Read thread
if (totalCopied > 0 && Interlocked.CompareExchange(ref _receivedNeedsEnable, 0, 1) == 1) |
executes earlier than this line on MsQuic thread
Volatile.Write(ref _receivedNeedsEnable, 1); |
meaning we are able to catch that situation any time after Volatile.Write, including on return.
I have one more question:
Is it ok when we both return CONTINUE and also schedule EnableReceive afterwards? What do you think about including the same CompareExchange here as well:
return _receiveBuffers.HasCapacity() ? QUIC_STATUS_CONTINUE : QUIC_STATUS_SUCCESS; | |
return (_receiveBuffers.HasCapacity() && Interlocked.CompareExchange(ref _receivedNeedsEnable, 0, 1) == 1) ? QUIC_STATUS_CONTINUE : QUIC_STATUS_SUCCESS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, let me check if that works without breaking anything.
That would prevent queuing StreamReceiveSetEnabled
if we've already re-enabled receives via CONTINUE status --> -1 mindless P/Invoke.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, looks good, I'll update the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Failures look like #72429, i.e.: |
Root cause: we're reading in a tight loop, so when RECEIVE event arrived, we'd copy some data (based on buffer capacity) and return SUCCESS with partial receive. But the reading loop thread was faster and it would drain the buffer before the event handler returned to MsQuic. Now we have
ReadAsync
waiting on the_receiveTcs
(waiting on another RECEIVE event) and MsQuic waiting onStreamReceiveSetEnabled
since we did partial data receive.The fix: check if the receive buffer has a capacity (i.e.: someone read from it in the mean time) and if so immediately re-enable receives with
QUIC_STATUS_CONTINUE
Interestingly, this manifested only on Windows, so I ran the offending test in a tight loop for ~10 minutes without any failures (normally it'd manifest in less than 1 minute).
Fixes #72196
Fixes #72422