Fix race condition that leads Quart to hang with uvicorn #848
)" This reverts commit d5dcf80
…ncode#832)"" This reverts commit 64049e5
with the change, running the various apps contained in pure asgi app
quart app
starlette app
I would appreciate some feedback on this before making the same change on h11, but it looks like setting up a new asyncio Event per RequestResponseCycle, instead of having it at the HttpToolsProtocol level, does the job.
Yep, you're absolutely correct, I think having a message_event per cycle solves it.
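To make the per-cycle idea concrete, here is a minimal sketch of how a shared event can strand a later receive() call. It is not uvicorn's code: the Cycle class and the receive_completes helper are hypothetical stand-ins, and the ordering (a stale receive() from an earlier cycle still pending when the next request's body arrives) is an assumption about how the hang could occur.

```python
import asyncio


class Cycle:
    """Hypothetical stand-in for one request/response cycle."""

    def __init__(self, message_event):
        self.message_event = message_event

    async def receive(self):
        # Wait until the protocol signals a message, then reset the flag.
        await self.message_event.wait()
        self.message_event.clear()


async def receive_completes(shared: bool) -> bool:
    """Return True if the new cycle's receive() finishes before the timeout."""
    protocol_event = asyncio.Event()
    old = Cycle(protocol_event if shared else asyncio.Event())
    new = Cycle(protocol_event if shared else asyncio.Event())

    # Assumption: the previous cycle still has a receive() pending.
    stale = asyncio.ensure_future(old.receive())
    await asyncio.sleep(0)  # let the stale receive() start waiting

    # The new request's body arrives before the new app calls receive().
    new.message_event.set()
    await asyncio.sleep(0)  # a stale waiter on a shared event wakes and clears it

    try:
        await asyncio.wait_for(new.receive(), timeout=0.1)
        completed = True
    except asyncio.TimeoutError:
        completed = False

    stale.cancel()
    await asyncio.gather(stale, return_exceptions=True)
    return completed


if __name__ == "__main__":
    print("shared event completes:", asyncio.run(receive_completes(True)))
    print("per-cycle event completes:", asyncio.run(receive_completes(False)))
```

With one event per cycle, the stale waiter can no longer consume a signal meant for the next request.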
846_quart_race.py and payload.lua will of course be removed; they are here to give something reproducible in case someone feels like digging!
I just looked at how you did, it's slightly different: I erased entirely the I have no preference and not sure what is the best way, this said, to build some confidence around the fix, some tests would help, the snippet in #748 looks like a good basis, but I have been unable to get a hanging behaviour in python alone, seems like doing |
this worked for me:
|
Hey @euri10 I agree with @itayperl to add a |
Yes, that seems to be the case. It's always hard to figure out what the "order" should be in that interleaved spaghetti of coroutines...
@tomchristie @florimondmanca Could you please help review this PR? |
Only found a small knack. Otherwise given the feedback that people have been running a similar fix in production for a while, LGTM. :-) Nice one!
uvicorn/protocols/http/h11_impl.py
@@ -146,7 +145,7 @@ def connection_lost(self, exc):
                 # Premature client disconnect
                 pass
 
-        self.message_event.set()
+        self.cycle.message_event.set()
Do we need…?
-        self.cycle.message_event.set()
+        if self.cycle is not None:
+            self.cycle.message_event.set()
(I see we check this higher up in this method, and we do in the HttpTools implementation.)
As I recall, I added the if self.cycle is not None: check in the httptools implementation because one test failed without it. The tests probably pass without it in the h11 implementation because (don't quote me on that) the failing test may only run on a default config, which picks the httptools implementation by default.
I added your suggestion on both httptools and h11 @florimondmanca
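As a rough illustration of why the guard matters, the sketch below uses hypothetical SketchCycle and SketchProtocol classes (not the real h11 or httptools protocol classes) and assumes a connection can be lost before any request has created a cycle.

```python
import asyncio


class SketchCycle:
    """Hypothetical cycle that owns its own message_event."""

    def __init__(self):
        self.message_event = asyncio.Event()


class SketchProtocol:
    """Hypothetical protocol; self.cycle stays None until a request starts."""

    def __init__(self):
        self.cycle = None

    def connection_lost(self, exc):
        # Without this guard, a disconnect before the first request would
        # raise AttributeError on None.
        if self.cycle is not None:
            self.cycle.message_event.set()


# Disconnect before any request: nothing to wake up, and no exception.
idle = SketchProtocol()
idle.connection_lost(None)

# Disconnect during a request: the pending receive() gets woken up.
busy = SketchProtocol()
busy.cycle = SketchCycle()
busy.connection_lost(None)
print(busy.cycle.message_event.is_set())
```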
thanks for the review @florimondmanca ! |
Thanks all! |
EDIT: ready for review, long story short there is a race condition that is fixed (hopefully), details below on the reasons it's happening
fixes #847
fixes #748
Just posting this as a draft for potential ideas / discussion to solve / understand #847, I've got headaches trying to think about it clearly !
I've added some trace logs on a branch,
then ran with:
so that wrk sends 1 concurrent request for 1s.
This is enough to get only 1 req/s on Quart and 140ish on Starlette.
The log diff is shown below; it is also available on the branch in quart.log and starlette.log:
What we see is that the two are the same until line 19:
TRACE: 127.0.0.1:59398 - ASGI [2] Receive {'type': 'http.request', 'body': '<199 bytes>', 'more_body': False}
At that point there is a difference: uvicorn's RequestResponseCycle receive coroutine is entered again on Quart, while Starlette is getting the send messages.
now the send messages are slightly different too:
quart
starlette
Notice that more_body is True then False on Quart, while in Starlette it is not present. I suspect there is a case in uvicorn where we don't set the response_complete flag where we should, but I really fail to see where...
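For reference, a minimal sketch of how a server might derive a response_complete flag from the two send patterns. The response_complete_after helper and the message bodies are made up for illustration and are not taken from the logs or from uvicorn; the only fact relied on is that more_body defaults to False in ASGI http.response.body messages.

```python
def response_complete_after(messages):
    """Return True once the last body message seen has more_body false."""
    complete = False
    for message in messages:
        if message["type"] == "http.response.body":
            # A missing more_body key means this is the final chunk.
            complete = not message.get("more_body", False)
    return complete


# Illustrative message sequences (bodies are placeholders).
start = {"type": "http.response.start", "status": 200, "headers": []}
quart_style = [
    start,
    {"type": "http.response.body", "body": b"hello", "more_body": True},
    {"type": "http.response.body", "body": b"", "more_body": False},
]
starlette_style = [
    start,
    {"type": "http.response.body", "body": b"hello"},
]

print(response_complete_after(quart_style[:2]))   # still streaming
print(response_complete_after(quart_style))       # final chunk seen
print(response_complete_after(starlette_style))   # single body message
```

Under this reading, both patterns end with the response complete; the Quart-style sequence just takes one extra send call to get there.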