
HttpToolsProtocol: receive() hangs due to signalling mixup between request cycles #748

Closed
itayperl opened this issue Aug 9, 2020 · 8 comments · Fixed by #848

itayperl commented Aug 9, 2020

I have a FastAPI server running with uvicorn. Starting with starlette 0.13.7 I'm hitting a race condition on a specific resource-limited setup (RPi); however, if my analysis is correct, the root cause is a bug in uvicorn's protocols.http.httptools_impl.

The race condition causes starlette to keep calling receive() on a RequestResponseCycle for a short while after the response is fully sent, while the client sends a new request on the same connection (a standalone toy sketch of the mixup follows these steps):

  1. Cycle 1 is blocked on message_event in RequestResponseCycle.receive() (called before response_complete is set)
  2. HttpToolsProtocol reads the body of the new request into Cycle 2 and sets message_event
  3. Cycle 1 clears message_event and returns
  4. Receiving on Cycle 2 blocks because message_event is already cleared
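
To make the mixup concrete outside of uvicorn, here is a standalone toy sketch (not uvicorn code; names like cycle_receive are made up) showing how two cycles sharing a single asyncio.Event can steal each other's wake-up:

import asyncio

async def cycle_receive(name, event):
    # Imitates RequestResponseCycle.receive(): block on the shared event, then clear it.
    await event.wait()
    event.clear()
    print(f"{name}: woke up")

async def main():
    message_event = asyncio.Event()  # one event shared by every cycle on the connection

    # Cycle 1 is still sitting in receive() after its response was sent.
    cycle1 = asyncio.create_task(cycle_receive("cycle 1", message_event))
    await asyncio.sleep(0.1)

    # The protocol reads the body of the *next* request and sets the event,
    # but the wake-up is consumed (and the event cleared) by the lingering cycle 1.
    message_event.set()
    await cycle1

    # Cycle 2 now calls receive() for its own request; the event is already
    # cleared and nothing will set it again, so it hangs.
    cycle2 = asyncio.create_task(cycle_receive("cycle 2", message_event))
    try:
        await asyncio.wait_for(cycle2, timeout=1)
    except asyncio.TimeoutError:
        print("cycle 2: receive() hung")

asyncio.run(main())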

If I'm not mistaken, the fix is pretty simple:

  1. Create a separate event for each cycle.
  2. In RequestResponseCycle.send, when setting response_complete, also set message_event to unblock any lingering receive()s (see the sketch after this list)
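
For illustration only (this is a toy model, not the actual change that ended up in #848), here is roughly how a per-cycle event plus a set() on response completion avoids both the stolen wake-up and the stranded receive():

import asyncio

class ToyCycle:
    # Loosely modelled on RequestResponseCycle; names and structure are illustrative only.
    def __init__(self):
        self.message_event = asyncio.Event()   # fix 1: one event per cycle
        self.response_complete = False

    async def receive(self):
        if not self.response_complete:
            await self.message_event.wait()
            self.message_event.clear()
        if self.response_complete:
            return {'type': 'http.disconnect'}
        return {'type': 'http.request'}

    def mark_response_complete(self):
        # What send() would do when the final response body has been sent:
        self.response_complete = True
        self.message_event.set()               # fix 2: unblock any lingering receive()

async def main():
    cycle1, cycle2 = ToyCycle(), ToyCycle()

    # Cycle 1 lingers in receive() after its response is done.
    lingering = asyncio.create_task(cycle1.receive())
    await asyncio.sleep(0.1)
    cycle1.mark_response_complete()            # wakes the lingering receive()
    print(await lingering)                     # {'type': 'http.disconnect'}

    # The body of the next request is signalled on cycle 2's own event,
    # so it cannot be swallowed by cycle 1.
    cycle2.message_event.set()
    print(await cycle2.receive())              # {'type': 'http.request'}

asyncio.run(main())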

Minimal test case

This raw ASGI server imitates the behavior in Starlette that triggers the bug.

import asyncio

# Keep calling receive() until the server reports that the client disconnected,
# mimicking Starlette's background disconnect watcher.
async def wait_for_disconnect(receive):
    while True:
        p = await receive()
        if p['type'] == 'http.disconnect':
            print('Disconnected!')
            break

async def app(scope, receive, send):
    # Small delay before the first receive(), imitating the timing in Starlette
    # that exposes the race.
    await asyncio.sleep(0.2)
    m = await receive()

    if m['type'] == 'lifespan.startup':
        await send({'type': 'lifespan.startup.complete'})
    elif m['type'] == 'http.request':
        if scope['path'] == '/foo':
            # Leave a receive() running in the background past the end of this cycle.
            asyncio.create_task(wait_for_disconnect(receive))
            await asyncio.sleep(0.2)

        await send({'type': 'http.response.start', 'status': 404})
        await send({'type': 'http.response.body', 'body': b'Not found!\n'})
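
For reference, one way to run the snippet above is with a small launcher like the following (the file name repro.py is made up; the http='httptools' option explicitly selects the HttpToolsProtocol implementation, where the suspected bug lives):

# Hypothetical runner; assumes the raw ASGI app above is saved as repro.py.
import uvicorn

if __name__ == '__main__':
    uvicorn.run('repro:app', port=8000, http='httptools')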

Test with curl localhost:8000/foo localhost:8000/bar. You should see that it hangs on the request to /bar.

The /foo request calls receive() in the background, and that call is still pending when the request to /bar arrives. The http.request message for /bar incorrectly wakes up the background (/foo) receive(), so the receive() for /bar hangs forever, since there are no more events for that request.

euri10 commented Oct 12, 2020

hi @itayperl
afaik FastAPI currently pins Starlette to 0.13.6; the first thing to check would be whether what you described still happens when respecting that pin.

itayperl (Author) commented:
@euri10, I'm sorry if my description was a bit unclear, but this is a bug in uvicorn that should be fixed regardless of Starlette. It just didn't manifest (in my use case at least) with older versions of Starlette. Any code working with multiple httptools_impl.RequestResponseCycle objects in parallel can potentially hit this bug.

euri10 commented Oct 12, 2020

Could you confirm you experience this on 0.13.8 too? (I can't remember exactly, but I think 0.13.8 reverted something in middleware, which makes me think it's related; I could be totally off, of course.)

itayperl (Author) commented:
I was able to reproduce the bug on both 0.13.6 and 0.13.8 by adding the 100ms sleep in starlette as described above.

I can't say if it will actually reproduce "in the field" in either version, without the sleep, but I did see it happen on 0.13.7 in a real production setting after a day or so of operation.

euri10 commented Oct 12, 2020 via email

itayperl commented Oct 12, 2020

Edit: updated test case moved to issue description

itayperl (Author) commented:
@euri10, were you able to reproduce the bug using the new test case?

euri10 commented Nov 12, 2020

> @euri10, were you able to reproduce the bug using the new test case?

I can, yes. This helped a lot, thanks.
