-
-
Notifications
You must be signed in to change notification settings - Fork 933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reuse Request's body buffer for call_next in BaseHTTPMiddleware #1692
Reuse Request's body buffer for call_next in BaseHTTPMiddleware #1692
Conversation
TODO: add a test to verify this. Other misc TODO:
|
starlette/requests.py
Outdated
self._stream_consumed = False | ||
self._is_disconnected = False | ||
self._stream_state = self._StreamState.connected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the state of "stream not consumed but disconnected" unrepresentable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: this PR can be implemented without this change, but I think it cleans things up nicely and makes the new code portion of this PR simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this means exactly:
If this approach works well we could upstream this into the base Request so that arbitrary uses of Request can inter operate with ASGI apps/call chains.
You mean that if we create a Request
object inside a middleware/ASGI app, we can't get the body again if we use the same middleware twice, for example?
What I mean here is that if we moved this implementation into the base |
starlette/middleware/base.py
Outdated
if getattr(self, "_body", None) is not None: | ||
# body() was called | ||
self._wrapped_rcv_state = self._StreamState.consumed | ||
return { | ||
"type": "http.request", | ||
"body": self._body, | ||
"more_body": False, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there is no additional buffering going on here: if the user calls request.body()
on this Request
instance within dispatch
we would already be keeping around all of the bytes for the duration of the dispatch()
call. We are just re-using that buffering for the downstream ASGI app. If the user never called request.body()
then we consume the body here, just like if the user had called request.stream()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear @adriangb : if I call request.body()
before call_next
I'll be fine, right? Any endpoint that calls request.body
will get the cached data?
class CustomMiddleware(BaseHTTPMiddleware):
async def dispatch(self, request, call_next):
my_body = await request.body()
response = await call_next(request)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that's the idea. And if you call it after things won't hang, you'll get an empty request body (if I remember correctly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting back to me to quickly. The request doesn't hang anymore indeed, but you'll get a RuntimeError
if await request.body():
File "/home/basicuser/.local/lib/python3.10/site-packages/starlette/requests.py", line 236, in body
async for chunk in self.stream():
File "/home/basicuser/.local/lib/python3.10/site-packages/starlette/requests.py", line 218, in stream
raise RuntimeError("Stream consumed")
RuntimeError: Stream consumed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, you consumed the stream in the request body so it's not available in the middleware.
starlette/requests.py
Outdated
self._stream_consumed = False | ||
self._is_disconnected = False | ||
self._stream_state = self._StreamState.connected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: this PR can be implemented without this change, but I think it cleans things up nicely and makes the new code portion of this PR simpler.
Note that this does not fix #493 (comment) or any other case where the body is consumed in the endpoint and then |
I tried to figure out a way to get use cases like #493 (comment) working but failed. Here's a comparison of before and after for this change:
So this only really fixes the case that could already be fixed by moving to pure ASGI middleware, as discussed in #1519 (comment). The use cases this doesn't fix (reading the request body for the second time in an error handler or a middleware) are also not use cases you can get around using pure ASGI middleware. So the question is: do we move forward with this to fix one specific use case, or do we leave it as -is? |
Thank you all for developing and contributing to this library. Today I want to make a logging for the back-end app, and I'm stuck on it. Is there any change in the status of this PR? I've seen this PR change and I think this implementation is too confusing, we have to look at this issue differently. Sorry if I don't express myself that way, get it right, I have no doubt about your professionalism, but the Middleware codebase requires a refactor as to me |
Thanks for the feedback @ZhymabekRoman.
Is it confusing to you conceptually or the code itself?
I don't get what you mean by this. Are you referring to |
We could also detect stream double consumption here and prohibit calling stream() and body() after call_next() is called if the stream is consumed. That way there’s no error/hanging. With that I think we will have pretty much solved all of BaseHTTPMiddleware’s problems? |
Besides the ContextVar? |
Yep |
0fa5897
to
2048c44
Compare
@tomchristie are there any other changes you'd like, or anything I can do to help get this reviewed? |
1455e5c
to
6ff01c0
Compare
9f85a52
to
4ce3d73
Compare
4ce3d73
to
ec38227
Compare
@@ -26,6 +88,8 @@ async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None: | |||
await self.app(scope, receive, send) | |||
return | |||
|
|||
request = _CachedRequest(scope, receive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be here, does it? Can it be in the same place it was instantiated before, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next line references this. It, maybe, can be moved to where it was before but at the very least it will need a new variable name likeouter_request
to differentiate it from the request: Request
on line 95. It makes more sense to just move it up here, there is no harm in that.
starlette/requests.py
Outdated
@@ -223,7 +223,7 @@ async def stream(self) -> typing.AsyncGenerator[bytes, None]: | |||
body = message.get("body", b"") | |||
if body: | |||
yield body | |||
if not message.get("more_body", False): | |||
if self._stream_consumed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change is needed? The _CachedRequest
doesn't change the value of self._stream_consumed
. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test test_read_request_stream_in_dispatch_after_app_calls_body
fails without this logic.
Hmm... Why the more_body
doesn't matter? Like, not considering the BaseHTTPMiddleware
, why the more_body
doesn't matter to exit?
Hmmm... If we receive 2 chunks of body, how this works? It doesn't look like we have a test that covers standalone Request
with multiple chunks. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should have tests for Request
as a standalone thing since it is a standalone thing in the public API and... I've been encouraging people to use it e.g. in ASGI middleware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do, we just don't cover what I mention
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add it. If you already prototyped it out in your head or on paper please comment it here and save me a few min haha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall how to do it from the TestClient's POV, but I thought about sending a stream with 2 chunks. Maybe you can use httpx directly if you can't do it with the TestClient.
I guess that would be enough to break this logic here, since the value of stream_consumed will not change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yes, you were right, I did have a bug, good catch. I still need to modify Request
a bit, I added a couple of tests to explain why. TLDR is we were marking the stream as consumed as soon as you call stream()
but in reality you can call stream, get one message and then call steam again before it is consumed. Let me know if it's clear now.
Sorry the time it took to review this. |
c857977
to
96d5b49
Compare
@Kludex thank you for the review. No worries with the timeline, the only issue becomes that I start forgetting why I did things like I did so sorry if answers to some of your questions weren't super clear. I fixed the bug you found and reworked things to try to make the answers to your questions clearer by just looking at the code. I also added more tests, now there are 400+ lines of tests for 70 lines of implementation 😅 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long wait here.
Let's go 👍
Thought of this while working on related work to #1691. Sorry if this has already been proposed, I know we've gone around a couple times with similar stuff.
Background:
Basically if the user calls
Request.body()
from their dispatch function we cache the entire request body in memory and pass that to downstream middlewares but if they callRequest.stream()
then all we do is send an empty body so that downstream things don't hang forever.I think this behavior is pretty sensible and doesn't use any unexpected memory (e.g. caching all of the body if
Request.stream()
was called). It also doesn't break the ASGI flow: if a downstream middleware replacesreceive()
or modified messages the downstream ASGI app (including the final endpoint) will see the newreceive()
.If this approach works well we could upstream this into the base Request so that arbitrary uses of Request can inter operate with ASGI apps/call chains.
Note that this does not fix #493 (comment) or any other case where the body is consumed in the endpoint and then
Request.body()
is called somewhere else (e.g. in BaseHTTPMiddleware aftercall_next()
or in an exception handler).