Memory usage streaming large responses #1012
Comments
I kind of remember this has already been discussed in #802, though I'm not sure it's relevant to your case.
I did see that ticket, but I do think it's important to distinguish between the high-level backpressure logic that ticket is mostly about (specifically managing concurrent request limits, etc.) and network backpressure. I do not believe that ticket is correct when it suggests that uvicorn/gunicorn manages the low-level network socket backpressure: the asyncio.Queue in question lives entirely within Starlette, and it will happily absorb the entire response payload until the process dies.
I think queueing is fine, but the queue is unbounded, which loses the backpressure properties provided by the server. (As you said, it's allowed to grow forever even if the other side isn't consuming chunks fast enough.) In my mind, setting a maxsize on the queue would be enough to restore backpressure. Edit: actually, as discussed by @erewok in the chat, a small maxsize should be sufficient.
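As an aside for readers, here is a minimal sketch (plain asyncio, not Starlette's actual middleware code) of the backpressure behaviour being discussed: with a bounded queue, `put()` suspends the producer while the queue is full, whereas an unbounded queue just keeps growing when the consumer is slow.

```python
import asyncio


async def producer(queue: asyncio.Queue) -> None:
    for i in range(5):
        await queue.put(f"chunk-{i}")  # suspends here while the queue is full
        print(f"produced chunk-{i}")
    await queue.put(None)  # sentinel: no more chunks


async def consumer(queue: asyncio.Queue) -> None:
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        await asyncio.sleep(0.1)  # simulate a slow client connection
        print(f"consumed {chunk}")


async def main() -> None:
    # Bounded queue: at most one chunk is buffered, so the producer
    # cannot run ahead of the consumer and memory stays flat.
    queue: asyncio.Queue = asyncio.Queue(maxsize=1)
    await asyncio.gather(producer(queue), consumer(queue))


asyncio.run(main())
```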
For a bit of background: as @florimondmanca mentioned above, the problem comes down to the unbounded queue in BaseHTTPMiddleware. @via, if you have an easy opportunity to do so, you may consider experimentally running the branch from PR #1017 and reporting whether it resolves the memory growth.
@via Starlette version 0.13.7 has been released to address this issue.
@erewok thank you for the information! We have been using a similar change (queue size of 3), and were going to try the PR you referenced. We will just update and report back. Thank you for the speedy responses.
Due to issues with unevaluated responses, we have elected to revert the maxsize fix (see PR #1028), so I want to leave this issue open for now, hopefully as guidance to anyone who encounters the same problem. A documentation update with this information will follow soon as well. As stated in the related issue on streaming with BaseHTTPMiddleware, bounding the queue unfortunately introduced problems of its own. Again, these problems should be absent if you avoid subclassing BaseHTTPMiddleware.
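For readers hitting this, a common alternative to subclassing BaseHTTPMiddleware is a plain ASGI middleware. Below is a minimal sketch (the class and header names are illustrative, not Starlette code): body messages are forwarded to `send` as they arrive, so the middleware adds no buffering of its own and the server's backpressure handling is preserved.

```python
class AddHeaderMiddleware:
    """Raw ASGI middleware: adds a response header without buffering the body."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-example", b"1"))  # illustrative header
                message["headers"] = headers
            await send(message)  # body chunks pass straight through

        await self.app(scope, receive, send_wrapper)
```

It can be registered like any other middleware, e.g. `app.add_middleware(AddHeaderMiddleware)`.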
This is still not ideal, as the entire response will be loaded into memory. This is a problem with streaming responses. encode/starlette#1012 (comment) is actually enlightening here:
> "This means this class will either load the entirety of streaming requests into memory (this issue) and run the background before returning the response (galaxyproject#919), or if we fix those problems, that it will then encourage users to leave resources in a pending or open state, an arguably worse result. In short, it's problematic."
Which is exactly what we'd be doing with a middleware: keeping resources open for longer than necessary, which we need to avoid if we ever want to run background/async tasks. I think the solution here is what we already had: path operation dependencies.
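A minimal sketch of the "path operation dependency" pattern referenced above, assuming FastAPI; the names `get_session` and `/items` are illustrative. A dependency with `yield` opens a resource per request and cleans it up once the request is done, without routing the response body through BaseHTTPMiddleware.

```python
from fastapi import Depends, FastAPI

app = FastAPI()


async def get_session():
    session = {"open": True}      # stand-in for a real DB session or connection
    try:
        yield session             # made available to the path operation
    finally:
        session["open"] = False   # cleanup runs after the request is handled


@app.get("/items")
async def read_items(session: dict = Depends(get_session)):
    # The resource is open only for the duration of this request.
    return {"session_open": session["open"]}
```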
I believe this should be fixed by #1157. We now use an AnyIO stream in BaseHTTPMiddleware that is not unbounded (see starlette/middleware/base.py, line 26 at commit 7ed2890).
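A minimal standalone sketch of that bounded-stream behaviour (illustration only, not Starlette's middleware code): AnyIO memory object streams take a `max_buffer_size`, so `send()` suspends once the buffer is full instead of growing without bound.

```python
import anyio


async def main() -> None:
    # Buffer of 1: the sender waits whenever the receiver falls behind.
    send_stream, receive_stream = anyio.create_memory_object_stream(max_buffer_size=1)

    async def producer() -> None:
        async with send_stream:
            for i in range(5):
                await send_stream.send(f"chunk-{i}")  # suspends if the buffer is full

    async def consumer() -> None:
        async with receive_stream:
            async for chunk in receive_stream:
                await anyio.sleep(0.1)  # simulate a slow client
                print("received", chunk)

    async with anyio.create_task_group() as tg:
        tg.start_soon(producer)
        tg.start_soon(consumer)


anyio.run(main)
```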
We've also avoided the problem described in #1022 (context).
* Prepare version 0.15.0
* Remember to add a note about websocket_connect
* Add date and blurb to release notes
* Bump version to 0.15.0
* Add note about fixing #1012
After a lot of debugging, I found that streaming wasn't working because of BaseHTTPMiddleware, which apparently collects the whole stream into memory and then returns it all at once with the response. That means that if you stream a lot of data, the request will not return anything until all the data has been collected in memory. This can take time, and can result in a read timeout or simply run out of memory. Streaming was the main reason I chose Starlette, and the single most important thing didn't work. Not good. But at least I found how to fix it. Related issues: encode/starlette#1012, encode/starlette#472
We've been running into memory issues when providing very large async generators to a streaming response. We have these generators producing large (larger than available memory) responses in a way that allows us to keep only small chunks in memory at a time. However, it looks like the BaseHTTPMiddleware implementation uses an asyncio queue to store the individual chunks:
https://github.com/encode/starlette/blob/master/starlette/middleware/base.py#L30
This prevents any network backpressure handling: if the client receiving the streaming response is on a slow connection, the queue will happily grow without bound and consume all memory, triggering the kernel's out-of-memory killer, whereas the ideal handling here would be for send to block (yield) when this happens. I believe this would naturally happen if there were no queue here at all, so I am wondering why it needs to be here?
Would a PR to remove the queueing be accepted?
If not, what is the appropriate way to override this to not use a queue? We can write our own, but the use of BaseHTTPMiddleware is hardcoded:
starlette/applications.py, line 197 at commit 519f575
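For context, a minimal sketch of the usage pattern described in this issue (route path and chunk contents are illustrative): a StreamingResponse driven by an async generator, so only one chunk needs to be held in memory at a time, provided nothing in the middleware stack buffers the body.

```python
from starlette.applications import Starlette
from starlette.responses import StreamingResponse
from starlette.routing import Route


async def generate_large_body():
    # Far more data than we want resident in memory at once;
    # each chunk is produced lazily as the client consumes it.
    for i in range(1_000_000):
        yield f"row {i}\n".encode()


async def export(request):
    return StreamingResponse(generate_large_body(), media_type="text/plain")


app = Starlette(routes=[Route("/export", export)])
```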