Memory usage streaming large responses #1012

via · 2020-07-27T17:22:52Z

We've been running into memory issues when providing very large async generators to a streaming response. We have these generators producing large (larger than memory set) responses in a way that allows us to only keep small chunks in memory at a time. However, it looks like the BaseHTTPMiddleware implementation uses an asyncio queue to store the individual chunks:

https://github.com/encode/starlette/blob/master/starlette/middleware/base.py#L30

This prevents any network backpressure handling -- if the client that is receiving the streaming response is on a slow connection, the queue will happily grow without bound and consume all memory, triggering kernel out-of-memory, when the ideal handling here would be for send to block (yield) when this happens. I believe this would naturally happen if there were no queue here at all, so I am wondering why it needs to be here?

Would a PR to remove the queueing be accepted?

If not, what is the appropriate way to override this to not use a queue? We can write our own, but the use of BaseHTTPMiddleware is hardcoded:

starlette/starlette/applications.py

Line 197 in 519f575

self.add_middleware(BaseHTTPMiddleware, dispatch=func)

, leaving only some fairly hacky approaches to preventing this queueing.

euri10 · 2020-07-27T18:42:07Z

I kind of remember it has been discussed already here, not sure it's relevant to your case though #802

…

On Mon, Jul 27, 2020, 7:23 PM via ***@***.***> wrote: We've been running into memory issues when providing very large async generators to a streaming response. We have these generators producing large (larger than memory set) responses in a way that allows us to only keep small chunks in memory at a time. However, it looks like the BaseHTTPMiddleware implementation uses an asyncio queue to store the individual chunks: https://github.com/encode/starlette/blob/master/starlette/middleware/base.py#L30 This prevents any network backpressure handling -- if the client that is receiving the streaming response is on a slow connection, the queue will happily grow without bound and consume all memory, triggering kernel out-of-memory, when the ideal handling here would be for send to block (yield) when this happens. I believe this would naturally happen if there were no queue here at all, so I am wondering why it needs to be here? Would a PR to remove the queueing be accepted? If not, what is the appropriate way to override this to not use a queue? We can write our own, but the use of BaseHTTPMiddleware is hardcoded: https://github.com/encode/starlette/blob/519f5750b5e797bb3d4805fd29657674304ce397/starlette/applications.py#L197, leaving only some fairly hacky approaches to preventing this queueing. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#1012>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAINSPT7UOTUWBXAWKM352TR5WZX5ANCNFSM4PJAVLSA> .

via · 2020-07-27T18:51:21Z

I did see that ticket, but I do think its important to distinguish between the high level backpressure logic that I think that ticket is mostly about (specifically managing concurrent request limits, etc), and network backpressure. I do not believe that ticket is correct when it suggests that uvicorn/gnuicorn manages the low-level network socket backpressure, as the use of the asyncio.Queue there is completely in starlette, and will happily absorb the entire response payload until it dies.

florimondmanca · 2020-07-31T08:25:33Z

I think queueing is fine, but the queue is unbounded which results in losing the backpressure properties provided by the server. (As you said, it’s allowed to grow forever even if the other side isn’t consuming chunks fast enough.)

In my mind setting a maxsize of eg 10 would resolve the memory issues to you’re seeing, because then if the client isn’t consuming chunks fast enough the app will block on send() until there’s room in the queue.

Edit: actually as discussed by @erewok in the chat, a maxsize of 1 should be just enough. This way we stick to the backpressure behavior of the server as close as we can, and I don't think there's a risk of deadlock. In any case, any fix would be worth validating on a real-life big upload too. :-)

erewok · 2020-08-01T17:14:31Z

For a bit of background, the BaseHTTPMiddleware offers a way for users to write middleware that takes a request and builds a response. This is a bit closer to the request-handling logic than other middleware, but in order to assemble a response, the BaseHTTPMiddleware needs some generic way to handle any potential response the handler would produce, which is the reason for the queue and streaming.

Also, as @florimondmanca mentioned above, @via If you have an easy opportunity to do so, you may consider experimentally running the branch from PR #1017 (branch: stream_response_bg) and seeing if your memory issues in the case of large responses are resolved. If you do find an opportunity to try it, please report back.

erewok · 2020-08-05T17:00:24Z

@via Starlette version 0.13.7 has been released to address this issue.

via · 2020-08-05T17:07:57Z

@erewok thank you for the information!

We have been using a similar change (queue size of 3), and were going to try the PR you referenced. We will just update and can report back. Thank you for the speedy responses.

erewok · 2020-08-13T12:56:34Z

Due to issues with unevaluated responses, we have elected to revert the maxsize fix (see PR #1028 ), and so I want to leave this issue open for now hopefully as guidance to anyone who encounters the same issue. A documentation update will be provided with this information soon as well.

As stated in the related issue on streaming with BaseHTTPMiddleware (#919):

We would like to discourage the use of BaseHTTPMiddleware in favor of raw ASGI middleware which includes a method def __call__(self, scope, receive, send):, and
If people still want to use this middleware class, it should never be used with StreamingResponse/FileResponse endpoints.

Unfortunately, while the BaseHTTPMiddleware class offers an easier interface because it provides a request argument and promises something like a response, this middleware class also encourages users to think about the asgi app functionality in a "complete" or "finished" way. This means this class will either load the entirety of streaming requests into memory (this issue) and run the background before returning the response (#919 ), or if we fix those problems, that it will then encourage users to leave resources in a pending or open state, an arguably worse result. In short, it's problematic.

Again, these problems should be absent if you avoid subclassing BaseHTTPMiddleware.

this is still not ideal, as the entire response will be loaded into memory. This is a problem with streaming responses. encode/starlette#1012 (comment) is actually enlightening here: > "This means this class will either load the entirety of streaming requests into memory (this issue) and run the background before returning the response (galaxyproject#919 ), or if we fix those problems, that it will then encourage users to leave resources in a pending or open state, an arguably worse result. In short, it's problematic." Which is exactly what we'd be doing with a middleware, keeping resources open for longer than necessary, which we need to avoid if we ever want to run background / async tasks. I think the solution here what we already had, path operation dependencies.

JayH5 · 2021-06-23T08:04:46Z

I believe this should be fixed by #1157. We now use an AnyIO stream in BaseHTTPMiddleware that is not unbounded:

starlette/starlette/middleware/base.py

Line 26 in 7ed2890

send_stream, recv_stream = anyio.create_memory_object_stream()

We've also avoided the problem described in #1022 (context).

* Prepare version 0.15.0 * Remember to add a note about websocket_connect * Add date and blurb to release notes * Bump version to 0.15.0 * Add note about fixing #1012

After a lot of debugging, I dug out, that streaming wasn't working because of BaseHTTPMiddleware, which it seems collects all the stream into memory and then returns it all at once with the response. That means, if you stream a lot of data, request will not give any answer until all data is collected into memory. This can take time and can result in read timeout or can simply run out of memory. Streaming was the main reason, why I chose Starlette, and one and most important thing didn't worked. Not good. But at least I found how to fix it. Related issues: encode/starlette#1012 encode/starlette#472

florimondmanca added the bug Something isn't working label Jul 31, 2020

erewok mentioned this issue Aug 1, 2020

Streaming with BaseHTTPMiddleware: force background to run after response completes #1017

Closed

erewok mentioned this issue Aug 3, 2020

Fix high memory usage when using BaseHTTPMiddleware middleware classes and streaming responses #1018

Merged

erewok closed this as completed in #1018 Aug 4, 2020

This was referenced Aug 11, 2020

Remove maxsize arg from Queue constructor in BaseHTTPMiddleware #1028

Merged

Background tasks don't work with middleware that subclasses BaseHTTPMiddleware #919

Closed

erewok reopened this Aug 13, 2020

erewok mentioned this issue Aug 13, 2020

Improve Documentation on Writing Custom ASGI Middleware and BaseHTTPMiddleware #1029

Closed

dmig-alarstudios mentioned this issue Aug 17, 2020

rewrite middleware to pure ASGI tomwojcik/starlette-context#18

Closed

JayH5 added a commit that referenced this issue Jun 23, 2021

Add note about fixing #1012

03cbc86

JayH5 closed this as completed Jun 23, 2021

JayH5 added a commit that referenced this issue Jun 23, 2021

Prepare version 0.15.0 (#1202)

119c427

* Prepare version 0.15.0 * Remember to add a note about websocket_connect * Add date and blurb to release notes * Bump version to 0.15.0 * Add note about fixing #1012

tomwojcik mentioned this issue Jun 23, 2021

Consider removing RawContextMiddleware and keeping only ContextMiddleware after resolving memory usage issue tomwojcik/starlette-context#47

Open

gareth-leake mentioned this issue Jun 17, 2022

Globally importable request object #420

Closed

alidaw mentioned this issue Jan 31, 2024

Fetching API docs occasionally fails at starlette framework middleware issue with streaming large responses #2453

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory usage streaming large responses #1012

Memory usage streaming large responses #1012

via commented Jul 27, 2020 •

edited

Loading

euri10 commented Jul 27, 2020 via email

via commented Jul 27, 2020

florimondmanca commented Jul 31, 2020 •

edited

Loading

erewok commented Aug 1, 2020 •

edited

Loading

erewok commented Aug 5, 2020

via commented Aug 5, 2020

erewok commented Aug 13, 2020

JayH5 commented Jun 23, 2021 •

edited

Loading

Memory usage streaming large responses #1012

Memory usage streaming large responses #1012

Comments

via commented Jul 27, 2020 • edited Loading

euri10 commented Jul 27, 2020 via email

via commented Jul 27, 2020

florimondmanca commented Jul 31, 2020 • edited Loading

erewok commented Aug 1, 2020 • edited Loading

erewok commented Aug 5, 2020

via commented Aug 5, 2020

erewok commented Aug 13, 2020

JayH5 commented Jun 23, 2021 • edited Loading

via commented Jul 27, 2020 •

edited

Loading

florimondmanca commented Jul 31, 2020 •

edited

Loading

erewok commented Aug 1, 2020 •

edited

Loading

JayH5 commented Jun 23, 2021 •

edited

Loading