
Information exposure bug: response leaking to wrong requests in uvicorn-based server under heavy load #645

Open
MarkusSintonen opened this issue Nov 20, 2024 · 6 comments



MarkusSintonen commented Nov 20, 2024

  • uvloop version: 0.21.0
  • Python version: 3.12.6
  • Platform: uname_result(system='Linux', node='xxx', release='6.1.109', version='#1 SMP PREEMPT_DYNAMIC Thu Sep 19 22:28:35 UTC 2024', machine='x86_64')
  • Can you reproduce the bug with PYTHONASYNCIODEBUG in env?: No
  • Does uvloop behave differently from vanilla asyncio? How?: Yes, see below

We encountered a very nasty data-leakage bug with uvloop. It leaks responses to the wrong HTTP requests in a uvicorn-based server under heavy load. This caused some users' data to leak into other users' requests, resulting in an information exposure.

The issue is caused solely by uvloop, as removing it fixed the problem. Vanilla asyncio does not exhibit the same behavior.

The issue happened under high-load situations. The service processes some 800 million requests per day, of which about 100 per day received responses belonging to other concurrently running requests. It seemed to happen when the load was higher. The issue also sometimes correlated with other problems we experienced with uvloop:

Sometimes we observed strange, bad-looking RuntimeErrors coming from the depths of uvloop at about the same time as we saw the incorrect responses, although the two did not always correlate. (Those errors also went away after removing uvloop.)

I haven't been able to reproduce this, as it seems to happen only under heavy load and rarely enough (but leaking information even rarely is still bad).

It seems uvloop might have some major issues in its TCP/socket/stream implementation, where it sometimes uses an incorrect, already-used socket, as the RuntimeErrors hint. I'm not sure whether that is actually related to the data-leakage issue.
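
A minimal illustration of the hazard being described (an illustrative sketch, not uvloop code): the kernel recycles file-descriptor numbers as soon as a socket is closed, so any event-loop bookkeeping still keyed on the old number can silently end up pointing at a different connection, i.e. a different user's request.

```python
import socket

# Illustrative sketch only (not uvloop code): fd numbers are recycled
# immediately after close(), so stale bookkeeping keyed on the old
# number would now refer to a completely different connection.
a = socket.socket()
fd_of_a = a.fileno()
a.close()

b = socket.socket()
# The kernel hands out the lowest free fd number, which is very often
# the one that `a` just released.
print(fd_of_a, b.fileno())  # typically prints the same number twice
```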


Kludex commented Nov 20, 2024

Is this a duplicate of #631?

@MarkusSintonen (Author)

> Is this a duplicate of #631?

I don't know. That one seems to be about handling multiple requests for the same path, where the response took longer than expected to arrive at the client. So something about performance. But it could be related.

@MarkusSintonen (Author)

I hope that e.g. uvicorn would drop uvloop from pip install 'uvicorn[standard]'. uvloop currently seems to have quite a few issues in its connection-handling implementation (the mentioned RuntimeError bugs etc.). The performance claims in its README are also questionable (#566).

@todddialpad

I have been experiencing failures under heavy load. This started with the aiohttp/aiohappyeyeballs changes in aiohttp's 3.10.0 release. After some deep-diving, I think the key change was aiohttp's switch from letting loop.create_connection create and connect the underlying socket/file descriptor to manually creating and connecting the socket and then passing it via the sock argument to loop.create_connection.
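
A simplified sketch of the two call patterns being contrasted (an approximation, not aiohttp's actual code; asyncio.Protocol stands in for the real protocol factory):

```python
import asyncio
import socket

async def old_style(loop: asyncio.AbstractEventLoop, host: str, port: int):
    # Pre-3.10 path: asyncio creates and connects the socket internally,
    # so the resulting transport is the fd's only owner from the start.
    return await loop.create_connection(asyncio.Protocol, host, port)

async def new_style(loop: asyncio.AbstractEventLoop, host: str, port: int):
    # Post-3.10 path (simplified): the caller (aiohappyeyeballs in the
    # real code) creates and connects the socket, then hands it over via
    # `sock=`. Ownership of the fd must now transfer cleanly from the
    # Python socket object to the transport.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    await loop.sock_connect(sock, (host, port))
    return await loop.create_connection(asyncio.Protocol, sock=sock)
```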

I'm guessing there is a problem with socket ownership, especially when an exception happens during the loop.create_connection call. I'm not completely sure of the exact code path, but to test the idea I tried the following patch:

```diff
diff --git a/uvloop/loop.pyx b/uvloop/loop.pyx
index f9a5a23..9797684 100644
--- a/uvloop/loop.pyx
+++ b/uvloop/loop.pyx
@@ -1877,6 +1877,7 @@ cdef class Loop:
             AddrInfo ai_local = None
             AddrInfo ai_remote
             TCPTransport tr
+            int sockfd
 
             system.addrinfo *rai = NULL
             system.addrinfo *lai = NULL
@@ -2060,8 +2061,9 @@ cdef class Loop:
             waiter = self._new_future()
             tr = TCPTransport.new(self, protocol, None, waiter, context)
             try:
+                sockfd = sock.detach()
                 # libuv will make socket non-blocking
-                tr._open(sock.fileno())
+                tr._open(sockfd)
                 tr._init_protocol()
                 await waiter
             except (KeyboardInterrupt, SystemExit):
@@ -2075,8 +2077,6 @@ cdef class Loop:
                 tr._close()
                 raise
 
-            tr._attach_fileobj(sock)
-
         if ssl:
             app_transport = protocol._get_app_transport(context)
             try:
```
and the failures have gone away.
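
The gist of the patch: socket.detach() transfers ownership of the fd to the caller, while socket.fileno() leaves the Python socket object as a co-owner that can close the fd behind the transport's back (e.g. when it is garbage-collected after a failed connect). A minimal sketch of the difference (illustrative, not uvloop code):

```python
import socket

s = socket.socket()

# fileno(): `s` still owns the fd. Closing or garbage-collecting `s`
# closes the fd even if a transport is still using it, after which the
# kernel may recycle the number for an unrelated connection.
fd = s.fileno()

# detach(): ownership moves to the caller. Closing or collecting `s`
# no longer touches the fd, so the transport becomes its sole owner
# and the only thing that will ever close it.
fd = s.detach()
```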


1st1 (Member) commented Nov 25, 2024

Interesting. We'll likely get this PR merged within a week. Thanks for the thorough investigation and the detailed issue.


jakob-keller commented Dec 22, 2024

FWIW: I suspect we got bitten by the same underlying issue in a production system under heavy load. uvloop was introduced recently and is used in conjunction with aiobotocore (built on top of botocore and aiohttp) to perform Amazon SageMaker Real-Time Endpoint invocations while maintaining connections to multiple Valkey clusters via valkey[libvalkey] (async API used).

It looks like there is an inconsistency with regard to transport / file-descriptor reuse:

botocore.exceptions.HTTPClientError: An HTTP Client raised an unhandled exception: File descriptor 91 is used by transport <TCPTransport closed=False reading=True 0xaaab009cc140>
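
For context, this message comes from the guard the event loop runs before attaching a file descriptor to a new transport: it keeps an fd-to-transport map and refuses to register an fd that is already claimed. A rough model of that check (an approximation based on asyncio's selector event loop, which uvloop mirrors; not the actual implementation):

```python
# Rough model, not the actual asyncio/uvloop code.
_transports: dict[int, object] = {}

def _ensure_fd_no_transport(fd: int) -> None:
    transport = _transports.get(fd)
    if transport is not None:
        raise RuntimeError(
            f"File descriptor {fd} is used by transport {transport!r}")
```

Seeing this error therefore means fd 91 was handed to a new transport while an older transport was still registered for it, which matches the stale-ownership theory above.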

The issue does not occur in our staging environment, which is identical to production except for the load it handles.

Reverting to plain asyncio.run (i.e. the default asyncio event loop) fully fixes the issue at hand.

The environment is containerized (AWS Fargate on arm64) with CPython 3.12.8; all relevant dependencies are up to date.
