[Performance] Consider sharing a single HTTP connection pool by default #13797
Comments
Hey @mikeharder! Currently a session object is shared when we instantiate a parent client, for example a BlobServiceClient as a singleton, and then create instances of BlobClient, ContainerClient, etc. from it; the session is simply passed along when the child clients are created. However, I can't think of a way to do this by default (under the hood) so that one session object persists for a given Python process and is used by any client created in that process (I assume this is what you meant by default behavior?). On a finer grain, I don't believe there is a way to share a connection pool between different sessions; per https://requests.readthedocs.io/en/master/user/advanced/#keep-alive, "thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!" Is this done by default in .NET, or do users have to explicitly pass the session object, as in Python? Thank you :)
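For concreteness, the pattern being described is roughly the following sketch (the connection string and names are placeholders, and session is the same keyword argument discussed later in this thread):

# Rough sketch of the sharing pattern described above (placeholders throughout):
# the same requests.Session is handed to the parent client, and child clients
# created from that parent reuse its pipeline, so requests share one connection pool.
import requests
from azure.storage.blob import BlobServiceClient

session = requests.Session()
service = BlobServiceClient.from_connection_string("<connection-string>", session=session)

container = service.get_container_client("test")      # child created from the parent
blob = container.get_blob_client("example.txt")        # reuses the parent's pipeline/session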
@tasherif-msft: I believe .NET, Java, and JS all share a single "session" by default (where "session" here is a loose term meaning the clients reuse the same HTTP connection when possible). You can compare the .NET and Python sample apps here: https://github.com/mikeharder/azure-sdk-http-sharing. In Python, I think this could be implemented with a shared static "default session" that is used if none is passed in.
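A minimal sketch of that idea, purely hypothetical and not actual azure-core code, might look like this:

# Hypothetical process-wide default session, lazily created and shared by any
# client that does not receive an explicit session. Not actual azure-core code.
import threading
import requests

_default_session = None
_default_session_lock = threading.Lock()

def get_default_session() -> requests.Session:
    """Return the shared process-wide requests.Session, creating it on first use."""
    global _default_session
    if _default_session is None:
        with _default_session_lock:
            if _default_session is None:
                _default_session = requests.Session()
    return _default_session

# A transport could then fall back to the shared session:
# session = user_supplied_session or get_default_session()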
Hi @mikeharder, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.
For closure on this thread, the current recommendation is to pass a custom transport (not a custom session):

from azure.core.pipeline.transport import RequestsTransport
from azure.storage.blob import BlobClient

# One transport instance owns one requests.Session, so clients that share the
# transport also share its HTTP connection pool.
# connection_string is assumed to be defined elsewhere.
transport = RequestsTransport()

client3 = BlobClient.from_connection_string(connection_string, "test", "test", transport=transport)
client3.upload_blob([], overwrite=True)

client4 = BlobClient.from_connection_string(connection_string, "test", "test", transport=transport)
client4.upload_blob([], overwrite=True)
To improve performance, Python should consider sharing a single HTTP connection pool by default, as is already done by other languages like .NET.
For example, the .NET sample (see the repro link below) uses a single HTTP connection for both uploads, significantly improving the performance of the second upload.
In contrast, this Python code will use a new HTTP connection for each upload:
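The original snippet did not survive this copy of the issue; a minimal reconstruction of the pattern, with a placeholder connection string and container/blob names, is roughly:

# Each client builds its own transport (and therefore its own requests.Session)
# by default, so the two uploads below do not share an HTTP connection.
from azure.storage.blob import BlobClient

connection_string = "<storage-connection-string>"  # placeholder

client1 = BlobClient.from_connection_string(connection_string, "test", "test")
client1.upload_blob(b"", overwrite=True)

client2 = BlobClient.from_connection_string(connection_string, "test", "test")
client2.upload_blob(b"", overwrite=True)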
It's possible to reuse the connection in Python by passing a shared session, but I think this should be the default behavior:
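The original snippet is also missing here; the shared-session variant presumably looked roughly like this, using the (undocumented) session keyword mentioned below:

# Passing the same requests.Session to both clients lets urllib3 reuse the
# underlying HTTP connection for the second upload.
import requests
from azure.storage.blob import BlobClient

connection_string = "<storage-connection-string>"  # placeholder
session = requests.Session()

client1 = BlobClient.from_connection_string(connection_string, "test", "test", session=session)
client1.upload_blob(b"", overwrite=True)

client2 = BlobClient.from_connection_string(connection_string, "test", "test", session=session)
client2.upload_blob(b"", overwrite=True)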
To repro, run the apps at https://github.com/mikeharder/azure-sdk-http-sharing and use a tool like Wireshark to monitor the HTTP connections.
Another issue is that the session parameter is not included in the API docs, but I believe this is covered by #12122.

CC: @johanste, @lmazuel