Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure SDKs for Python should support custom urllib3 connection pool size. #12102

Closed
ctstone opened this issue Jun 17, 2020 · 4 comments
Closed
Assignees
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that

Comments

@ctstone
Copy link
Member

ctstone commented Jun 17, 2020

Is your feature request related to a problem? Please describe.
When I run the Storage SDK against a single account using a ThreadPool, multiprocessing.pool.ThreadPool, I see frequent warnings like urllib3.connectionpool:HttpConnectionPool is full, discarding connection. Urllib3 has a well-defined default connection pool max size of 10. When my ThreadPool is larger than the default, the warnings are frequent, probably correlating with an increase in round trip time.

Describe the solution you'd like
The ideal change would mean I can easily configure the underlying connection pool size. Maybe other parameters too, like number of pools.

Describe alternatives you've considered
I can eliminate the warnings on a 32-count pool using this workaround that patches the urllib3 pool parameters.

Additional context

Repro:

import logging
from azure.storage.blob import ContainerClient
from multiprocessing.pool import ThreadPool
from os import getenv

# Warnings only show up in logs
logging.basicConfig(
    format='%(asctime)s - %(levelname)s [%(name)s] %(message)s',
    level=logging.INFO)
logging.getLogger('azure').setLevel(logging.WARNING)

if __name__ == '__main__':
    connection_string = getenv('AZURE_STORAGE_CONNECTION_STRING')
    container_name = getenv('AZURE_STORAGE_CONTAINER_NAME')
    thread_count = 32 # anything >10 should trigger a warning
    num_blobs = 1000
    client = ContainerClient.from_connection_string(
        connection_string,
        container_name)
    pool = ThreadPool(thread_count)

    def upload(i: int):
        name = str(i)
        client.upload_blob(name, data='', overwrite=True)

    pool.map(upload, range(num_blobs))
@ghost ghost added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jun 17, 2020
@lmazuel lmazuel added Azure.Core Client This issue points to a problem in the data-plane of the library. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jun 17, 2020
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jun 17, 2020
@lmazuel lmazuel self-assigned this Jun 17, 2020
@lmazuel
Copy link
Member

lmazuel commented Jun 17, 2020

Hi @ctstone

You should be able to pass your own session:

import requests
sess = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_connections=100, pool_maxsize=100)
sess.mount('https://', adapter)
client = ContainerClient.from_connection_string(
    connection_string,
    container_name,
    session=sess
)

Could you try and let me know? I will try to get some time to try it too, but can't tell when precisely. Thanks!

@ctstone
Copy link
Member Author

ctstone commented Jun 18, 2020

Thanks @lmazuel, this works great!

adapter = HTTPAdapter(pool_maxsize=thread_count)

However, session is not a documented parameter on the ContainerClient. Guessing it gets passed down to some lower level Azure.Core handler...

Might I suggest adding it to all client doc strings?

@lmazuel
Copy link
Member

lmazuel commented Jun 18, 2020

@ctstone yes, the entire set of supported kwargs is not obvious, there is a generic list here with most of them:
https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/core/azure-core/CLIENT_LIBRARY_DEVELOPER.md#available-policies

And this one was missing, so I created another issue:
#12121

That makes the doc complete, but not discoverable, so created another issue:
#12122

Shall we close this one? Additional thoughts?

@ctstone
Copy link
Member Author

ctstone commented Jun 18, 2020

Closing this issue since it's fixed in the provided solution. Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Projects
None yet
Development

No branches or pull requests

2 participants