Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The get_environment_proxies function in _utils.py does not support IPv4, IPv6 correctly #2660

Closed
HearyShen opened this issue Apr 14, 2023 · 0 comments · Fixed by #2659
Closed

Comments

@HearyShen
Copy link
Contributor

HearyShen commented Apr 14, 2023

Hi, I encountered error when my environment no_proxy includes IPv6 address like ::1. It is wrongly transformed into all://*::1 and causes urlparse error since the _urlparse.py parses the :1 as port.

image

The get_environment_proxies function in _utils.py is responsible for parsing and mounting proxy info from system environment.

httpx/httpx/_utils.py

Lines 229 to 264 in 4b5a92e

def get_environment_proxies() -> typing.Dict[str, typing.Optional[str]]:
"""Gets proxy information from the environment"""
# urllib.request.getproxies() falls back on System
# Registry and Config for proxies on Windows and macOS.
# We don't want to propagate non-HTTP proxies into
# our configuration such as 'TRAVIS_APT_PROXY'.
proxy_info = getproxies()
mounts: typing.Dict[str, typing.Optional[str]] = {}
for scheme in ("http", "https", "all"):
if proxy_info.get(scheme):
hostname = proxy_info[scheme]
mounts[f"{scheme}://"] = (
hostname if "://" in hostname else f"http://{hostname}"
)
no_proxy_hosts = [host.strip() for host in proxy_info.get("no", "").split(",")]
for hostname in no_proxy_hosts:
# See https://curl.haxx.se/libcurl/c/CURLOPT_NOPROXY.html for details
# on how names in `NO_PROXY` are handled.
if hostname == "*":
# If NO_PROXY=* is used or if "*" occurs as any one of the comma
# separated hostnames, then we should just bypass any information
# from HTTP_PROXY, HTTPS_PROXY, ALL_PROXY, and always ignore
# proxies.
return {}
elif hostname:
# NO_PROXY=.google.com is marked as "all://*.google.com,
# which disables "www.google.com" but not "google.com"
# NO_PROXY=google.com is marked as "all://*google.com,
# which disables "www.google.com" and "google.com".
# (But not "wwwgoogle.com")
mounts[f"all://*{hostname}"] = None
return mounts

For env no_proxy, according to CURLOPT_NOPROXY explained, it should support domains, IPv4, IPv6 and the localhost. However, current get_environment_proxies function implementation only supports domains correctly as it always adds wildcard * in front of the hostname.

To fix this issue, I looked into this repo and suggest handling the no_proxy hostnames as domains, IPv4, IPv6 and the localhost seperately. I have updated the get_environment_proxies function in _utils.py and tested it.

Refer to the PR: #2659

Replies and discussions are welcomed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant