URL.host returns Punycode instead of Unicode for some URLs #3333
Unanswered
loic-bellinger
asked this question in
General
Replies: 1 comment 1 reply
-
I don't think the documentation currently says very much at all about the subtleties of the URL parameters, or if the user should expect the (Yes it should - there's quite an involved set of documentation work around the details here)
Description
The URL.host property does not decode IDNA hostnames into Unicode, which contradicts its documented behavior. According to the httpx documentation, the host should always be returned as a string, normalized to lowercase, with IDNA hosts decoded into Unicode.
Steps to reproduce
Expected behavior
The URL.host property should return the Unicode version of the host, in this case: www.égalité-femmes-hommes.gouv.fr.
Actual behavior
The URL.host property returns the Punycode-encoded version of the host: www.xn--galit-femmes-hommes-9ybf.gouv.fr.
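The expected and actual values are the same host in two encodings. A minimal sketch of that round trip, assuming Python's standard-library `idna` codec as a stand-in (httpx may rely on a different IDNA implementation, so this is illustrative only):

```python
# Standard-library illustration only; this is not httpx's own code path.
unicode_host = "www.égalité-femmes-hommes.gouv.fr"
punycode_host = "www.xn--galit-femmes-hommes-9ybf.gouv.fr"

# Encoding the Unicode host label by label yields the Punycode form.
encoded = unicode_host.encode("idna").decode("ascii")

# Decoding the Punycode form recovers the Unicode host.
decoded = punycode_host.encode("ascii").decode("idna")

print(encoded == punycode_host)
print(decoded == unicode_host)
```

Only the second label changes between the two forms; `www`, `gouv` and `fr` are plain ASCII and pass through untouched.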
Potential fix
It seems the issue arises in this part of the httpx code:
The use of `startswith("xn--")` checks only for Punycode-encoded hosts that begin with this prefix. However, it should handle cases where IDNA encoding is used more comprehensively. Replacing `host.startswith("xn--")` with something like `if "xn--" in host` might handle a broader set of cases?
Environment
httpx version: 0.27.2
Python version: 3.12.x
OS: Linux/Windows
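Returning to the potential fix above, here is a rough sketch of why a whole-host prefix check misses this URL, alongside a per-label alternative to the substring test. The function names are hypothetical, and the standard-library `idna` codec stands in for whatever decoder httpx actually uses, so this should not be read as httpx's implementation.

```python
def decode_if_host_prefixed(host: str) -> str:
    """Decode only when the whole host starts with "xn--"."""
    if host.startswith("xn--"):
        return host.encode("ascii").decode("idna")
    return host


def decode_per_label(host: str) -> str:
    """Decode every dot-separated label that carries the "xn--" prefix."""
    labels = []
    for label in host.split("."):
        if label.startswith("xn--"):
            # Assumption: stdlib codec used purely for illustration.
            label = label.encode("ascii").decode("idna")
        labels.append(label)
    return ".".join(labels)


host = "www.xn--galit-femmes-hommes-9ybf.gouv.fr"
print(decode_if_host_prefixed(host))  # unchanged: the first label is "www"
print(decode_per_label(host))
```

Unlike `"xn--" in host`, the per-label check leaves a host such as `fooxn--bar.com` alone, since the sequence there sits in the middle of a label rather than at its start.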