Drop rfc3986 requirement #2252

Conversation
    assert url.query == original_query
    assert url.fragment == original_fragment
    with pytest.raises(httpx.InvalidURL):
        httpx.URL("https://u:p@[invalid!]//evilHost/path?t=w#tw")
Our test here is much simpler, since this URL no longer passes validation. 👍
    assert url.query == original_query
    assert url.fragment == original_fragment
    with pytest.raises(httpx.InvalidURL):
        url.copy_with(scheme=bad)
Similarly, this scheme no longer validates, which is an improved behaviour.
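The stricter behaviour follows from the RFC 3986 grammar for schemes. As a rough sketch (the names here are illustrative, not httpx's actual internals), a validator in this spirit might look like:

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.\-]*$")

def is_valid_scheme(scheme: str) -> bool:
    """Return True if `scheme` matches the RFC 3986 scheme grammar."""
    return SCHEME_RE.match(scheme) is not None

print(is_valid_scheme("https"))   # True
print(is_valid_scheme("ht~tp"))   # False
print(is_valid_scheme("1http"))   # False (must start with a letter)
```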
@@ -116,7 +116,7 @@ async def test_asgi_raw_path():
     response = await client.get(url)

     assert response.status_code == 200
-    assert response.json() == {"raw_path": "/user%40example.org"}
+    assert response.json() == {"raw_path": "/user@example.org"}
This test changes because of some improved behaviour: "@" should not be an auto-escaping character in the path.

Try https://www.example.com/some@path in a browser, or see https://datatracker.ietf.org/doc/html/rfc3986.html#section-3.3...

pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
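The stdlib makes the same point visible: `urllib.parse.quote` escapes "@" by default, but including it in `safe` (mirroring the pchar rule above; this is an illustration, not httpx's actual escaping code) leaves it untouched:

```python
from urllib.parse import quote

# "@" is in the RFC 3986 pchar set, so it need not be percent-escaped in a path:
print(quote("/user@example.org", safe="/:@"))  # /user@example.org

# The stdlib default (safe="/") escapes it, which matches the old behaviour:
print(quote("/user@example.org"))              # /user%40example.org
```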
     @property
     def scheme(self) -> str:
         """
         The URL scheme, such as "http", "https".
         Always normalised to lowercase.
         """
-        return self._uri_reference.scheme or ""
+        return self._uri_reference.scheme
Unlike with rfc3986, this value can no longer be None. It's not needed, since an empty string is sufficient.

See also other cases below.
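The stdlib takes the same approach, for what it's worth: a missing scheme from `urllib.parse.urlsplit` is the empty string, never None, so callers need no `or ""` guard:

```python
from urllib.parse import urlsplit

# A protocol-relative URL has no scheme; urlsplit reports "" rather than None.
parts = urlsplit("//example.com/path")
print(repr(parts.scheme))  # ''

parts = urlsplit("https://example.com/path")
print(repr(parts.scheme))  # 'https'
```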
-    if new_url.is_absolute_url:
-        new_url._uri_reference = new_url._uri_reference.normalize()
-    return URL(new_url)
+    return URL(self, **kwargs)
Note that our parameter checking moves into __init__(...) instead.
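The shape of that pattern can be sketched as follows. This is a hypothetical, minimal class (not the real httpx.URL, and ALLOWED is an invented subset) showing why copy_with() then needs no checks of its own:

```python
class URL:
    """Minimal sketch of the pattern: keyword validation lives in
    __init__, so copy_with() inherits it for free."""

    ALLOWED = {"scheme", "host", "path"}  # illustrative subset only

    def __init__(self, url=None, **kwargs):
        # All parameter checking happens here.
        for key in kwargs:
            if key not in self.ALLOWED:
                raise TypeError(f"Invalid keyword argument {key!r}")
        base = dict(url.components) if isinstance(url, URL) else {}
        self.components = {**base, **kwargs}

    def copy_with(self, **kwargs):
        # No separate validation needed; __init__ performs it.
        return URL(self, **kwargs)

url = URL(scheme="https").copy_with(host="example.com")
print(url.components)  # {'scheme': 'https', 'host': 'example.com'}
```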
-    base_uri = self._uri_reference.copy_with(fragment=None)
-    relative_url = URL(url)
-    return URL(relative_url._uri_reference.resolve_with(base_uri).unsplit())
+    return URL(urljoin(str(self), str(URL(url))))
We're just leaning on the stdlib's built-in implementation of urljoin now, but making sure to use our URL validation and normalisation first.
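For reference, `urllib.parse.urljoin` already handles the relative-reference resolution cases from RFC 3986 section 5:

```python
from urllib.parse import urljoin

# Dot-segment resolution:
print(urljoin("https://example.com/a/b", "../c"))
# https://example.com/c

# Network-path (protocol-relative) references:
print(urljoin("https://example.com/a/b", "//other.org/x"))
# https://other.org/x

# Query-only references keep the base path:
print(urljoin("https://example.com/a/b", "?q=1"))
# https://example.com/a/b?q=1
```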
    message = f"Argument {key!r} must be {expected} but got {seen}"
    raise TypeError(message)
    if isinstance(value, bytes):
        kwargs[key] = value.decode("ascii")
Our urlparse implementation uses strings everywhere. If bytes are provided, then coerce to an ASCII string.

This is an internal detail, but there are some interesting public API considerations that this work has prompted, though I'm going to leave those as follow-up.
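The coercion described amounts to something like this sketch (the function name is illustrative, not httpx's actual helper):

```python
def coerce_to_str(value):
    """Coerce bytes to an ASCII str; leave str values untouched.
    Non-ASCII bytes fail loudly rather than being silently mangled."""
    if isinstance(value, bytes):
        return value.decode("ascii")  # raises UnicodeDecodeError on non-ASCII
    return value

print(coerce_to_str(b"https://example.com"))  # https://example.com
print(coerce_to_str("already-a-str"))         # already-a-str
```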
    # than `kwargs["query"] = ""`, so that generated URLs do not
    # include an empty trailing "?".
    params = kwargs.pop("params")
    kwargs["query"] = None if not params else str(QueryParams(params))
The "params" argument isn't used by the urlparse implementation, because the QueryParams model doesn't exist at that level of abstraction.
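The None-versus-empty-string distinction in the snippet above can be shown with the stdlib's `urlencode` standing in for httpx's QueryParams (an illustrative substitute, not the actual code):

```python
from urllib.parse import urlencode

def query_from_params(params):
    """Return None for empty params, rather than "", so that generated
    URLs omit the empty trailing "?"."""
    return None if not params else urlencode(params)

print(query_from_params({}))          # None
print(query_from_params({"a": "1"}))  # a=1
```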
I might end up putting on my admin hat and pushing this one through.

Up to date with master.
Are you sure?

[tkloczko@devel-g2v httpx]$ git pull
Already up to date.
[tkloczko@devel-g2v httpx]$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
[tkloczko@devel-g2v httpx]$ wget https://github.com/encode/httpx//pull/2252.patch#/python-httpx-Drop-rfc3986-requirement.patch
--2022-12-09 10:57:02-- https://github.com/encode/httpx//pull/2252.patch
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://patch-diff.githubusercontent.com/raw/encode/httpx/pull/2252.patch [following]
--2022-12-09 10:57:02-- https://patch-diff.githubusercontent.com/raw/encode/httpx/pull/2252.patch
Resolving patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)... 140.82.121.4
Connecting to patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Cookie coming from patch-diff.githubusercontent.com attempted to set domain to github.com
Length: unspecified [text/plain]
Saving to: ‘2252.patch’
2252.patch [ <=> ] 105.41K --.-KB/s in 0.05s
2022-12-09 10:57:02 (2.09 MB/s) - ‘2252.patch’ saved [107935]
[tkloczko@devel-g2v httpx]$ patch -p1 < 2252.patch
patching file httpx/_models.py
Reversed (or previously applied) patch detected! Assume -R? [n]
I cannot apply that PR on top of current master; however, on GitHub all checks are passing.

Paging @encode/maintainers.
@tomchristie you've got it. My only comment is that perhaps we could rename
Great, thank you.
LGTM.
Looks like PR is now out of sync 🤔
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f
2 out of 2 hunks FAILED -- saving rejects to file httpx/_models.py.rej
1 out of 1 hunk FAILED -- saving rejects to file httpx/_types.py.rej
2 out of 3 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/client/test_proxies.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/models/test_url.py.rej
1 out of 10 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 4 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 1 hunk FAILED -- saving rejects to file setup.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/models/test_url.py.rej
1 out of 1 hunk FAILED -- saving rejects to file httpx/_urls.py.rej
LGTM
Looks like the PR is again out of date 😞

Is there anything else still outstanding related to this PR? 🤔

Is there any blocker stopping this from getting merged?

Looks ready to go from my side. I've opened a discussion re. the next release here... #2534
This pull request drops the rfc3986 package dependency, in favour of a carefully worked-through urlparse implementation that provides everything we need in terms of URL validation and normalisation.

Closes #1833 (Fixed)
Closes #2169 (No longer required)
Closes #2175 (No longer required)

Marking this up as ready-for-review now.

Although this adds some additional code, it also completely removes a dependency, and is lower overall complexity. In terms of reading through and being able to understand the split between the URL model and the underlying URL parsing code, I think it ends up actually being much simpler.

I think the rfc3986 package is fantastic, but we were having to do a few bits in some slightly kludgy ways to use it, and without the indirection I find it much clearer to work all the way through what's actually going on here.

Would be very happy to work through a review here step by step until we're confident that:

Potentially contentious, because of the "rely on existing packages" argument, but I'd like to work through this with someone else. To my eyes it's actually a nice rationalisation. But then I've spent the time working on it, so?