Drop rfc3986 requirement #2252

Conversation
    assert url.query == original_query
    assert url.fragment == original_fragment
    with pytest.raises(httpx.InvalidURL):
        httpx.URL("https://u:p@[invalid!]//evilHost/path?t=w#tw")
Our test here is much simpler, since this URL no longer passes validation. 👍
    assert url.query == original_query
    assert url.fragment == original_fragment
    with pytest.raises(httpx.InvalidURL):
        url.copy_with(scheme=bad)
Similarly, this scheme no longer validates, which is an improved behaviour.
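The stricter behaviour follows from the RFC 3986 grammar for schemes. As a rough sketch (the names here are illustrative, not httpx's actual internals), a validator in this spirit might look like:

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.\-]*$")

def is_valid_scheme(scheme: str) -> bool:
    """Return True if `scheme` matches the RFC 3986 scheme grammar."""
    return SCHEME_RE.match(scheme) is not None

print(is_valid_scheme("https"))   # True
print(is_valid_scheme("ht~tp"))   # False
print(is_valid_scheme("1http"))   # False (must start with a letter)
```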
@@ -116,7 +116,7 @@ async def test_asgi_raw_path():
     response = await client.get(url)

     assert response.status_code == 200
-    assert response.json() == {"raw_path": "/user%40example.org"}
+    assert response.json() == {"raw_path": "/user@example.org"}
This test changes because of some improved behaviour: "@" should not be an auto-escaping character in the path.

Try https://www.example.com/some@path in a browser, or see https://datatracker.ietf.org/doc/html/rfc3986.html#section-3.3...

pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
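The stdlib makes the same point visible: `urllib.parse.quote` escapes "@" by default, but including it in `safe` (mirroring the pchar rule above; this is an illustration, not httpx's actual escaping code) leaves it untouched:

```python
from urllib.parse import quote

# "@" is in the RFC 3986 pchar set, so it need not be percent-escaped in a path:
print(quote("/user@example.org", safe="/:@"))  # /user@example.org

# The stdlib default (safe="/") escapes it, which matches the old behaviour:
print(quote("/user@example.org"))              # /user%40example.org
```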
     @property
     def scheme(self) -> str:
         """
         The URL scheme, such as "http", "https".
         Always normalised to lowercase.
         """
-        return self._uri_reference.scheme or ""
+        return self._uri_reference.scheme
Unlike with rfc3986, this value can no longer be None. It's not needed, since an empty string is sufficient.

See also other cases below.
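The stdlib takes the same approach, for what it's worth: a missing scheme from `urllib.parse.urlsplit` is the empty string, never None, so callers need no `or ""` guard:

```python
from urllib.parse import urlsplit

# A protocol-relative URL has no scheme; urlsplit reports "" rather than None.
parts = urlsplit("//example.com/path")
print(repr(parts.scheme))  # ''

parts = urlsplit("https://example.com/path")
print(repr(parts.scheme))  # 'https'
```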
-    if new_url.is_absolute_url:
-        new_url._uri_reference = new_url._uri_reference.normalize()
-    return URL(new_url)
+    return URL(self, **kwargs)
Note that our parameter checking moves into __init__(...) instead.
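The shape of that pattern can be sketched as follows. This is a hypothetical, minimal class (not the real httpx.URL, and ALLOWED is an invented subset) showing why copy_with() then needs no checks of its own:

```python
class URL:
    """Minimal sketch of the pattern: keyword validation lives in
    __init__, so copy_with() inherits it for free."""

    ALLOWED = {"scheme", "host", "path"}  # illustrative subset only

    def __init__(self, url=None, **kwargs):
        # All parameter checking happens here.
        for key in kwargs:
            if key not in self.ALLOWED:
                raise TypeError(f"Invalid keyword argument {key!r}")
        base = dict(url.components) if isinstance(url, URL) else {}
        self.components = {**base, **kwargs}

    def copy_with(self, **kwargs):
        # No separate validation needed; __init__ performs it.
        return URL(self, **kwargs)

url = URL(scheme="https").copy_with(host="example.com")
print(url.components)  # {'scheme': 'https', 'host': 'example.com'}
```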
-    base_uri = self._uri_reference.copy_with(fragment=None)
-    relative_url = URL(url)
-    return URL(relative_url._uri_reference.resolve_with(base_uri).unsplit())
+    return URL(urljoin(str(self), str(URL(url))))
We're just leaning on the stdlib's built-in implementation of urljoin now, but making sure to use our URL validation and normalisation first.
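For reference, `urllib.parse.urljoin` already handles the relative-reference resolution cases from RFC 3986 section 5:

```python
from urllib.parse import urljoin

# Dot-segment resolution:
print(urljoin("https://example.com/a/b", "../c"))
# https://example.com/c

# Network-path (protocol-relative) references:
print(urljoin("https://example.com/a/b", "//other.org/x"))
# https://other.org/x

# Query-only references keep the base path:
print(urljoin("https://example.com/a/b", "?q=1"))
# https://example.com/a/b?q=1
```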
    message = f"Argument {key!r} must be {expected} but got {seen}"
    raise TypeError(message)
    if isinstance(value, bytes):
        kwargs[key] = value.decode("ascii")
Our urlparse implementation uses strings everywhere. If bytes are provided, then coerce to an ASCII string.

This is an internal detail, but there are some interesting public API considerations that this work has prompted, though I'm going to leave those as follow-up.
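The coercion described amounts to something like this sketch (the function name is illustrative, not httpx's actual helper):

```python
def coerce_to_str(value):
    """Coerce bytes to an ASCII str; leave str values untouched.
    Non-ASCII bytes fail loudly rather than being silently mangled."""
    if isinstance(value, bytes):
        return value.decode("ascii")  # raises UnicodeDecodeError on non-ASCII
    return value

print(coerce_to_str(b"https://example.com"))  # https://example.com
print(coerce_to_str("already-a-str"))         # already-a-str
```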
    # than `kwargs["query"] = ""`, so that generated URLs do not
    # include an empty trailing "?".
    params = kwargs.pop("params")
    kwargs["query"] = None if not params else str(QueryParams(params))
The "params" argument isn't used by the urlparse implementation, because the QueryParams model doesn't exist at that level of abstraction.
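The None-versus-empty-string distinction in the snippet above can be shown with the stdlib's `urlencode` standing in for httpx's QueryParams (an illustrative substitute, not the actual code):

```python
from urllib.parse import urlencode

def query_from_params(params):
    """Return None for empty params, rather than "", so that generated
    URLs omit the empty trailing "?"."""
    return None if not params else urlencode(params)

print(query_from_params({}))          # None
print(query_from_params({"a": "1"}))  # a=1
```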
I might end up putting on my admin hat and pushing this one through.

Up to date with master.
Are you sure?

[tkloczko@devel-g2v httpx]$ git pull
Already up to date.
[tkloczko@devel-g2v httpx]$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
[tkloczko@devel-g2v httpx]$ wget https://github.com/encode/httpx//pull/2252.patch#/python-httpx-Drop-rfc3986-requirement.patch
--2022-12-09 10:57:02-- https://github.com/encode/httpx//pull/2252.patch
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://patch-diff.githubusercontent.com/raw/encode/httpx/pull/2252.patch [following]
--2022-12-09 10:57:02-- https://patch-diff.githubusercontent.com/raw/encode/httpx/pull/2252.patch
Resolving patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)... 140.82.121.4
Connecting to patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Cookie coming from patch-diff.githubusercontent.com attempted to set domain to github.com
Length: unspecified [text/plain]
Saving to: ‘2252.patch’
2252.patch [ <=> ] 105.41K --.-KB/s in 0.05s
2022-12-09 10:57:02 (2.09 MB/s) - ‘2252.patch’ saved [107935]
[tkloczko@devel-g2v httpx]$ patch -p1 < 2252.patch
patching file httpx/_models.py
Reversed (or previously applied) patch detected! Assume -R? [n]
I cannot apply that PR on top of current master; however, on GitHub all checks are passing.

Paging @encode/maintainers.
@tomchristie you've got it. My only comment is that perhaps we could rename
Great, thank you.
LGTM.
Looks like PR is now out of sync 🤔
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f
2 out of 2 hunks FAILED -- saving rejects to file httpx/_models.py.rej
1 out of 1 hunk FAILED -- saving rejects to file httpx/_types.py.rej
2 out of 3 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/client/test_proxies.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/models/test_url.py.rej
1 out of 10 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 4 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 1 hunk FAILED -- saving rejects to file setup.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/models/test_url.py.rej
1 out of 1 hunk FAILED -- saving rejects to file httpx/_urls.py.rej
LGTM
Looks like the PR is again out of date 😞

Is there anything else still outstanding related to this PR? 🤔

Is there any blocker stopping this from getting merged?

Looks ready to go from my side. I've opened a discussion re. the next release here... #2534
This pull request drops the rfc3986 package dependency, in favour of a carefully worked-through urlparse implementation that provides everything we need in terms of URL validation and normalisation.

Closes #1833 (Fixed)
Closes #2169 (No longer required)
Closes #2175 (No longer required)

Marking this up as ready-for-review now.

Although this adds some additional code, it also completely removes a dependency, and is lower overall complexity. In terms of reading through and being able to understand the split between the URL model and the underlying URL parsing code, I think it ends up actually being much simpler.

I think the rfc3986 package is fantastic, but we were having to do a few bits in some slightly kludgy ways to use it, and without the indirection I find it much clearer to work all the way through what's actually going on here.

Would be very happy to work through a review here step by step until we're confident that:

Potentially contentious, because of the "rely on existing packages" argument, but I'd like to work through this with someone else. To my eyes it's actually a nice rationalisation. But then I've spent the time working on it, so?