ValueError on path remove_dot_segments when there's extra dot-dot (/../) segments
#536
Comments
Interesting. A pull request is welcome! |
So, here are the relevant pieces from the RFC: And: And the algorithm under section 5.2.2. Transform References. I think what we need to pay attention to is the difference between a "Reference URL" and a "Target URL". A Target URL is expected to be resolved and normalized, meaning that it's expected to not have any From that perspective, I believe the input to We could also consider comparing the rules with https://url.spec.whatwg.org/ , which seem to be closer to the common "URL Library" implementations. Anyway, since we do desire the compatibility, as suggested above, I'm going to get it to work as suggested. |
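For comparison, Python's standard-library `urllib.parse.urljoin` resolves a reference against a base in this lenient way: since Python 3.5 it follows the RFC 3986 semantics and silently drops `..` segments that would climb above the root. A minimal illustration, using the base URL from the RFC's own reference-resolution examples; the helper name `resolve` is hypothetical and exists only for this sketch:

```python
from urllib.parse import urljoin

# Base URL used by the reference-resolution examples in RFC 3986, section 5.4.
BASE = "http://a/b/c/d;p?q"


def resolve(reference: str) -> str:
    """Resolve a reference against BASE (hypothetical helper for this sketch)."""
    return urljoin(BASE, reference)


# One ".." per directory level is consumed normally.
print(resolve("../g"))
# Surplus ".." segments are ignored rather than raising an error.
print(resolve("../../../g"))
```

This only shows what the standard library does when resolving a reference; it says nothing about how yarl should treat an already-absolute URL passed to its constructor.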
The last revision in the Webkit LayoutTests repository with such a file is 149680; the next rev folded the If you were looking at |
I've converted those tests from JSON to a small Python script:
from yarl import URL
from urllib.parse import unquote, quote
cases = [
("/././foo", "/foo"),
("/./.foo", "/.foo"),
("/foo/.", "/foo/"),
("/foo/./", "/foo/"),
("/foo/bar/..", "/foo/"),
("/foo/bar/../", "/foo/"),
("/foo/..bar", "/foo/..bar"),
("/foo/bar/../ton", "/foo/ton"),
("/foo/bar/../ton/../../a", "/a"),
("/foo/../../..", "/"),
("/foo/../../../ton", "/ton"),
("/foo/%2e", "/foo/"),
("/foo/%2e%2", "/foo/%2e%2"),
("/foo/%2e./%2e%2e/.%2e/%2e.bar", "/%2e.bar"),
("////../..", "//"),
("/foo/bar//../..", "/foo/"),
("/foo/bar//..", "/foo/bar/"),
("/foo", "/foo"),
("/%20foo", "/%20foo"),
("/foo%", "/foo%"),
("/foo%2", "/foo%2"),
("/foo%2zbar", "/foo%2zbar"),
("/foo%2©zbar", "/foo%2%C3%82%C2%A9zbar"),
("/foo%41%7a", "/foo%41%7a"),
("/foo\t\u0091%91", "/foo%C2%91%91"),
("/foo%00%51", "/foo%00%51"),
("/(%28:%3A%29)", "/(%28:%3A%29)"),
("/%3A%3a%3C%3c", "/%3A%3a%3C%3c"),
("/foo\tbar", "/foobar"),
("\\\\foo\\\\bar", "//foo//bar"),
("/%7Ffp3%3Eju%3Dduvgw%3Dd", "/%7Ffp3%3Eju%3Dduvgw%3Dd"),
("/@asdf%40", "/@asdf%40"),
("/你好你好", "/%E4%BD%A0%E5%A5%BD%E4%BD%A0%E5%A5%BD"),
("/‥/foo", "/%E2%80%A5/foo"),
("/\ufeff/foo", "/%EF%BB%BF/foo"),
("/\u202e/foo/\u202d/bar", "/%E2%80%AE/foo/%E2%80%AD/bar"),
]
requote = False
for i, (test, expected) in enumerate(cases, 1):
    if requote:
        expected = quote(
            unquote(expected, encoding="latin1"), safe="/@:()=", encoding="latin1"
        )
    try:
        actual = URL(f"http://example.com{test}").raw_path
    except ValueError as e:
        actual = f"exception: {e}"
    if actual != expected:
        print(f"({i}) \x1b[31mFAIL\x1b[0m: {test}")
        print(f" {actual}\n != {expected}")
and this outputs:
Several of these are not actual failures; That leaves 11, 14 and 30:
I'm not so sure about turning backslashes into forward slashes nor am I inclined to research that specific issue right now. The following alteration to the
for seg in segments:
    if seg not in (".", ".."):
        resolved_path.append(seg)
    elif seg == ".." and (len(resolved_path) > 1 or resolved_path[-1]):
        resolved_path.pop()
    # seg == "."
|
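To see what the `len(resolved_path) > 1 or resolved_path[-1]` guard buys, here is a minimal standalone sketch. It assumes the path is absolute, so that `path.split("/")` yields a leading `""` root element (the guard would raise IndexError on an empty list otherwise), and it borrows the trailing-slash post-processing from the alternative version later in this thread. The function name `normalize_with_root_guard` is hypothetical, not part of yarl:

```python
def normalize_with_root_guard(path: str) -> str:
    """Hypothetical wrapper around the guarded loop; absolute paths only."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    segments = path.split("/")  # "/a/b" -> ["", "a", "b"]
    resolved_path = []
    for seg in segments:
        if seg not in (".", ".."):
            resolved_path.append(seg)
        elif seg == ".." and (len(resolved_path) > 1 or resolved_path[-1]):
            # Pop only while something other than the leading "" root remains,
            # so surplus ".." segments can never remove the root itself.
            resolved_path.pop()
        # seg == "." is simply dropped
    if segments[-1] in (".", ".."):
        # Assumed post-processing: a trailing dot segment leaves a trailing "/".
        resolved_path.append("")
    return "/".join(resolved_path)


for candidate in ("/foo/../../..", "/foo/../../../ton", "/foo/bar//.."):
    print(candidate, "->", normalize_with_root_guard(candidate))
```

Because the root element is never popped, the result always starts with `/` no matter how many extra `..` segments the input contains.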
Thanks for working on this, @mjpieters! I generally agree with the direction here. Specifically, I think not-raising-an-exception is the most-important problem to fix here, as it causes processing of user-provided data to fail in a place where no such errors are expected. |
Here's an alternative version, one that avoids the (slower)
# body of the path-normalizing function; needs: from contextlib import suppress
prefix, resolved_path = "", []
if path.startswith("/"):
    # preserve the "/" root element of absolute paths.
    prefix = "/"
    path = path[1:]
segments = path.split("/")
for seg in segments:
    if seg == "..":
        # ignore any .. segments that would otherwise cause an
        # IndexError when popped from resolved_path if
        # resolving for rfc3986
        with suppress(IndexError):
            resolved_path.pop()
    elif seg != ".":
        resolved_path.append(seg)
if segments and segments[-1] in (".", ".."):
    # do some post-processing here.
    # if the last segment was a relative dir,
    # then we need to append the trailing '/'
    resolved_path.append("")
return prefix + "/".join(resolved_path)
|
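As a quick sanity check, the snippet above can be wrapped in a standalone function and run against the dot-segment cases from the test list earlier in the thread. This is only a sketch: the function name `remove_dot_segments_sketch` and the `CASES` subset are assumptions made for illustration, and percent-encoding and backslash handling are deliberately left out.

```python
from contextlib import suppress


def remove_dot_segments_sketch(path: str) -> str:
    """Hypothetical standalone wrapper around the snippet above."""
    prefix, resolved_path = "", []
    if path.startswith("/"):
        # Keep the root separate so surplus ".." segments cannot pop it.
        prefix = "/"
        path = path[1:]
    segments = path.split("/")
    for seg in segments:
        if seg == "..":
            # Popping from an empty list means ".." climbed past the root;
            # the IndexError is swallowed instead of being re-raised.
            with suppress(IndexError):
                resolved_path.pop()
        elif seg != ".":
            resolved_path.append(seg)
    if segments and segments[-1] in (".", ".."):
        # A trailing dot segment refers to a directory: keep the final "/".
        resolved_path.append("")
    return prefix + "/".join(resolved_path)


# A few dot-segment cases taken from the test list earlier in the thread.
CASES = [
    ("/././foo", "/foo"),
    ("/foo/bar/../ton/../../a", "/a"),
    ("/foo/../../..", "/"),
    ("/foo/../../../ton", "/ton"),
    ("////../..", "//"),
    ("/foo/bar//..", "/foo/bar/"),
]

for given, expected in CASES:
    actual = remove_dot_segments_sketch(given)
    status = "ok" if actual == expected else "FAIL"
    print(f"{status}: {given} -> {actual}")
```

Splitting the root off into `prefix` is what lets the loop use a plain `pop()` with a suppressed IndexError: the list only ever holds real segments, so there is nothing to guard.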
🐞 Describe the bug
As described in RFC 3986 § 5.2.4. Remove Dot Segments, the remove_dot_segments algorithm removes any extra /../ parts of the URL, ignoring errors when the stack is empty. However, yarl.URL() behavior at the moment is to raise an exception, ValueError, when that happens.
💡 To Reproduce
Instantiate URL class with such a URL string:
💡 Expected behavior
Follow the RFC, and ignore the error, to get it working like this:
We probably want to also test (and fix, if needed), any other parts of the API that would involve path resolution (including remove_dot_segments) steps.
📋 Logs/tracebacks
📋 Your version of the Python
📋 Your version of the aiohttp/yarl/multidict distributions
📋 Additional context
The proposed behavior would put yarl.URL on par with DOM's URL object, which already follows the RFC on this:
As well as rust-url library (from the unit tests):
which has tests based on (warning: broken links):