Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host parsing UTF-8 decoding can fail #215

Closed
achristensen07 opened this issue Jan 23, 2017 · 4 comments
Closed

Host parsing UTF-8 decoding can fail #215

achristensen07 opened this issue Jan 23, 2017 · 4 comments

Comments

@achristensen07
Copy link
Collaborator

Host parsing step 2 currently reads like this:
"Let domain be the result of UTF-8 decode without BOM on the percent decoding of UTF-8 encode on input."
If the UTF-8 decoding fails, it should explicitly say to return failure. This matches Chrome, Firefox, and WebKit after https://bugs.webkit.org/show_bug.cgi?id=167330

hubot pushed a commit to WebKit/WebKit-http that referenced this issue Jan 24, 2017
https://bugs.webkit.org/show_bug.cgi?id=167330
Source/WebCore:

<rdar://problem/29319962>

Reviewed by Tim Horton.

Covered by new API tests.

* platform/URLParser.cpp:
(WebCore::containsOnlyASCII):
(WebCore::URLParser::parseHostAndPort):
If UTF-8 decoding fails after percent-decoding the host, fail to parse.
This matches Chrome and Firefox, and it was proposed to the spec in whatwg/url#215

Tools:


Reviewed by Tim Horton.

* TestWebKitAPI/Tests/WebCore/URLParser.cpp:
(TestWebKitAPI::TEST_F):



git-svn-id: http://svn.webkit.org/repository/webkit/trunk@211067 268f45cc-cd09-0410-ab3c-d52691b4dbfc
@annevk
Copy link
Member

annevk commented Jan 24, 2017

Did you add tests on WPT? Also, from what I can tell the patch you uploaded there you are not returning failure since you still manage to extract a scheme, host, patch, etc. from the URL. Or am I misunderstanding the C++ tests?

@achristensen07
Copy link
Collaborator Author

I didn't add tests on WPT yet, but the two tests from my patch, "http://example.com%A0" and "http://%E2%98%83", would be great to add.
My test format is kind of confusing. The first row with empty values (and the original string for things like the anchor tag that need the original invalid URL) indicates that parsing failed with "http://example.com%A0", which now makes WebKit match Chrome and Firefox. The second row shows WebKit's old behavior, which would just make the A lowercase. Same with "http://%E2%98%83" parsing now to the punycode-encoded Unicode snowman, "xn--n3h", where old WebKit's behavior would just leave it as "%e2%98%83"

@annevk
Copy link
Member

annevk commented Jan 24, 2017

Thanks. So we should use https://encoding.spec.whatwg.org/#utf-8-decode-without-bom-or-fail instead and propagate the failure. That should be a rather trivial fix.

@annevk
Copy link
Member

annevk commented Jan 24, 2017

(If you could reply to the other URL issues before you go to sleep that would be great, but I can understand if you want to wait until tomorrow.)

annevk added a commit that referenced this issue Jan 24, 2017
annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 24, 2017
domenic pushed a commit to web-platform-tests/wpt that referenced this issue Jan 24, 2017
annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants