Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not remove newline and tab characters #419

Closed
demurgos opened this issue Oct 16, 2018 · 11 comments
Closed

Do not remove newline and tab characters #419

demurgos opened this issue Oct 16, 2018 · 11 comments
Labels
needs tests Moving the issue forward requires someone to write tests topic: api topic: parser

Comments

@demurgos
Copy link

demurgos commented Oct 16, 2018

The current spec requires to produce a validation error if a newline or tab is encountered, and remove them from the input:

url/url.bs

Lines 1472 to 1474 in 49060c7

<li><p>If <var>input</var> contains any <a>ASCII tab or newline</a>, <a>validation error</a>.
<li><p>Remove all <a>ASCII tab or newline</a> from <var>input</var>.

This contradicts the behavior observed in Firefox, Chrome and Edge:

const url = new URL("http://www.example.com/")
url.pathname = 'foo\nbar';
url.href
  • Node and WHATWG URL: "http://www.example.com/foobar"
  • Firefox, Chrome and Edge: "http://www.example.com/foo%0Abar"
const url = new URL("http://www.example.com/")
url.pathname = 'foo\tbar';
url.href
  • Node and WHATWG URL: "http://www.example.com/foobar"
  • Firefox, Chrome and Edge: "http://www.example.com/foo%09bar"

This issue was originally reported for Node: nodejs/node#23696
Removing these characters prevents the use of file:// urls to represent files with newlines and tabs in their path.

@devsnek
Copy link

devsnek commented Oct 16, 2018

Safari also follows the spec here.

@annevk annevk added topic: parser topic: api needs tests Moving the issue forward requires someone to write tests labels Oct 17, 2018
@annevk
Copy link
Member

annevk commented Oct 17, 2018

You could still construct a URL with %09 or %0A, no?

However, browsers being divergent here does is sufficient to reconsider how the API should behave. We'd need somewhat exhaustive tests for all members (not just pathname's setter) to ensure browsers are consistent here.

@demurgos
Copy link
Author

demurgos commented Oct 17, 2018

Yes, there is probably a way to fix the Node issue without waiting for this issue to be resolved. Using encodeURI before assigning to the pathname seems to preserve newlines and tabs.

@jasnell
Copy link
Collaborator

jasnell commented Oct 17, 2018

+1 on going with the temporary workaround for now, but I think it makes sense to follow the browser's behavior on this, at least in the setters. It is interesting to note that Chrome (I haven't checked the others) behaves per-spec with regards to the initial constructor...

const u = new URL('https://example.org/a\nb');
console.log(u.href);  // https://example.org/ab

u.pathname = 'a\nb';
console.log(u.href);  // https://example.org/a%0Ab

Which is just a tad frustrating in it's inconsistency ;-)

@guybedford
Copy link

@annevk do you think it would be realistic to get browsers to adopt the spec by opening up bugs for this? Wondering if we can get any feedback from browser vendors here on these cases to know their constraints.

@annevk
Copy link
Member

annevk commented Nov 20, 2018

Given that most browsers are not in line with the standard (at least that's what OP suggests, but exhaustive tests are needed as per my above comment) I think this can change. @achristensen07 can comment on behalf of Safari, but I suspect they'd want to align with the others on this.

@achristensen07
Copy link
Collaborator

I oppose changing the spec here. Our setters use the same parsers as the constructor, and for elegance and performance reasons I would like to keep it that way. I also believe the reason why we have the tab-and-newline-removing logic in the constructor applies to setters as well. I have seen no internal bug reports indicating real compatibility problems from this. I would appreciate if Chrome, Firefox, and Edge aligned with the specification, Node, and Safari.

@valenting
Copy link
Collaborator

Speaking for Firefox, I think we ought to align with the spec, as it really does make sense.
But this is not a big priority right now, as the majority of other UAs have this behaviour right now. We've had small webcompat bugs with similar changes before, but I don't expect there would be any here.

@demurgos
Copy link
Author

demurgos commented Nov 22, 2018

I also believe the reason why we have the tab-and-newline-removing logic in the constructor applies to setters as well.

What is the reason behind the removal of those characters?

I would like URLs to have those two properties:

  • Use the same rules both in the constructor and in the setters.
  • Support an equivalence between system paths and file:// URLs: all unique valid paths should have a corresponding unique valid URL. This is important to me as it reduces edge cases: currently the spec silently drops some characters. It has caused me some bugs because different file paths led to the same file URL. It also means that some files are may be unreachable using the URL constructor (or you need to special case).

Having those two properties would imply that the constructor should be changed to no longer remove these characters. (Instead of removing the characters even in the setters)

@annevk
Copy link
Member

annevk commented Apr 27, 2020

@demurgos I think you have to percent-encode system paths either way as the encoding is not always clear. They cannot be passed directly to a URL constructor.

@annevk
Copy link
Member

annevk commented Apr 27, 2020

@achristensen07 I wrote tests for this in web-platform-tests/wpt#23265 of which Safari fails quite a few (some related to stripping, some related to other bugs). Looking at url/url-setters.html Safari has many failures there too.

annevk added a commit to web-platform-tests/wpt that referenced this issue Apr 28, 2020
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue May 5, 2020
…tters, a=testonly

Automatic update from web-platform-tests
URL: stripping of tab and newlines in setters

Closes whatwg/url#419.
--

wpt-commits: 04aec1fa1b1f5cf8833b61682da707b7fd47bef6
wpt-pr: 23265
xeonchen pushed a commit to xeonchen/gecko that referenced this issue May 5, 2020
…tters, a=testonly

Automatic update from web-platform-tests
URL: stripping of tab and newlines in setters

Closes whatwg/url#419.
--

wpt-commits: 04aec1fa1b1f5cf8833b61682da707b7fd47bef6
wpt-pr: 23265
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue May 6, 2020
…tters, a=testonly

Automatic update from web-platform-tests
URL: stripping of tab and newlines in setters

Closes whatwg/url#419.
--

wpt-commits: 04aec1fa1b1f5cf8833b61682da707b7fd47bef6
wpt-pr: 23265

UltraBlame original commit: 4f721ebd4db736cb0f87059675e217194088b753
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue May 6, 2020
…tters, a=testonly

Automatic update from web-platform-tests
URL: stripping of tab and newlines in setters

Closes whatwg/url#419.
--

wpt-commits: 04aec1fa1b1f5cf8833b61682da707b7fd47bef6
wpt-pr: 23265

UltraBlame original commit: 4f721ebd4db736cb0f87059675e217194088b753
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue May 6, 2020
…tters, a=testonly

Automatic update from web-platform-tests
URL: stripping of tab and newlines in setters

Closes whatwg/url#419.
--

wpt-commits: 04aec1fa1b1f5cf8833b61682da707b7fd47bef6
wpt-pr: 23265

UltraBlame original commit: 4f721ebd4db736cb0f87059675e217194088b753
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs tests Moving the issue forward requires someone to write tests topic: api topic: parser
Development

No branches or pull requests

7 participants