Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use empty host for file URL's #260

Merged
merged 9 commits into from
Mar 7, 2017
Merged

Conversation

rmisev
Copy link
Member

@rmisev rmisev commented Feb 24, 2017

Summary of changes:

  1. Fixes Empty or null host when scheme is "file:"? #258
  2. Introduces empty host term for file and non-special URL's
  3. Changes definition: opaque host is non-empty ASCII string

Preview | Diff

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, but many nits.

url.bs Outdated
<dd><p>a <a>path-absolute-non-Windows-file-URL string</a> if <a>base URL</a>'s <a for=url>host</a>
is non-null
is non-empty string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not always a string though. It could be an IPv4 address.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably just say not an empty host.

url.bs Outdated
@@ -254,7 +254,7 @@ point <a for=/>URLs</a> from <var>A</var> can come from untrusted sources.
<h3 id=host-representation>Host representation</h3>

<p>A <dfn export id=concept-host>host</dfn> is a <a>domain</a>, an
<a>IPv4 address</a>, an <a>IPv6 address</a>, or an <a>opaque host</a>. Typically a <a for=/>host</a>
<a>IPv4 address</a>, an <a>IPv6 address</a>, an <a>opaque host</a>, or an <a>empty host</a>. Typically a <a for=/>host</a>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't meet the 100 column wrapping requirements.

url.bs Outdated
@@ -280,12 +280,14 @@ eight <dfn id=concept-ipv6-piece lt='IPv6 piece'>16-bit pieces</dfn>.
<p class="note">Support for <code>&lt;zone_id></code> is
<a href="https://www.w3.org/Bugs/Public/show_bug.cgi?id=27234#c2">intentionally omitted</a>.

<p>An <dfn export>opaque host</dfn> is an <a>ASCII string</a> holding data that can be used for
<p>An <dfn export>opaque host</dfn> is a non-empty <a>ASCII string</a> holding data that can be used for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I think.

url.bs Outdated
@@ -768,7 +770,7 @@ no purpose other than being a location the algorithm can jump to.
<a>IPv6 serializer</a> on <var>host</var>,
followed by "<code>]</code>".

<li><p>Otherwise, <var>host</var> is a <a>domain</a> or <a>opaque host</a>, return <var>host</var>.
<li><p>Otherwise, <var>host</var> is a <a>domain</a>, an <a>opaque host</a>, or an <a>empty host</a>, return <var>host</var>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

url.bs Outdated
@@ -1172,9 +1174,9 @@ switching on <a>base URL</a>'s <a for=url>scheme</a>:
<dd><p>a <a>path-relative-scheme-less-URL string</a>
<dt>"<code>file</code>"
<dd><p>a <a>scheme-relative-file-URL string</a>
<dd><p>a <a>path-absolute-URL string</a> if <a>base URL</a>'s <a for=url>host</a> is null
<dd><p>a <a>path-absolute-URL string</a> if <a>base URL</a>'s <a for=url>host</a> is either null or the empty string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say "an empty host" I think. A base URL is a URL record after all. But also, I thought the idea was that for file URLs the host could never be null so then we should leave that out too.

url.bs Outdated
@@ -1199,7 +1201,7 @@ optionally followed by a <a>path-absolute-URL string</a>.
<a>path-absolute-URL string</a>.

<p>An <dfn export>opaque-host-and-port string</dfn> must be either an empty
<a>valid opaque-host string</a> or: a non-empty <a>valid opaque-host string</a>, optionally followed
string or: a <a>valid opaque-host string</a>, optionally followed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's "the empty string" (and an empty host).

url.bs Outdated
@@ -2066,10 +2068,10 @@ string <var>input</var>, optionally with a <a>base URL</a> <var>base</var>, opti
<a>Windows drive letter</a>, then:

<ol>
<li><p>If <var>url</var>'s <a for=url>host</a> is non-null,
<li><p>If <var>url</var>'s <a for=url>host</a> is not the empty string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here again we're talking about a URL record so we should say "not an empty host". Same below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both an empty host and an opaque host are strings by definition. So maybe it's fine to set host to the empty string in the parser? For example the opaque host parser returns string, but not an opaque host record.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, I guess some level of implicit casting is probably okay.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing I'm wondering about is if we should add some kind of explanatory text. That file URLs can have empty or non-opaque hosts, non-file-special URLs can have non-empty-non-opaque hosts, and non-special URLs can have empty, opaque, or null hosts. Such a note should probably go where we define the host field of URLs.

url.bs Outdated
@@ -383,7 +385,7 @@ up to three <a>ASCII digits</a> per sequence, each representing a decimal number

XXX should we define the format inline instead just like STD 66? -->

<p>An <dfn export>valid opaque-host string</dfn> must be zero or more <a>URL units</a> or:
<p>An <dfn export>valid opaque-host string</dfn> must be one or more <a>URL units</a> or:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change "An" to "A" here while we're changing this? I can also do it before merging.

url.bs Outdated
@@ -768,7 +770,8 @@ no purpose other than being a location the algorithm can jump to.
<a>IPv6 serializer</a> on <var>host</var>,
followed by "<code>]</code>".

<li><p>Otherwise, <var>host</var> is a <a>domain</a> or <a>opaque host</a>, return <var>host</var>.
<li><p>Otherwise, <var>host</var> is a <a>domain</a>, an <a>opaque host</a>, or an <a>empty
host</a>, return <var>host</var>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it reads more naturally if you leave out the "an". So "A domain, opaque host, or empty host".

url.bs Outdated
@@ -1172,9 +1175,10 @@ switching on <a>base URL</a>'s <a for=url>scheme</a>:
<dd><p>a <a>path-relative-scheme-less-URL string</a>
<dt>"<code>file</code>"
<dd><p>a <a>scheme-relative-file-URL string</a>
<dd><p>a <a>path-absolute-URL string</a> if <a>base URL</a>'s <a for=url>host</a> is null
<dd><p>a <a>path-absolute-URL string</a> if <a>base URL</a>'s <a for=url>host</a> is an <a>empty
host</a>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No newlines inside phrasing/inline elements.

url.bs Outdated
<a>validation error</a>.

<li><p>Set <var>url</var>'s <a for=url>host</a> to null and replace the second
<li><p>Set <var>url</var>'s <a for=url>host</a> to the empty string and replace the second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is >100 columns. "second" needs to go to the next line.

@rmisev
Copy link
Member Author

rmisev commented Feb 27, 2017

Maybe explanatory text is good idea, I don't know. But I would like to note, what file URL still can have null host (two invalid URL's cases: file:path and file:/path).

Also I think about such note below the empty host definition: "An empty host is used by non-special and file URLs."

@annevk
Copy link
Member

annevk commented Feb 27, 2017

I guess there's no real point trying to make those invalid cases consistent. And you're right about where the note should go. Although maybe we should have it in both places eventually, once we're certain this design can be widely implemented (i.e., once two browsers ship it).

@rmisev
Copy link
Member Author

rmisev commented Mar 2, 2017

I think the such table is clearer than a long explanatory text #260 (review), #260 (comment):

Scheme domain IPv4 address IPv6 address opaque host empty host null
non-file special + + +
"file" + + + + +
non-special + + + +

Then maybe no need for additional notes. What do you think?

@annevk
Copy link
Member

annevk commented Mar 3, 2017

And have that below the definition of URL's host? I think that's reasonable.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple nits left. I'll also push a commit to adjust the formatting of the table a bit.

url.bs Outdated
<p>An <dfn export>opaque host</dfn> is an <a>ASCII string</a> holding data that can be used for
further processing.
<p>An <dfn export>opaque host</dfn> is a non-empty <a>ASCII string</a> holding data that can be used
for further processing.

<p class="note no-backref">An <a>opaque host</a> is only used by <a lt="is special">non-special</a>
<a for=/>URLs</a>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either we should have a note like this for empty hosts too or we should drop this one, no?

url.bs Outdated
<th><a>opaque host</a>
<th><a>empty host</a>
<th>null
<tr><td>non-file <a lt="special scheme">special</a><td>✅<td>✅<td>✅<td><td><td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-"<code>file</code>"

@annevk
Copy link
Member

annevk commented Mar 6, 2017

I ended up fixing the nits, let me know what you think.

if host is null, then no validation error and don't set host to ""
@rmisev
Copy link
Member Author

rmisev commented Mar 6, 2017

I think your changes are OK.

Yet I have changed the "Windows drive letter quirk" in the path state, to be more logical (no validation error if host null) and consistent (null host is left the null):

For example, now: file:/C:/p --> host === null as for file:/a/p also host === null, but file://example.net/C:/ --> host === ""

Now LGTM.

@annevk
Copy link
Member

annevk commented Mar 6, 2017

Looks okay, blocked on speced/bikeshed#941 now.

@annevk annevk merged commit 5807b28 into whatwg:master Mar 7, 2017
@domenic
Copy link
Member

domenic commented Mar 7, 2017

To confirm, this only impacts validation, not parsing/serializing?

@annevk
Copy link
Member

annevk commented Mar 7, 2017

Yeah, this should not have changed either of those (other than logging validation errors).

@rmisev
Copy link
Member Author

rmisev commented Mar 7, 2017

Yes, fixes validation bug. Parsing: file://host/C:/ and file:///C:/ now parses to empty host, previously to null. Serializing for file URL is the same host is null or empty - so no impact.

domenic added a commit to jsdom/whatwg-url that referenced this pull request Mar 14, 2017
Follows whatwg/url#260. This does not impact parsing, only validation and the URL record representation.
@domenic
Copy link
Member

domenic commented Mar 14, 2017

Note that there is no test coverage for the case introduced here: in

        if (this.url.host !== "" && this.url.host !== null) {
          this.parseError = true;
          this.url.host = "";
        }

the condition never evaluates to true so those two lines are never reached. Maybe a coverage gap, maybe a spec bug.

@rmisev
Copy link
Member Author

rmisev commented Mar 14, 2017

@domenic test case, which covers this code is in previous comment: #260 (comment):
file://host/C:/

It looks like WPT lacks this. I can prepare PR.

domenic pushed a commit to web-platform-tests/wpt that referenced this pull request Mar 14, 2017
TimothyGu added a commit to TimothyGu/node that referenced this pull request Apr 23, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: nodejs#12523
Fixes: nodejs#10608
Fixes: nodejs#10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
TimothyGu added a commit to nodejs/node that referenced this pull request Apr 24, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
evanlucas pushed a commit to nodejs/node that referenced this pull request Apr 25, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
evanlucas pushed a commit to nodejs/node that referenced this pull request May 1, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
evanlucas pushed a commit to nodejs/node that referenced this pull request May 2, 2017
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
  - Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
rmisev added a commit to upa-url/upa that referenced this pull request May 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants