Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

url.parse does not handle IPv6 link-local addresses correctly #9404

Closed
cthayer opened this issue Mar 12, 2015 · 18 comments
Closed

url.parse does not handle IPv6 link-local addresses correctly #9404

cthayer opened this issue Mar 12, 2015 · 18 comments

Comments

@cthayer
Copy link

cthayer commented Mar 12, 2015

In version 0.12.0, url.parse() does not parse a URL containing an IPv6 link-local address correctly.

Specifically, adding a zone to the address in the host portion of the URL causes url.parse() to fail to detect the host portion of the address and considers everything after the protocol to be part of the path.

Example of failure with IPv6 zone:

var url = require('url');

url.parse('ws://[fe80::2e0:4cff:fe68:23ff%en7]:51877/engine.io/?EIO=3&transport=websocket');
{ protocol: 'ws:',
  slashes: true,
  auth: null,
  host: '',
  port: null,
  hostname: '',
  hash: null,
  search: '?EIO=3&transport=websocket',
  query: 'EIO=3&transport=websocket',
  pathname: '/[fe80::2e0:4cff:fe68:23ff%en7]:51877/engine.io/',
  path: '/[fe80::2e0:4cff:fe68:23ff%en7]:51877/engine.io/?EIO=3&transport=websocket',
  href: 'ws:///[fe80::2e0:4cff:fe68:23ff%en7]:51877/engine.io/?EIO=3&transport=websocket' }

Example of success without the IPv6 zone:

var url = require('url');

url.parse('ws://[fe80::2e0:4cff:fe68:23ff]:51877/engine.io/?EIO=3&transport=websocket');
{ protocol: 'ws:',
  slashes: true,
  auth: null,
  host: '[fe80::2e0:4cff:fe68:23ff]:51877',
  port: '51877',
  hostname: 'fe80::2e0:4cff:fe68:23ff',
  hash: null,
  search: '?EIO=3&transport=websocket',
  query: 'EIO=3&transport=websocket',
  pathname: '/engine.io/',
  path: '/engine.io/?EIO=3&transport=websocket',
  href: 'ws://[fe80::2e0:4cff:fe68:23ff]:51877/engine.io/?EIO=3&transport=websocket' }

Expected behavior

I would expect url.parse() of ws://[fe80::2e0:4cff:fe68:23ff%en7]:51877/engine.io/?EIO=3&transport=websocket to produce the following result:

{ protocol: 'ws:',
  slashes: true,
  auth: null,
  host: '[fe80::2e0:4cff:fe68:23ff%en7]:51877',
  port: 51877,
  hostname: 'fe80::2e0:4cff:fe68:23ff%en7',
  hash: null,
  search: '?EIO=3&transport=websocket',
  query: 'EIO=3&transport=websocket',
  pathname: '/engine.io/',
  path: '/engine.io/?EIO=3&transport=websocket',
  href: 'ws://[fe80::2e0:4cff:fe68:23ff%en7]:51877/engine.io/?EIO=3&transport=websocket' }
@cthayer
Copy link
Author

cthayer commented Mar 12, 2015

Looking through the code, it seems like the IPv6 hostname detection may need to be done before the invalid host character detection so that different invalid host characters can be used for IPv6 hostnames.

@cthayer
Copy link
Author

cthayer commented Mar 12, 2015

According to that reasoning, I would imagine changing lib/url.js like so:

diff --git a/lib/url.js b/lib/url.js
index c5a3793..e696547 100644
--- a/lib/url.js
+++ b/lib/url.js
@@ -70,6 +70,7 @@ var protocolPattern = /^([a-z0-9.+-]+:)/i,
     // are the ones that are *expected* to be seen, so we fast-path
     // them.
     nonHostChars = ['%', '/', '?', ';', '#'].concat(autoEscape),
+    nonHostCharsIPv6 = ['/', '?', ';', '#'].concat(autoEscape),
     hostEndingChars = ['/', '?', '#'],
     hostnameMaxLen = 255,
     hostnamePartPattern = /^[+a-z0-9A-Z_-]{0,63}$/,
@@ -216,11 +217,19 @@ Url.prototype.parse = function(url, parseQueryString, slashesDenoteHost) {
       rest = rest.slice(atSign + 1);
       this.auth = decodeURIComponent(auth);
     }
+    
+    // if hostname begins with [ and is followed by ]
+    // assume that it's an IPv6 address.
+    // :: is the smallest IPv6 address possible, so ] must be > 2 positions to the right of [
+    var ipv6Hostname = rest[0] === '[' &&
+        rest.indexOf(']') > 2;

     // the host is the remaining to the left of the first non-host char
+    // use the appropriate non-host char set
+    var invalidHostChars = ipv6Hostname ? nonHostCharsIPv6 : nonHostChars;
     hostEnd = -1;
-    for (var i = 0; i < nonHostChars.length; i++) {
-      var hec = rest.indexOf(nonHostChars[i]);
+    for (var i = 0; i < invalidHostChars.length; i++) {
+      var hec = rest.indexOf(invalidHostChars[i]);
       if (hec !== -1 && (hostEnd === -1 || hec < hostEnd))
         hostEnd = hec;
     }
@@ -238,11 +247,6 @@ Url.prototype.parse = function(url, parseQueryString, slashesDenoteHost) {
     // so even if it's empty, it has to be present.
     this.hostname = this.hostname || '';

-    // if hostname begins with [ and ends with ]
-    // assume that it's an IPv6 address.
-    var ipv6Hostname = this.hostname[0] === '[' &&
-        this.hostname[this.hostname.length - 1] === ']';
-
     // validate a little.
     if (!ipv6Hostname) {
       var hostparts = this.hostname.split(/\./);

Full code is here

I can create a pull request if that will be helpful.

@misterdjules
Copy link

@cthayer Thank you for reporting and investigating this issue!

Adding to milestone 0.12.2, because:

  • It's a bug that is present in v0.12.0 and v0.10.x, so probably not urgent to fix.
  • We already have a lot of issues to fix in v0.12.1 that are more critical in my opinion.

I'm not familiar with IPv6 addresses' format, but please submit a PR with your change, it'll make it much easier to discuss it.

Thank you again! 👍

@cthayer
Copy link
Author

cthayer commented Mar 14, 2015

@misterdjules No problem. Happy to be able to help out.

Submitted pull request #9411 with my proposed changes and relevant tests.

Your reasoning on the importance of the bug fix makes sense, though of course I would love to see this fixed in 0.12.1 since it's making it hard to use npm modules in a project I'm working on, but I will defer to your work flow. Currently, I'm using the sockets directly to avoid the use of url.parse that many modules use.

In IPv6, link-local addresses can contain a zone that specifies the interface that the address belongs to. The zone is in the hostname portion of a URL and is separated from the address by a % character (which is not a valid host character in IPv4).

@misterdjules
Copy link

@cthayer Thanks for the clarification! My goal this week is to prioritize issues in the 0.12.1 milestone to get it out in a reasonable amount of time, so hopefully we'll get to v0.12.2 sooner rather than later!

@cthayer
Copy link
Author

cthayer commented Mar 16, 2015

@misterdjules Sounds good. Looking forward to the 0.12.2 release!

@misterdjules
Copy link

@cthayer Because of the recent vulnerabilities in OpenSSL, we had to do a v0.12.1 release with just security fixes, so all v0.12.x releases have been shifted. As a result, what I called v0.12.2 in my previous comment is now v0.12.3.

@cthayer
Copy link
Author

cthayer commented Mar 24, 2015

@misterdjules Understood. Thanks for keeping me updated!

@misterdjules misterdjules modified the milestones: 0.12.4, 0.12.3 Apr 1, 2015
@obastemur
Copy link

This looks like a zone-identifier support instead of a bug. % represents the zone-identifier and lacking a wide range browser support. Nonetheless It's a very good catch.

@cthayer this patch tries to bypass zone-id or add a support? In case of supporting zone identifiers, it would be nice to have a 'zoneId' etc. parameter on url.parse's return value.

@cthayer
Copy link
Author

cthayer commented Apr 20, 2015

@obastemur this patch is adding support for zone-id by allowing the % character within an IPv6 address in the host section of a URL. In some ways I can see how my implementation may seem like a bypass. Since node.js already supports the zone-id in IPv6 addresses, my primary goal was to prevent url.parse from failing/mangling a URL that node.Socket would consider valid. That said, it would definitely be nice to have a 'zoneId' parameter on url.parse's return value, but my primary goal is to fix the incorrect parsing of IPv6 link-local addresses (addresses that have zone ids) rather than change the structure of the return value from url.parse

Happy to take a stab at adding the zoneId parameter if that's what everyone thinks is best.

@obastemur
Copy link

'%' is a special character that is used for escaping the special chars. url.parse shouldn't accept or try to parse a mistakenly escaped address. (this can be from an old device not knowing what zone-id is) The important point here is that this exception is an addition to previous spec. that '%' starts at the end of the address definition.

IMHO if url.parse is going to handle zone-id (according to spec.), it should better put it into a parameter. I'm not sure which naming is best for it though. ('zoneId' etc.)

@cthayer
Copy link
Author

cthayer commented Apr 21, 2015

I felt my implementation was safe (spec wise) because it only allows the '%' character in the host portion if url.parse detects that the host portion of the URL is an IPv6 address (as determined by the square brackets). Which url.parse was already doing.

In an IPv6 address, special character escaping is not allowed because special characters are not valid characters in an IPv6 address so the percent is safe there. Also probably why it was chosen as the separator in the IPv6 spec.

I understand not wanting to break backwards compatibility, but I don't think my implementation is safe there.

That said, I'm open to whatever is determined to be the best implementation.

I'm all for naming the parameter according to spec (i.e. zoneId)

@misterdjules
Copy link

Removing from the 0.12.4 milestone according to my comment in the corresponding PR.

@nahuel
Copy link

nahuel commented Jul 11, 2016

Note, nodejs/node-eps#28 proposes to implement the WHATWG URL standard, but in the latest standard revision including the zoneid in the URL is discarded: https://url.spec.whatwg.org/#hosts-%28domains-and-ip-addresses%29

Anyways, it will be nice to pass the zoneid as an option to url.parse, because for link-local address you really need it. But I think the problem must be resolved at a lower level, with something like http://potaroo.net/ietf/idref/draft-kitamura-ipv6-zoneid-free/

@MasterJames
Copy link

MasterJames commented Apr 14, 2017

also affecting theturtle32 websocket package.
theturtle32/WebSocket-Node#278
NodeJS 7.4.0 (BTW rgw newer version 7.8.0 doesn't debug with chrome anymore.)
Looks like it could be % percent sign issue still in current versions.

@aqrln
Copy link

aqrln commented Apr 14, 2017

@MasterJames if the problem is still present in currently supported versions of Node.js, please open a relevant issue at https://github.com/nodejs/node/. This repo is only for archiving purposes and no issues should be reported here. Thanks!

@MasterJames
Copy link

Yes thanks @aqrln just linking it in now.
nodejs/node#12410

@gibfahn
Copy link
Member

gibfahn commented Apr 15, 2017

I think this can be closed in favour of nodejs/node#12410, let me know if that's wrong.

@gibfahn gibfahn closed this as completed Apr 15, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants