Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yarn 1.x.x invalid url with node 18.17.0 VS 18.16.1 #48855

Closed
FrederickEngelhardt opened this issue Jul 20, 2023 · 21 comments
Closed

yarn 1.x.x invalid url with node 18.17.0 VS 18.16.1 #48855

FrederickEngelhardt opened this issue Jul 20, 2023 · 21 comments
Labels
url Issues and PRs related to the legacy built-in url module. v18.x Issues that can be reproduced on v18.x or PRs targeting the v18.x-staging branch.

Comments

@FrederickEngelhardt
Copy link

FrederickEngelhardt commented Jul 20, 2023

Version

18.17.0

Platform

Mac OS (aarm64 & intel)

Subsystem

No response

What steps will reproduce the bug?

Using node 18.17.0 and installing yarn 1.x.x will cause this error.

How often does it reproduce? Is there a required condition?

  1. Every time after installing 18.17.0 from 18.16.1. So far, this issue is specific to a monorepo being used.
  2. I'm having a hard time replicating it on init projects that use the npm registry. The project with the issue uses a artifactory jfrog registry which points to a couple other internal artifactory registries.

Other tips to replicating this issue:

  • If you are swapping node from a minor version make sure to reset your yarn cache to replicate the issue 100% of the time. IE yarn cache clear. Project level cache should be enough.

Example commands

nvm install 18.17.0; nvm use 18.17.0; npm i -g yarn; yarn cache clean; yarn;

What is the expected behavior? Why is that the expected behavior?

yarn should install dependencies and resolve those urls.

What do you see instead?

Error and unexpected error has occurred "Invalid Url".

Additional information

If this is an expected behavior change 18.16 - 18.17.0 (breaking yarn package manager on 1.x.x) we can close this issue.

Yarn 1.x.x is still used in many projects so this may affect a bunch of users.

Temp workarounds

  1. Stay on Node 18.16 and lock the version. IE with .nvrmc or _node-version change it from 18 to 18.16

Possibly related URL issues:

@anonrig
Copy link
Member

anonrig commented Jul 20, 2023

Can you share the URL causing this error? cc @nodejs/url

@anonrig anonrig added the whatwg-url Issues and PRs related to the WHATWG URL implementation. label Jul 20, 2023
@FrederickEngelhardt
Copy link
Author

FrederickEngelhardt commented Jul 20, 2023

 verbose 0.688693792 TypeError [ERR_INVALID_URL]: Invalid URL
     at new NodeError (node:internal/errors:405:5)
     at Url.parse (node:url:445:17)
     at urlParse (node:url:167:13)
     at GitResolver.isVersion (/Users/<path>.yarn/releases/yarn-1.22.19.js:38701:21)
     at getExoticResolver (/Users/<path>.yarn/releases/yarn-1.22.19.js:29731:18)
     at /Users/<path>.yarn/releases/yarn-1.22.19.js:38152:78
     at Generator.next (<anonymous>)
     at step (/Users/<path>.yarn/releases/yarn-1.22.19.js:310:30)
     at /Users/<path>.yarn/releases/yarn-1.22.19.js:321:13
 error An unexpected error occurred: "Invalid URL".
 info If you think this is a bug, please open a bug report with the information provided in "/Users/<path>yarn-error.log".

@aduh95 aduh95 added url Issues and PRs related to the legacy built-in url module. and removed whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Jul 20, 2023
@aduh95
Copy link
Contributor

aduh95 commented Jul 20, 2023

Looking at the stack trace, it seems to be related to the legacy Url class, not the WHATWG URL class.

The related code on Yarn source is at https://github.com/yarnpkg/yarn/blob/158d96dce95313d9a00218302631cd263877d164/src/resolvers/exotics/git-resolver.js#L38

/cc @arcanis

@aduh95
Copy link
Contributor

aduh95 commented Jul 20, 2023

The throw is located at

node/lib/url.js

Lines 444 to 446 in a39b8a2

if (this.hostname === '' || forbiddenHostChars.test(this.hostname)) {
throw new ERR_INVALID_URL(url);
}

The hostname is set at

node/lib/url.js

Line 433 in a39b8a2

this.hostname = toASCII(this.hostname, true);

And toASCII is imported from internal/idna:

const { toASCII } = require('internal/idna');

Which itself reexports it from internal/url:

const { domainToASCII, domainToUnicode } = require('internal/url');
module.exports = { toASCII: domainToASCII, toUnicode: domainToUnicode };

The list of commits that have touched internal/url since v18.16.0 is not small:
https://github.com/nodejs/node/commits/a39b8a2a3e8414c8a54650bdfe4a46998282d88a/lib/internal/url.js

/cc @nodejs/url

@anonrig anonrig added the v18.x Issues that can be reproduced on v18.x or PRs targeting the v18.x-staging branch. label Jul 21, 2023
@akkunchoi
Copy link

$ node --version && node -e "console.log(require('node:url').parse('x://0.0,1.1/').host)"
v18.16.1
0.0,1.1

$ node --version && node -e "console.log(require('node:url').parse('x://0.0,1.1/').host)"
v20.5.0
0.0,1.1

$ node --version && node -e "console.log(require('node:url').parse('x://0.0,1.1/').host)"
v18.17.0
node:url:445
          throw new ERR_INVALID_URL(url);
          ^

TypeError [ERR_INVALID_URL]: Invalid URL
    at new NodeError (node:internal/errors:405:5)
    at Url.parse (node:url:445:17)
    at Object.urlParse [as parse] (node:url:167:13)
    at [eval]:1:33
    at Script.runInThisContext (node:vm:123:12)
    at Object.runInThisContext (node:vm:299:38)
    at node:internal/process/execution:79:19
    at [eval]-wrapper:6:22
    at evalScript (node:internal/process/execution:78:60)
    at node:internal/main/eval_string:28:3 {
  input: 'x://0.0,1.1/',
  code: 'ERR_INVALID_URL'
}

$ node --version && node -e "console.log(require('node:url').parse('x://0.0,1/').host)"
v18.17.0
0.0,1

$ node --version && node -e "console.log(require('node:url').parse('x://a.a.a.a,b.b.b.b/').host)" 
v18.17.0
a.a.a.a,b.b.b.b

It seems to result in an Invalid URL error under the following conditions

  • v18.17.0
  • When specifying a URL with a hostname that includes numbers, dots, and a comma

I use a comma in the hostname when specifying multiple hosts in MongoDB.
https://www.mongodb.com/docs/manual/reference/connection-string/#standard-connection-string-format

@aduh95
Copy link
Contributor

aduh95 commented Jul 21, 2023

Starting bisecting

@richardlau
Copy link
Member

I guess this is a duplicate of #48850?

@aduh95
Copy link
Contributor

aduh95 commented Jul 21, 2023

0dc485e is the first bad commit /cc @anonrig

@arcanis
Copy link
Contributor

arcanis commented Jul 21, 2023

Seems it's not a problem with Yarn itself but, out of curiosity, why is your custom registry hostname containing commas? I didn't even know this was supported 😅

@lemire
Copy link
Member

lemire commented Jul 21, 2023

The ada URL parsing library itself has no beef with commas, as you can see in the playground...

https://www.ada-url.com/playground?url=x%3A%2F%2F0.0%2C1%2F

@anonrig
Copy link
Member

anonrig commented Jul 21, 2023

The issue is with ada::idna and how it returns empty string for 0.0,1.1 input. node -e "console.log(require('url').domainToASCII('0.0,1.1'))"

@anonrig
Copy link
Member

anonrig commented Jul 21, 2023

It seems this PR is not backported, and might be the root cause of this: #47735

@richardlau
Copy link
Member

The issue is with ada::idna and how it returns empty string for 0.0,1.1 input. node -e "console.log(require('url').domainToASCII('0.0,1.1'))"

I'm not sure that is the same problem. node -p "console.log(require('url').domainToASCII('0.0,1.1'))" returns an empty string in Node.js 16.18.1 as well.

@richardlau
Copy link
Member

richardlau commented Jul 21, 2023

The issue is with ada::idna and how it returns empty string for 0.0,1.1 input. node -e "console.log(require('url').domainToASCII('0.0,1.1'))"

I'm not sure that is the same problem. node -p "console.log(require('url').domainToASCII('0.0,1.1'))" returns an empty string in Node.js 16.18.1 as well.

FWIW I've bisected Node.js 16 and the behaviour of

node -e "console.log(require('url').domainToASCII('0.0,1.1'))"

changed in Node.js 16.17.0 with 8cda415 (#43190).

It looks like #43190 made it into 18.6.0, which is consistent with:

$ nvm run 18.5.0 -e "console.log(require('url').domainToASCII('0.0,1.1'))"
Running node v18.5.0 (npm v8.12.1)
0.0,1.1
$ nvm run 18.6.0 -e "console.log(require('url').domainToASCII('0.0,1.1'))"
Running node v18.6.0 (npm v8.13.2)

$

So while it does look like the behaviour of node -e "console.log(require('url').domainToASCII('0.0,1.1'))" did regress, it happened way before Node.js 18.17.0 so it is unlikely to be the test to capture whatever the regression that happened in Node.js 18.17.0 is.

nodejs-github-bot pushed a commit that referenced this issue Jul 22, 2023
PR-URL: #48878
Refs: #48873
Refs: #48855
Refs: #48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
nodejs-github-bot pushed a commit that referenced this issue Jul 22, 2023
PR-URL: #48878
Refs: #48873
Refs: #48855
Refs: #48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
anonrig added a commit to anonrig/node that referenced this issue Jul 22, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
anonrig added a commit to anonrig/node that referenced this issue Jul 22, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
@FrederickEngelhardt
Copy link
Author

I believe the issue lies with yarn packages using invalid install urls ie the package does not exist.

The problematic code relates to an internal monorepo packages using dependencies that import one of the internal package versions that is a lower version.

Previous in 18.16.1 it would not throw a error when attempting to install the invalid package.

Still trying to create a replicable repo outside the internal code. Guessing I'll need a lerna monorepoa and shadow some package names and make one of the packages install a invalid dependency.

@FrederickEngelhardt
Copy link
Author

FrederickEngelhardt commented Jul 22, 2023

Seems it's not a problem with Yarn itself but, out of curiosity, why is your custom registry hostname containing commas? I didn't even know this was supported

The custom registry does not contain commas. It's an artifactory registry. There is only one registry used, and always-auth=true is enabled for other registries. - required for yarn v1 to work even though it's deprecated with npm.

Ceres6 pushed a commit to Ceres6/node that referenced this issue Jul 27, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Jul 27, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
@everett1992
Copy link
Contributor

everett1992 commented Aug 1, 2023

I've noticed a similar issue with npm 6 calling url.parse("npm:html-webpack-plugin@4.0.0-beta.8")

» asdf shell nodejs 18.16.0
» node -p 'url.parse("npm:html-webpack-plugin@4.0.0-beta.8")'
Url { ... }

» asdf shell nodejs 18.17.0
» node -p 'url.parse("npm:html-webpack-plugin@4.0.0-beta.8")'
node:url:445
          throw new ERR_INVALID_URL(url);
          ^

TypeError [ERR_INVALID_URL]: Invalid URL
    at new NodeError (node:internal/errors:405:5)
    at Url.parse (node:url:445:17)
    at Object.urlParse [as parse] (node:url:167:13)
    at [eval]:1:5
    at Script.runInThisContext (node:vm:123:12)
    at Object.runInThisContext (node:vm:299:38)
    at node:internal/process/execution:79:19
    at [eval]-wrapper:6:22
    at evalScript (node:internal/process/execution:78:60)
    at node:internal/main/eval_string:28:3 {
  input: 'npm:html-webpack-plugin@4.0.0-beta.8',
  code: 'ERR_INVALID_URL'
}

Node.js v18.17.0

In this case .8 causes the error, url.parse("npm:pkg") is valid while url.parse("npm:pkg.0") is an error. new URL("npm:pkg.0") is also not an error.

@xen0n
Copy link

xen0n commented Aug 2, 2023

We've hit this very bug and done some investigation:

For a string such as npm:html-webpack-plugin@4.0.0-beta.8, eventually we reach this block (from lib/url.js:426):

        // IDNA Support: Returns a punycoded representation of "domain".
        // It only converts parts of the domain name that
        // have non-ASCII characters, i.e. it doesn't matter if
        // you call it with a domain that already is ASCII-only.

        // Use lenient mode (`true`) to try to support even non-compliant
        // URLs.
        this.hostname = toASCII(this.hostname, true);

        // Prevent two potential routes of hostname spoofing.
        // 1. If this.hostname is empty, it must have become empty due to toASCII
        //    since we checked this.hostname above.
        // 2. If any of forbiddenHostChars appears in this.hostname, it must have
        //    also gotten in due to toASCII. This is since getHostname would have
        //    filtered them out otherwise.
        // Rather than trying to correct this by moving the non-host part into
        // the pathname as we've done in getHostname, throw an exception to
        // convey the severity of this issue.
        if (this.hostname === '' || forbiddenHostChars.test(this.hostname)) {
          throw new ERR_INVALID_URL(url);
        }

In performing the toASCII transform, eventually we reach src/node_url.cc:84:

  auto out = ada::parse<ada::url>("ws://x");
  DCHECK(out);
  if (!out->set_hostname(input)) {
    return args.GetReturnValue().Set(FIXED_ONE_BYTE_STRING(env->isolate(), ""));
  }

During the set_hostname call we arrive at deps/ada/ada.cpp:11603:

    if (checkers::is_ipv4(host.value())) {
      ada_log("parse_host fast path ipv4");
      return parse_ipv4(host.value());
    }

"if host is ipv4, then just parse as ipv4" -- but what is actually considered ipv4? At the top of ada.cpp:

ada_really_inline ada_constexpr bool is_ipv4(std::string_view view) noexcept {
  size_t last_dot = view.rfind('.');
  if (last_dot == view.size() - 1) {
    view.remove_suffix(1);
    last_dot = view.rfind('.');
  }
  std::string_view number =
      (last_dot == std::string_view::npos) ? view : view.substr(last_dot + 1);
  if (number.empty()) {
    return false;
  }
  /** Optimization opportunity: we have basically identified the last number of
     the ipv4 if we return true here. We might as well parse it and have at
     least one number parsed when we get to parse_ipv4. */
  if (std::all_of(number.begin(), number.end(), ada::checkers::is_digit)) {
    return true;
  }
  return (checkers::has_hex_prefix(number) &&
          std::all_of(number.begin() + 2, number.end(),
                      ada::unicode::is_lowercase_hex));
}

So any string matching \.\d+\.?$ is considered IPv4 by this function. Anything actually not, then:

  • fails in the subsequent parse_ipv4,
  • making set_hostname return false,
  • making this.hostname become the empty string, finally
  • failing the this.hostname === '' check leading to ERR_INVALID_URL being thrown.

Indeed:

node
Welcome to Node.js v18.17.0.
Type ".help" for more information.
> new url.Url().parse('a:.1')
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL
    at __node_internal_captureLargerStackTrace (node:internal/errors:496:5)
    at new NodeError (node:internal/errors:405:5)
    at Url.parse (node:url:445:17) {
  input: 'a:.1',
  code: 'ERR_INVALID_URL'
}

The behavior isn't observed on the latest main branch so something must have changed in the meantime, but I haven't dug deeper yet.

So for now, it may be prudent to revert the faulty commit then try (or not, actually, for an LTS branch) fixing and relanding it, to prevent more hair loss of unaware users affected by this bug?

@anonrig
Copy link
Member

anonrig commented Aug 2, 2023

@xen0n This issue fixed and pending to be released with the next Node.js 18 release: #48873

pluris pushed a commit to pluris/node that referenced this issue Aug 6, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
pluris pushed a commit to pluris/node that referenced this issue Aug 6, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
pluris pushed a commit to pluris/node that referenced this issue Aug 7, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
pluris pushed a commit to pluris/node that referenced this issue Aug 7, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Aug 14, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Aug 14, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Aug 14, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Aug 14, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
UlisesGascon pushed a commit to UlisesGascon/node that referenced this issue Aug 14, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
UlisesGascon pushed a commit to UlisesGascon/node that referenced this issue Aug 14, 2023
PR-URL: nodejs#48878
Refs: nodejs#48873
Refs: nodejs#48855
Refs: nodejs#48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
RafaelGSS pushed a commit that referenced this issue Aug 15, 2023
PR-URL: #48878
Refs: #48873
Refs: #48855
Refs: #48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
RafaelGSS pushed a commit that referenced this issue Aug 15, 2023
PR-URL: #48878
Refs: #48873
Refs: #48855
Refs: #48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
ruyadorno pushed a commit that referenced this issue Aug 16, 2023
PR-URL: #48878
Backport-PR-URL: #48873
Refs: #48873
Refs: #48855
Refs: #48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
ruyadorno pushed a commit that referenced this issue Aug 16, 2023
PR-URL: #48878
Backport-PR-URL: #48873
Refs: #48873
Refs: #48855
Refs: #48850
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
@PikachuEXE
Copy link

@anonrig anonrig closed this as completed Sep 21, 2023
@davidecarpini
Copy link

yeah updating node from v18.17.0 -> v18.20.0 fixed the issue 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
url Issues and PRs related to the legacy built-in url module. v18.x Issues that can be reproduced on v18.x or PRs targeting the v18.x-staging branch.
Projects
None yet
Development

Successfully merging a pull request may close this issue.