Using faster url parser #643

petkaantonov · 2015-01-28T21:12:32Z

(Original node issue nodejs/node-v0.x-archive#6788)

I have rewritten the url parser module of node core as it was/is a serious bottleneck in some of the techempower benchmarks

Running node's urlparser benchmark using iojs it's still 16x faster when not retrieving properties and 11x faster when retrieving all properties (which are lazy getters in my implementation). In absolute terms the current iojs urlparser throughputs 25k parses per second vs 400k lazy/270k eager parses per second. format and resolve are also affected in similar magnitudes.

The text was updated successfully, but these errors were encountered:

rvagg · 2015-01-28T21:34:11Z

a pull request or some other request for specific action is likely to get more attention here fwiw

petkaantonov · 2015-01-28T21:35:37Z

Well I request your opinion before I make it PR ready (it's a normal user module right now)

vkurchatkin · 2015-01-28T21:39:43Z

I like the idea of making url faster, but replacing the whole module frightens me a little

xaka · 2015-01-28T21:43:45Z

I think there is nothing wrong with using faster parser as it's just an
implementation details as long as API stays the same and all existing tests
pass

On Wed, Jan 28, 2015 at 1:39 PM, Vladimir Kurchatkin <
notifications@github.com> wrote:

I like the idea of making url faster, but replacing the whole module
frightens me a little

—
Reply to this email directly or view it on GitHub
#643 (comment).

runeli · 2015-01-28T21:57:12Z

I would love to see this on io

vkurchatkin · 2015-01-28T22:36:36Z

@xaka It's about subtle things, like data properties vs. accessor properties. It's hard to foresee everything once whole module changes. This requires major version bump at least.

xaka · 2015-01-28T22:44:35Z

That's a good point. Switching from properties to accessors means breaking
API as things like console.log(...) would work differently and people might
be using the stdout for instance for their own reasons, etc.

On Wed, Jan 28, 2015 at 2:36 PM, Vladimir Kurchatkin <
notifications@github.com> wrote:

@xaka https://github.com/xaka It's about subtle things, like data
properties vs. accessor properties. It's hard to foresee everything once
whole module changes. This requires major version bump at least.

—
Reply to this email directly or view it on GitHub
#643 (comment).

jonathanong · 2015-01-29T08:02:11Z

this is one of the reasons why i'm trying to get rid of the util._extend() usage. otherwise, i fear too much would break.

mikeal · 2015-02-01T21:23:56Z

Can't we lazily patch the stringify and json transforms to avoid API change/break?

petkaantonov · 2015-02-04T08:24:54Z

The getters can be made enumerable (but they're still on the prototype) and the Url class can implement toJSON method. So that leaves util._extend(), which should be internal method?

It can also be considered to make the properties just eager data properties, it would still be 11x faster. But often many of the properties are not needed (especially .href?) so it could be a bummer.

rvagg · 2015-05-02T06:23:01Z

fixed at #1561

petkaantonov · 2015-05-04T07:11:34Z

Reopening since it was reverted 😄

silverwind · 2015-05-04T19:34:52Z

Let's use this issue for further discussion on the new url implementation.

Previous discussions at #933, #1561, #1591 and #1602.

Fishrock123 · 2015-05-04T19:36:20Z

I think we could probably plan to get this into 3.0.0 with time to let the ecosystem know / start updating.

silverwind · 2015-05-04T19:38:30Z

My question would be how much of a perf difference would switching to plain properties over accessors have. If we can keep the API and get most of the perf, that'd be a win-win.

petkaantonov · 2015-05-04T19:40:24Z

perf is not related to accessors, but data property api is terrible and not like browser api at all :)

silverwind · 2015-05-04T19:44:28Z

I'd question how browserlike we want to go. Do we really want something like this?

> document.location.hash = null
> document.location.hash
"#null"

YurySolovyov · 2015-05-04T19:56:00Z

I assume the intention to follow browsers comes form wish of writing "isomorphic" apps and following standards in general (even though it gives you nonsense like #null) ?

Fishrock123 · 2015-05-04T19:57:04Z

I'd prefer to not have to run everything though toString(), which is probably what the browser does.

(I guess that's maybe not a bad thing?)

othiym23 · 2015-05-05T01:24:43Z

(copypasta'd from #1591)

This will probably find its way back to the TC at some point, but as I explain in npm/npm#8163, @isaacs and I don't think the right thing to do is to change npm to match the new url interface. I'm happy to discuss our reasoning, but ultimately @isaacs made the call, and I agreed, that this set of changes is sufficiently problematic to require a lot more discussion and careful planning to land.

petkaantonov · 2015-05-05T06:22:59Z

I think we could probably plan to get this into 3.0.0 with time to let the ecosystem know / start updating.

I will simply just improve the performance, if it wasn't for lazy .href it could even be done in a patch. But only making .href accessor should have no or minimal compatibility issues

petkaantonov · 2015-05-05T06:24:34Z

I'd question how browserlike we want to go. Do we really want something like this?
document.location.hash = null
document.location.hash
"#null"

The point is not this but accessors would have allowed all components of the url be automatically up to date when one is changed.

domenic · 2015-05-05T08:05:36Z

I agree it would be much better to match browsers (as @isaacs has always suggested; surprised to see the about-face) and to maintain a consistent state between the different properties.

If we can't manage to do that with require("url") then maybe we'll instead want to work on a standards-compliant URL global that does match browsers (including consistency guarantees). That would allow the more drastic changes necessary for compliance, e.g. the introduction of searchParams and so on. This might fall out naturally out of the work I'm doing in V8/Blink at my day job anyway (e.g. it might become just turning on a build flag in V8).

rlidwka · 2015-05-05T09:29:04Z

My suggestion would be:

In io.js 2.x use the same parser as before, but emit a warning on deletion. A few ideas:

set configurable: false so if people try to delete it, this operation will fail (strict mode only)
explicitly check if any properties are deleted in url.format (performance hit?)

In io.js 3.x drop support for that.

If npm won't change url interface until then, we should consider applying floating patch.

domenic · 2015-05-05T09:45:48Z

In case anyone didn't notice, 2.0.0 was released so anything backward-incompatible (or probably even annoying, like a warning) would have to be in 3.0.0.

rlidwka · 2015-05-05T10:06:46Z

Well there is nothing wrong with introducing a warning in a minor release.

isaacs · 2015-05-05T19:05:58Z

The intention is not to follow browsers for the good of writing isomorphic apps.

The intention is to follow url resolution logic like browsers, so that crawlers and other Node.js programs behave reasonably. The url object parsing is based on the browser location global and <a> html tag fields, but it also has to support being passed to http.request and friends. So, it's not completely identical, and needn't be.

A 100% isomorphic browser-style accessor-using auto-updating url object is a great idea. It belongs in npm, not in core.

spion · 2015-08-04T21:54:45Z

Will this make it into 4.0.0?

silverwind · 2015-08-04T22:21:14Z

That depends on @petkaantonov right now. See #1650.

thefourtheye · 2015-08-05T03:06:20Z

The #1650 is continued at #2303 now.

jasnell · 2016-08-05T22:17:03Z

Refs: #7448 ... a WHATWG URL Parser implementation. Bit slower than url.parse() currently but more correct to the standard.

jasnell · 2016-08-05T22:17:26Z

Closing given the lack of continued activity and discussion on this. Can reopen if necessary.

pkoretic · 2017-02-07T10:58:10Z

@jasnell @petkaantonov testing this using node 7.5.0 on 16.04 (dual core kvm Intel Xeon E3-12xx v2) still shows huge difference

we found about this after cpu profiling our application which showed url.parse takes most of the time and we can't avoid making requests

we have switched to fast-url-parser for now
any info on this maybe?

node benchmark/nodecore.js

misc/url.js parse(): 40645.099
misc/url.js format(): 36836.169
misc/url.js resolve("../foo/bar?baz=boom"): 38703.878
misc/url.js resolve("foo/bar"): 39136.199
misc/url.js resolve("http://nodejs.org"): 37845.184
misc/url.js resolve("./foo/bar?baz"): 38815.977

node benchmark/urlparser.js

misc/url.js parse(): 184957.14
misc/url.js format(): 170904.26
misc/url.js resolve("../foo/bar?baz=boom"): 181324.04
misc/url.js resolve("foo/bar"): 169483.69
misc/url.js resolve("http://nodejs.org"): 180839.33
misc/url.js resolve("./foo/bar?baz"): 174857.71

nitrocode · 2018-04-03T21:46:16Z

I just hit an issue in production where extremely long referer urls (3000 chars, 5500 chars, and 9000 chars) were causing our node instances to spike CPU usage and we tracked it down to the url library. @pkoretic your fast-url-parser is much more performant.

Anyone have an update on when this will be merged? I'll switch to fast-url-parser for now.

Bessonov · 2019-10-18T22:23:06Z

Latest node.

$ node -v
v12.12.0
toxa@5fb84e50fca4:/tmp/urlparser$ node benchmark/nodecore.js 
misc/url.js parse(): 68502.876
misc/url.js format(): 60017.421
misc/url.js resolve("../foo/bar?baz=boom"): 59991.764
misc/url.js resolve("foo/bar"): 64876.780
misc/url.js resolve("http://nodejs.org"): 63232.093
misc/url.js resolve("./foo/bar?baz"): 64872.514
toxa@5fb84e50fca4:/tmp/urlparser$ node benchmark/urlparser.js 
misc/url.js parse(): 170138.65
misc/url.js format(): 191450.64
misc/url.js resolve("../foo/bar?baz=boom"): 208052.98
misc/url.js resolve("foo/bar"): 271166.70
misc/url.js resolve("http://nodejs.org"): 280692.39
misc/url.js resolve("./foo/bar?baz"): 255188.00

Bessonov · 2022-03-06T09:39:12Z

Still 2.5x:

$ node -v
v16.14.0

$ node benchmark/nodecore.js
misc/url.js parse(): 78603.333
misc/url.js format(): 72242.340
misc/url.js resolve("../foo/bar?baz=boom"): 71972.167
misc/url.js resolve("foo/bar"): 78404.141
misc/url.js resolve("http://nodejs.org"): 78088.377
misc/url.js resolve("./foo/bar?baz"): 78914.819

$ node benchmark/urlparser.js
misc/url.js parse(): 169323.52
misc/url.js format(): 199909.74
misc/url.js resolve("../foo/bar?baz=boom"): 198722.90
misc/url.js resolve("foo/bar"): 226340.86
misc/url.js resolve("http://nodejs.org"): 159880.62
misc/url.js resolve("./foo/bar?baz"): 284006.64

vkurchatkin added the url Issues and PRs related to the legacy built-in url module. label Jan 28, 2015

mscdex mentioned this issue Feb 1, 2015

events: optimize various functions #601

Closed

petkaantonov mentioned this issue Feb 24, 2015

url: significantly improve the performance of the url module #933

Closed

Fishrock123 added this to the 2.0.0 milestone Apr 28, 2015

rvagg closed this as completed May 2, 2015

petkaantonov reopened this May 4, 2015

silverwind removed this from the 2.0.0 milestone May 4, 2015

silverwind mentioned this issue May 4, 2015

Revert url perf changes #1602

Merged

silverwind added this to the 3.0.0 milestone May 4, 2015

Fishrock123 removed this from the 3.0.0 milestone Jul 7, 2015

Fishrock123 added the feature request Issues that request new features to be added to Node.js. label Jan 20, 2016

jasnell added the stalled Issues and PRs that are stalled. label Jul 3, 2016

jasnell closed this as completed Aug 5, 2016

Using faster url parser #643

Using faster url parser #643

Comments

petkaantonov commented Jan 28, 2015

rvagg commented Jan 28, 2015

petkaantonov commented Jan 28, 2015

vkurchatkin commented Jan 28, 2015

xaka commented Jan 28, 2015

runeli commented Jan 28, 2015

vkurchatkin commented Jan 28, 2015

xaka commented Jan 28, 2015

jonathanong commented Jan 29, 2015

mikeal commented Feb 1, 2015

petkaantonov commented Feb 4, 2015

rvagg commented May 2, 2015

petkaantonov commented May 4, 2015

silverwind commented May 4, 2015

Fishrock123 commented May 4, 2015

silverwind commented May 4, 2015

petkaantonov commented May 4, 2015

silverwind commented May 4, 2015

YurySolovyov commented May 4, 2015

Fishrock123 commented May 4, 2015

othiym23 commented May 5, 2015

petkaantonov commented May 5, 2015

petkaantonov commented May 5, 2015

domenic commented May 5, 2015

rlidwka commented May 5, 2015

domenic commented May 5, 2015

rlidwka commented May 5, 2015

isaacs commented May 5, 2015

spion commented Aug 4, 2015

silverwind commented Aug 4, 2015

thefourtheye commented Aug 5, 2015

jasnell commented Aug 5, 2016

jasnell commented Aug 5, 2016

pkoretic commented Feb 7, 2017

nitrocode commented Apr 3, 2018

Bessonov commented Oct 18, 2019

Bessonov commented Mar 6, 2022