Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using faster url parser #643

Closed
petkaantonov opened this issue Jan 28, 2015 · 36 comments
Closed

Using faster url parser #643

petkaantonov opened this issue Jan 28, 2015 · 36 comments
Labels
feature request Issues that request new features to be added to Node.js. stalled Issues and PRs that are stalled. url Issues and PRs related to the legacy built-in url module.

Comments

@petkaantonov
Copy link
Contributor

(Original node issue nodejs/node-v0.x-archive#6788)

I have rewritten the url parser module of node core as it was/is a serious bottleneck in some of the techempower benchmarks

Running node's urlparser benchmark using iojs it's still 16x faster when not retrieving properties and 11x faster when retrieving all properties (which are lazy getters in my implementation). In absolute terms the current iojs urlparser throughputs 25k parses per second vs 400k lazy/270k eager parses per second. format and resolve are also affected in similar magnitudes.

@rvagg
Copy link
Member

rvagg commented Jan 28, 2015

a pull request or some other request for specific action is likely to get more attention here fwiw

@petkaantonov
Copy link
Contributor Author

Well I request your opinion before I make it PR ready (it's a normal user module right now)

@vkurchatkin
Copy link
Contributor

I like the idea of making url faster, but replacing the whole module frightens me a little

@vkurchatkin vkurchatkin added the url Issues and PRs related to the legacy built-in url module. label Jan 28, 2015
@xaka
Copy link

xaka commented Jan 28, 2015

I think there is nothing wrong with using faster parser as it's just an
implementation details as long as API stays the same and all existing tests
pass

On Wed, Jan 28, 2015 at 1:39 PM, Vladimir Kurchatkin <
notifications@github.com> wrote:

I like the idea of making url faster, but replacing the whole module
frightens me a little


Reply to this email directly or view it on GitHub
#643 (comment).

@runeli
Copy link

runeli commented Jan 28, 2015

I would love to see this on io

@vkurchatkin
Copy link
Contributor

@xaka It's about subtle things, like data properties vs. accessor properties. It's hard to foresee everything once whole module changes. This requires major version bump at least.

@xaka
Copy link

xaka commented Jan 28, 2015

That's a good point. Switching from properties to accessors means breaking
API as things like console.log(...) would work differently and people might
be using the stdout for instance for their own reasons, etc.

On Wed, Jan 28, 2015 at 2:36 PM, Vladimir Kurchatkin <
notifications@github.com> wrote:

@xaka https://github.com/xaka It's about subtle things, like data
properties vs. accessor properties. It's hard to foresee everything once
whole module changes. This requires major version bump at least.


Reply to this email directly or view it on GitHub
#643 (comment).

@jonathanong
Copy link
Contributor

this is one of the reasons why i'm trying to get rid of the util._extend() usage. otherwise, i fear too much would break.

@mikeal
Copy link
Contributor

mikeal commented Feb 1, 2015

Can't we lazily patch the stringify and json transforms to avoid API change/break?

@petkaantonov
Copy link
Contributor Author

The getters can be made enumerable (but they're still on the prototype) and the Url class can implement toJSON method. So that leaves util._extend(), which should be internal method?

It can also be considered to make the properties just eager data properties, it would still be 11x faster. But often many of the properties are not needed (especially .href?) so it could be a bummer.

@rvagg
Copy link
Member

rvagg commented May 2, 2015

fixed at #1561

@rvagg rvagg closed this as completed May 2, 2015
@petkaantonov
Copy link
Contributor Author

Reopening since it was reverted 😄

@petkaantonov petkaantonov reopened this May 4, 2015
@silverwind silverwind removed this from the 2.0.0 milestone May 4, 2015
@silverwind
Copy link
Contributor

Let's use this issue for further discussion on the new url implementation.

Previous discussions at #933, #1561, #1591 and #1602.

@Fishrock123
Copy link
Contributor

I think we could probably plan to get this into 3.0.0 with time to let the ecosystem know / start updating.

@silverwind silverwind added this to the 3.0.0 milestone May 4, 2015
@silverwind
Copy link
Contributor

My question would be how much of a perf difference would switching to plain properties over accessors have. If we can keep the API and get most of the perf, that'd be a win-win.

@petkaantonov
Copy link
Contributor Author

perf is not related to accessors, but data property api is terrible and not like browser api at all :)

@silverwind
Copy link
Contributor

I'd question how browserlike we want to go. Do we really want something like this?

> document.location.hash = null
> document.location.hash
"#null"

@YurySolovyov
Copy link

I assume the intention to follow browsers comes form wish of writing "isomorphic" apps and following standards in general (even though it gives you nonsense like #null) ?

@Fishrock123
Copy link
Contributor

I'd prefer to not have to run everything though toString(), which is probably what the browser does.

(I guess that's maybe not a bad thing?)

@othiym23
Copy link
Contributor

othiym23 commented May 5, 2015

(copypasta'd from #1591)

This will probably find its way back to the TC at some point, but as I explain in npm/npm#8163, @isaacs and I don't think the right thing to do is to change npm to match the new url interface. I'm happy to discuss our reasoning, but ultimately @isaacs made the call, and I agreed, that this set of changes is sufficiently problematic to require a lot more discussion and careful planning to land.

@petkaantonov
Copy link
Contributor Author

I think we could probably plan to get this into 3.0.0 with time to let the ecosystem know / start updating.

I will simply just improve the performance, if it wasn't for lazy .href it could even be done in a patch. But only making .href accessor should have no or minimal compatibility issues

@petkaantonov
Copy link
Contributor Author

I'd question how browserlike we want to go. Do we really want something like this?

document.location.hash = null
document.location.hash
"#null"

The point is not this but accessors would have allowed all components of the url be automatically up to date when one is changed.

@domenic
Copy link
Contributor

domenic commented May 5, 2015

I agree it would be much better to match browsers (as @isaacs has always suggested; surprised to see the about-face) and to maintain a consistent state between the different properties.

If we can't manage to do that with require("url") then maybe we'll instead want to work on a standards-compliant URL global that does match browsers (including consistency guarantees). That would allow the more drastic changes necessary for compliance, e.g. the introduction of searchParams and so on. This might fall out naturally out of the work I'm doing in V8/Blink at my day job anyway (e.g. it might become just turning on a build flag in V8).

@rlidwka
Copy link
Contributor

rlidwka commented May 5, 2015

My suggestion would be:

In io.js 2.x use the same parser as before, but emit a warning on deletion. A few ideas:

  • set configurable: false so if people try to delete it, this operation will fail (strict mode only)
  • explicitly check if any properties are deleted in url.format (performance hit?)

In io.js 3.x drop support for that.

If npm won't change url interface until then, we should consider applying floating patch.

@domenic
Copy link
Contributor

domenic commented May 5, 2015

In case anyone didn't notice, 2.0.0 was released so anything backward-incompatible (or probably even annoying, like a warning) would have to be in 3.0.0.

@rlidwka
Copy link
Contributor

rlidwka commented May 5, 2015

Well there is nothing wrong with introducing a warning in a minor release.

@isaacs
Copy link
Contributor

isaacs commented May 5, 2015

The intention is not to follow browsers for the good of writing isomorphic apps.

The intention is to follow url resolution logic like browsers, so that crawlers and other Node.js programs behave reasonably. The url object parsing is based on the browser location global and <a> html tag fields, but it also has to support being passed to http.request and friends. So, it's not completely identical, and needn't be.

A 100% isomorphic browser-style accessor-using auto-updating url object is a great idea. It belongs in npm, not in core.

@Fishrock123 Fishrock123 removed this from the 3.0.0 milestone Jul 7, 2015
@spion
Copy link

spion commented Aug 4, 2015

Will this make it into 4.0.0?

@silverwind
Copy link
Contributor

That depends on @petkaantonov right now. See #1650.

@thefourtheye
Copy link
Contributor

The #1650 is continued at #2303 now.

@Fishrock123 Fishrock123 added the feature request Issues that request new features to be added to Node.js. label Jan 20, 2016
@jasnell jasnell added the stalled Issues and PRs that are stalled. label Jul 3, 2016
@jasnell
Copy link
Member

jasnell commented Aug 5, 2016

Refs: #7448 ... a WHATWG URL Parser implementation. Bit slower than url.parse() currently but more correct to the standard.

@jasnell
Copy link
Member

jasnell commented Aug 5, 2016

Closing given the lack of continued activity and discussion on this. Can reopen if necessary.

@jasnell jasnell closed this as completed Aug 5, 2016
@pkoretic
Copy link

pkoretic commented Feb 7, 2017

@jasnell @petkaantonov testing this using node 7.5.0 on 16.04 (dual core kvm Intel Xeon E3-12xx v2) still shows huge difference

we found about this after cpu profiling our application which showed url.parse takes most of the time and we can't avoid making requests

we have switched to fast-url-parser for now
any info on this maybe?

node benchmark/nodecore.js

misc/url.js parse(): 40645.099
misc/url.js format(): 36836.169
misc/url.js resolve("../foo/bar?baz=boom"): 38703.878
misc/url.js resolve("foo/bar"): 39136.199
misc/url.js resolve("http://nodejs.org"): 37845.184
misc/url.js resolve("./foo/bar?baz"): 38815.977

node benchmark/urlparser.js

misc/url.js parse(): 184957.14
misc/url.js format(): 170904.26
misc/url.js resolve("../foo/bar?baz=boom"): 181324.04
misc/url.js resolve("foo/bar"): 169483.69
misc/url.js resolve("http://nodejs.org"): 180839.33
misc/url.js resolve("./foo/bar?baz"): 174857.71

@nitrocode
Copy link

I just hit an issue in production where extremely long referer urls (3000 chars, 5500 chars, and 9000 chars) were causing our node instances to spike CPU usage and we tracked it down to the url library. @pkoretic your fast-url-parser is much more performant.

Anyone have an update on when this will be merged? I'll switch to fast-url-parser for now.

@Bessonov
Copy link

Latest node.

$ node -v
v12.12.0
toxa@5fb84e50fca4:/tmp/urlparser$ node benchmark/nodecore.js 
misc/url.js parse(): 68502.876
misc/url.js format(): 60017.421
misc/url.js resolve("../foo/bar?baz=boom"): 59991.764
misc/url.js resolve("foo/bar"): 64876.780
misc/url.js resolve("http://nodejs.org"): 63232.093
misc/url.js resolve("./foo/bar?baz"): 64872.514
toxa@5fb84e50fca4:/tmp/urlparser$ node benchmark/urlparser.js 
misc/url.js parse(): 170138.65
misc/url.js format(): 191450.64
misc/url.js resolve("../foo/bar?baz=boom"): 208052.98
misc/url.js resolve("foo/bar"): 271166.70
misc/url.js resolve("http://nodejs.org"): 280692.39
misc/url.js resolve("./foo/bar?baz"): 255188.00

@Bessonov
Copy link

Bessonov commented Mar 6, 2022

Still 2.5x:

$ node -v
v16.14.0

$ node benchmark/nodecore.js
misc/url.js parse(): 78603.333
misc/url.js format(): 72242.340
misc/url.js resolve("../foo/bar?baz=boom"): 71972.167
misc/url.js resolve("foo/bar"): 78404.141
misc/url.js resolve("http://nodejs.org"): 78088.377
misc/url.js resolve("./foo/bar?baz"): 78914.819

$ node benchmark/urlparser.js
misc/url.js parse(): 169323.52
misc/url.js format(): 199909.74
misc/url.js resolve("../foo/bar?baz=boom"): 198722.90
misc/url.js resolve("foo/bar"): 226340.86
misc/url.js resolve("http://nodejs.org"): 159880.62
misc/url.js resolve("./foo/bar?baz"): 284006.64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. stalled Issues and PRs that are stalled. url Issues and PRs related to the legacy built-in url module.
Projects
None yet
Development

No branches or pull requests