perf: improving performance #528

Closed · wants to merge 3 commits
35 changes: 32 additions & 3 deletions classes/range.js
@@ -31,7 +31,7 @@ class Range {
this.set = range
.split('||')
// map the range to a 2d array of comparators
-      .map(r => this.parseRange(r.trim()))
+      .map(r => this.parseRange(r))
// throw out any comparator lists that are empty
// this generally means that it was not a valid range, which is allowed
// in loose mode, but will still throw if the WHOLE range is invalid.
@@ -81,8 +81,8 @@ class Range {

// memoize range parsing for performance.
// this is a very hot path, and fully deterministic.
-    const memoOpts = Object.keys(this.options).join(',')
-    const memoKey = `parseRange:${memoOpts}:${range}`
+    const memoOpts = buildMemoKeyFromOptions(this.options)
+    const memoKey = memoOpts + range
const cached = cache.get(memoKey)
if (cached) {
return cached
@@ -190,6 +190,35 @@ class Range {
return false
}
}
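The memoized path above can be sketched standalone. This is an illustrative reduction, not the PR's exact code: a plain `Map` stands in for lru-cache, and the "parsing" step is a stub; only the key construction and cache flow mirror the diff.

```javascript
// Illustrative sketch of the memoization pattern in parseRange above.
// A plain Map stands in for lru-cache, and the parsing step is a stub.
const cache = new Map()

function memoizedParseRange (range, memoOpts) {
  const memoKey = memoOpts + range // plain concatenation, as in the new code
  const cached = cache.get(memoKey)
  if (cached) {
    return cached
  }
  const result = range.split('||').map(r => r.trim()) // stand-in for real parsing
  cache.set(memoKey, result)
  return result
}
```

A second call with the same options key and range returns the cached array by reference, skipping the parse entirely.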

function buildMemoKeyFromOptions(options) {


You may also want to try a handwritten key such as:

`${options.includePrerelease ? 1 : 0},${options.loose ? 1 : 0}...

Which is what we do in the TypeScript compiler. Not sure if it's faster than here but could be interesting to test.
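The truncated suggestion above might be completed like this. This is an illustrative guess: the `rtl` field and the overall ordering are assumptions, not the original comment's text.

```javascript
// Hypothetical completion of the handwritten template-literal key above;
// the rtl field and the field order are assumptions.
const handwrittenKey = options =>
  `${options.includePrerelease ? 1 : 0},${options.loose ? 1 : 0},${options.rtl ? 1 : 0}`
```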

@H4ad (Contributor Author), Mar 31, 2023:


It doesn't hurt the performance:

buildMemoKeyFromOptions({"includePrelease":true}) x 1,121,468,719 ops/sec ±0.63% (96 runs sampled)
buildMemoKeyFromOptions({"includePrelease":true,"loose":true}) x 1,132,354,133 ops/sec ±0.67% (95 runs sampled)
buildMemoKeyFromOptions({"includePrelease":true,"loose":true,"rtl":true}) x 1,122,788,698 ops/sec ±0.53% (96 runs sampled)

Past 1B ops/sec, the choice between the two has almost no effect.
The only drawback is that the lru-cache key becomes 6 characters instead of 1.

If the npm team prefers readability over a tiny cache key, I can change the implementation.

Contributor:

This is actually a good use case for a bit flag. It's a bunch of booleans, and bitwise operations are super fast. And while bit ops are often frowned upon as being dense or obscure, if organized properly, they're much more maintainable than a series of if/else statements and hoping that you captured each case.

let s = 1
const FLAG_includePrerelease = s
s<<=1
const FLAG_loose = s
s<<=1
const FLAG_rtl = s
s<<=1
// ...

function buildMemoKeyFromOptions(options) {
  // now just bitwise-OR them all together as appropriate
  return (
    options.loose ? FLAG_loose : 0
  ) | (
    options.rtl ? FLAG_rtl : 0
  ) | (
    options.includePrerelease ? FLAG_includePrerelease : 0
  )
}
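As a quick sanity check of that scheme, every combination of the three booleans ORs to a distinct small integer. A self-contained sketch with the same flag values:

```javascript
// Self-contained check that the bit-flag scheme above yields one distinct
// integer key per option combination (flags 1, 2, 4 as in the sketch).
const FLAG_includePrerelease = 1
const FLAG_loose = 2
const FLAG_rtl = 4

const optionsToFlags = options =>
  (options.loose ? FLAG_loose : 0) |
  (options.rtl ? FLAG_rtl : 0) |
  (options.includePrerelease ? FLAG_includePrerelease : 0)

const seen = new Set()
for (const includePrerelease of [false, true]) {
  for (const loose of [false, true]) {
    for (const rtl of [false, true]) {
      seen.add(optionsToFlags({ includePrerelease, loose, rtl }))
    }
  }
}
// seen now holds the eight distinct keys 0 through 7
```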


I don't think the suggestion is to change Options, but to use a different way of constructing a key.

That being said, I think it should be possible to internally store flags for these and then reconstruct the options bag on whichever methods request it. Then, internally, things can use the fast check and pass a number around. Spitballing, though.

Contributor Author:

Yes, I realized that a few seconds after writing it. Using the flags internally and exposing the options bag makes sense; I'll try using a getter to compute the options while keeping just the bit flag internally.


Thinking further, the user already has to pass the object in, so you probably can just keep that one and not reconstruct it.

Contributor Author:

@jakebailey I tried, but it would break a lot of tests if I did.

I put the version of parseOptions with bit flags here: h4ad-forks@6c7a68a

@isaacs Take a look and see if this PR is worth pursuing; I made a lot of changes.

Also, I introduced one small breaking change: numbers passed to parseOptions are now considered valid options, so instead of re-parsing by passing the object around, I pass the options as bit flags and construct the option objects inside the functions.

On performance: there are no penalties, and I assume it got a bit faster because I now parse the options into an object once and afterwards only re-parse them as a number, which saves some comparisons.

Contributor Author:

The only downside I noticed when using the bitmask is that the memory usage (ref: #528 (comment)) is the same as allocating the objects at runtime (8.7 MB).

I don't know why this happens, but I want to point out that it's not as memory-efficient as Object.freeze.

  if (options.includePrerelease === true) {
    if (options.loose === true && options.rtl === true) {
      return '1';
    }

    if (options.loose === true) {
      return '2';
    }

    if (options.rtl === true) {
      return '3';
    }

    return '4';
  } else if (options.loose === true) {
    if (options.rtl === true) {
      return '5';
    }

    return '6';
  } else if (options.rtl === true) {
    return '7';
  } else {
    return '8';
  }
}
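The eight-branch function above can also be expressed table-driven. This hypothetical equivalent (not in the PR) indexes a lookup array by bit flags and returns the same key for every combination:

```javascript
// Hypothetical table-driven equivalent of buildMemoKeyFromOptions above:
// index = includePrerelease | loose << 1 | rtl << 2, then one array lookup.
// KEYS is ordered so each index maps to the same string as the branches.
const KEYS = ['8', '4', '6', '2', '7', '3', '5', '1']

const keyFromOptions = options =>
  KEYS[
    (options.includePrerelease === true ? 1 : 0) |
    (options.loose === true ? 2 : 0) |
    (options.rtl === true ? 4 : 0)
  ]
```

The branches and the table trade readability differently, but both do constant work per call.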

module.exports = Range

const LRU = require('lru-cache')
52 changes: 41 additions & 11 deletions internal/parse-options.js
@@ -1,11 +1,41 @@
-// parse out just the options we care about so we always get a consistent
-// obj with keys in a consistent order.
-const opts = ['includePrerelease', 'loose', 'rtl']
-const parseOptions = options =>
-  !options ? {}
-  : typeof options !== 'object' ? { loose: true }
-  : opts.filter(k => options[k]).reduce((o, k) => {
-    o[k] = true
-    return o
-  }, {})
-module.exports = parseOptions
+const var1 = Object.freeze({ includePrerelease: true, loose: true, rtl: true });
+const var2 = Object.freeze({ includePrerelease: true, loose: true });
+const var3 = Object.freeze({ includePrerelease: true, rtl: true });
+const var4 = Object.freeze({ includePrerelease: true });
+const var5 = Object.freeze({ loose: true, rtl: true });
+const var6 = Object.freeze({ loose: true });
+const var7 = Object.freeze({ rtl: true });
+const emptyOpts = Object.freeze({});
Comment on lines +1 to +8
Contributor:

typically freezing objects makes them much slower, so it's worth benchmarking this.

Also, all of these should be __proto__: null so lookups can be faster.


+1 on trying out Object.create(null), but I wonder if making this a Map, or even using bitmasks, could speed it up further?

@jakebailey, Mar 31, 2023:

Yeah, freezing has never changed performance in anything I've ever tried, unfortunately.

Honestly, I think that the way forward in optimizing things is not going to be down to tricks like freezing or looking for the fastest LRU cache; it's going to be overall algorithmic changes to the code.

@H4ad (Contributor Author), Mar 31, 2023:

The idea of freezing is just to prevent modification of the object; the goal is not to gain performance but to save memory allocation.

With the older version, a test ended at 8.7 MB of memory usage; with this frozen version, it was consistently 6.7 MB.
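The shape of that allocation difference can be sketched directly (illustrative only; the 8.7/6.7 MB figures above came from `--trace-gc` output, which this does not reproduce):

```javascript
// Illustrative sketch of the allocation difference being discussed:
// reusing one shared frozen object allocates nothing per call, while
// building a fresh options object allocates every time.
const shared = Object.freeze({ loose: true })

const reuse = () => shared              // frozen-singleton strategy
const allocate = () => ({ loose: true }) // per-call allocation strategy

const held = []
for (let i = 0; i < 1000; i++) {
  held.push(reuse()) // every entry is the same object
}
```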


I think @ljharb's point was more that using Object.freeze (the call itself) makes the code slower, not faster, because the freezing itself is an operation.

Contributor Author:

@kurtextrem Like #528 (comment), doing it inline can be slower, so to avoid breaking the tests I think it's better not to include __proto__: null.

Contributor:

It would be fine to change the tests here, to be clear. I'm very surprised doing it inline is slower - in which engines is that true?

Note that we're not just concerned with node here, this package is used in browsers as well.

@kurtextrem, Apr 3, 2023:

@ljharb The benchmark site I used (from the other thread) doesn't work for me in Firefox, so I can only say it's definitely much slower in Chromium, Safari, and Node (that caught me by surprise too; I assume __proto__ gets special treatment during creation unless done with Object.create(null), making it slower than a standard object). It's even slower than Object.freeze/seal.
Just from my microbenchmark, I'd stick to regular object creation (without freezing), but if we can benchmark all browser vendors plus Node and see that Object.freeze is no slower, we could as well take the small memory reduction.

@kurtextrem, Apr 3, 2023:

Screenshot, because maybe not everyone following this discussion has access to a Mac (higher is better): [benchmark screenshot]

@H4ad (Contributor Author), Apr 3, 2023:

@kurtextrem The main problem would be if we were returning the object, but since we run it once, the ops/sec is not a problem.

It would only be a problem if it slowed down property access, and I don't think that's the case.


const parseOptions = options => {
  if (!options) return emptyOpts;
Contributor:

there's no way this is faster than return {}, have you benchmarked this?


why shouldn't it? object allocation (and cleanup later) isn't free

Contributor Author:

Yeah, the idea is to reduce memory allocation.

Contributor:

Have you benchmarked it? There are a great many optimizations people attempt to make in JS that are counterintuitively slower than the idiomatic approach.

Contributor Author:

@ljharb Yes,

benchmark-memory.js
const satisfies = require('./functions/satisfies');

(() => {
  const versions = ['1.0.3', '2.2.2', '2.3.0'];
  const versionToCompare = '>=1.0.2';
  const option1 = { includePrelease: true };
  const option2 = { includePrelease: true, loose: true };
  const option3 = { includePrelease: true, loose: true, rtl: true };

  for (let i = 0; i < 1e5; i++) {
    for (const version of versions) {
      satisfies(versionToCompare, version);
      satisfies(versionToCompare, version, option1);
      satisfies(versionToCompare, version, option2);
      satisfies(versionToCompare, version, option3);
    }
  }
})();

Run the benchmark with the following command: node --trace-gc --trace-deopt benchmark-memory.js.

The difference shows up at the end of the trace output: without Object.freeze we ended up with 8.7 MB of memory, and with Object.freeze, 6.7 MB.

About the pure performance of satisfies, I didn't notice any drawback:

Without Object.freeze:

satisfies(1.0.6, 1.0.3||^2.0.0) x 420,051 ops/sec ±1.14% (94 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0) x 429,208 ops/sec ±0.59% (94 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0) x 459,359 ops/sec ±0.67% (88 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true}) x 409,670 ops/sec ±1.02% (94 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true}) x 428,150 ops/sec ±0.91% (94 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true}) x 464,502 ops/sec ±0.78% (97 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true}) x 406,053 ops/sec ±0.83% (96 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true}) x 429,494 ops/sec ±0.58% (96 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true}) x 433,164 ops/sec ±1.11% (95 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 385,354 ops/sec ±0.87% (95 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 404,200 ops/sec ±0.72% (94 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 434,975 ops/sec ±1.02% (97 runs sampled)

With Object.freeze

satisfies(1.0.6, 1.0.3||^2.0.0) x 419,615 ops/sec ±1.38% (95 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0) x 438,583 ops/sec ±0.48% (96 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0) x 447,261 ops/sec ±0.92% (93 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true}) x 415,861 ops/sec ±1.10% (95 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true}) x 433,068 ops/sec ±0.98% (96 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true}) x 461,636 ops/sec ±0.92% (92 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true}) x 412,220 ops/sec ±0.06% (96 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true}) x 428,966 ops/sec ±0.65% (97 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true}) x 448,587 ops/sec ±0.29% (95 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 388,662 ops/sec ±0.48% (94 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 412,242 ops/sec ±0.16% (93 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 432,172 ops/sec ±1.03% (96 runs sampled)

@H4ad (Contributor Author), Apr 1, 2023:

@ljharb Like @kurtextrem mentioned in the other comment, the inline static null proto is slower, and we ended up with 8.7 MB.

Code:

const parseOptions = options => {
  if (!options) return Object.create(null);

  if (typeof options !== 'object') return { __proto__: null, loose: true };

  const opts = Object.create(null);

  if (options.includePrerelease) opts.includePrerelease = true;
  if (options.loose) opts.loose = true;
  if (options.rtl) opts.rtl = true;

  return opts;
};

Perf:

satisfies(1.0.6, 1.0.3||^2.0.0) x 341,924 ops/sec ±0.42% (93 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0) x 357,034 ops/sec ±0.13% (97 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0) x 384,200 ops/sec ±0.54% (92 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true}) x 328,873 ops/sec ±0.98% (96 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true}) x 345,029 ops/sec ±0.67% (96 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true}) x 367,659 ops/sec ±0.37% (95 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true}) x 316,462 ops/sec ±0.43% (94 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true}) x 323,678 ops/sec ±0.90% (93 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true}) x 339,436 ops/sec ±0.94% (96 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 287,875 ops/sec ±4.03% (88 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 314,779 ops/sec ±1.12% (94 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 332,332 ops/sec ±0.25% (96 runs sampled)

Contributor:

Thanks for clarifying. What I meant was adding __proto__: null inline to those static frozen object declarations.


https://esbench.com/bench/64271fec6c89f600a57022e8 as mentioned, does not appear to be faster to initialize

Contributor:

Hm, I can't seem to run the benchmarks on iOS Safari. Have they been run on every browser, not just Chrome? What about in Node, where perf differs from Chrome at times?

Contributor Author:

@ljharb About freeze with proto null:

satisfies(1.0.6, 1.0.3||^2.0.0) x 21,856 ops/sec ±0.60% (90 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0) x 21,716 ops/sec ±0.79% (95 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0) x 21,643 ops/sec ±0.76% (95 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true}) x 21,216 ops/sec ±0.25% (93 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true}) x 21,171 ops/sec ±0.41% (86 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true}) x 21,113 ops/sec ±0.38% (92 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true}) x 310,400 ops/sec ±4.69% (95 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true}) x 361,617 ops/sec ±0.58% (89 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true}) x 381,054 ops/sec ±0.07% (97 runs sampled)
satisfies(1.0.6, 1.0.3||^2.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 335,386 ops/sec ±0.44% (96 runs sampled)
satisfies(1.0.6, 2.2.2||~3.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 350,006 ops/sec ±0.62% (97 runs sampled)
satisfies(1.0.6, 2.3.0||<4.0.0, {"includePrelease":true,"loose":true,"rtl":true}) x 370,139 ops/sec ±0.09% (94 runs sampled)

Soooo slow, I don't even know why.


  if (typeof options !== 'object') return var6;

  if (options.includePrerelease) {
    if (options.loose && options.rtl) {
      return var1;
    }

    if (options.loose) {
      return var2;
    }

    if (options.rtl) {
      return var3;
    }

    return var4;
  } else if (options.loose) {
    if (options.rtl) {
      return var5;
    }

    return var6;
  } else if (options.rtl) {
    return var7;
  } else {
    return emptyOpts;
  }
};
module.exports = parseOptions;
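The observable contract of the new parseOptions can be checked directly: equivalent inputs return the same shared frozen object, so no per-call allocation occurs. A self-contained sketch against an abbreviated two-case reimplementation (the full function lives above; `parseOptionsSketch` is a name introduced here):

```javascript
// Abbreviated reimplementation of the parseOptions above (only the
// no-options, non-object, and loose-only cases) to show the contract:
// equivalent inputs return the same shared frozen object.
const emptyOpts = Object.freeze({})
const looseOnly = Object.freeze({ loose: true })

const parseOptionsSketch = options => {
  if (!options) return emptyOpts
  if (typeof options !== 'object') return looseOnly // legacy truthy flag
  if (options.loose && !options.includePrerelease && !options.rtl) return looseOnly
  return emptyOpts // remaining combinations elided in this sketch
}
```

Callers get a frozen object either way, which is why downstream code must not mutate the returned options bag.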