
perf: unsafe stringify now uses concatenation #676

Closed
wants to merge 1 commit

Conversation

@Cadienvan commented Jan 9, 2023

The only change made was moving away from a single expression built with the + operator, using incremental string concatenation (+=) instead.
An invisible, yet valuable change IMO.

The explanation is pretty simple: the + operator creates a new string on each operation, which can be inefficient when concatenating a large number of strings, or, as in this case, concatenating the same "chunks" many times.
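As a minimal illustration of the two styles under discussion (hypothetical helper names, not the actual uuid source):

```javascript
// Hypothetical example: joining hex chunks into a UUID-shaped string.
const parts = ['aabbccdd', 'eeff', '0011', '2233', '445566778899'];

// Style 1: one expression chained with the + operator.
function joinWithPlus(p) {
  return p[0] + '-' + p[1] + '-' + p[2] + '-' + p[3] + '-' + p[4];
}

// Style 2: building the result incrementally with +=.
function joinWithConcat(p) {
  let out = p[0];
  out += '-' + p[1];
  out += '-' + p[2];
  out += '-' + p[3];
  out += '-' + p[4];
  return out;
}

console.log(joinWithPlus(parts));   // aabbccdd-eeff-0011-2233-445566778899
console.log(joinWithConcat(parts) === joinWithPlus(parts)); // true
```

Both produce identical output; the question here is only which shape JS engines optimize better.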

@broofa I just saw your comment on my other pull request, but I was already preparing this PR, so please let me know whether this one is valid, or whether performance isn't taken into consideration at all even in these cases; I'll stop pursuing it if that's the case, of course :)

Before:

Starting. Tests take ~1 minute to run ...
uuid.stringify() x 1,599,033 ops/sec ±1.74% (83 runs sampled)
uuid.parse() x 1,668,695 ops/sec ±2.76% (90 runs sampled)
---

uuid.v1() x 1,053,405 ops/sec ±1.34% (89 runs sampled)
uuid.v1() fill existing array x 2,140,536 ops/sec ±1.81% (92 runs sampled)
uuid.v4() x 8,738,755 ops/sec ±1.92% (86 runs sampled)
uuid.v4() fill existing array x 2,978,811 ops/sec ±0.34% (92 runs sampled)
uuid.v3() x 156,645 ops/sec ±2.54% (70 runs sampled)
uuid.v5() x 177,174 ops/sec ±2.65% (68 runs sampled)
Fastest is uuid.v4()

After:

Starting. Tests take ~1 minute to run ...
uuid.stringify() x 1,726,349 ops/sec ±0.56% (91 runs sampled)
uuid.parse() x 1,704,779 ops/sec ±0.50% (95 runs sampled)
---

uuid.v1() x 1,089,921 ops/sec ±0.34% (93 runs sampled)
uuid.v1() fill existing array x 2,210,764 ops/sec ±0.39% (89 runs sampled)
uuid.v4() x 9,479,529 ops/sec ±2.45% (86 runs sampled)
uuid.v4() fill existing array x 3,063,681 ops/sec ±1.19% (96 runs sampled)
uuid.v3() x 156,433 ops/sec ±2.42% (73 runs sampled)
uuid.v5() x 173,674 ops/sec ±2.46% (71 runs sampled)
Fastest is uuid.v4()

@broofa (Member) commented Jan 9, 2023

This is more in line with the sort of change I'd be willing to consider (relatively modest impact on the codebase). Running the benchmarks on my own system, for Node as well as Chrome and Firefox, I see similar numbers.

(You can run the benchmarks in the browser via npm run build && npx http-server -o examples/benchmark/benchmark.html)

The + operator creates a new string on each operation

Do you have a link or article explaining why += is faster than the + operator? Doesn't += also create a new string?

@ctavan @Rush @LinusU : Thoughts? I'm on the fence here. This is a ~8% perf gain, with minimal code change. But as I said to @Cadienvan in his other PR, I'm increasingly leery of making changes simply for the sake of performance.

@Cadienvan (Author)

I'm sorry, I can't find any specific article about it, but I'm sure it works like that, as I struggled with some performance issues in a Node application I was working on and this solved the issue. I had a look (we're talking about more than a year ago) at some source code, but I can't find it right now.

Here's a proper explanation of the phenomenon:

In JavaScript, the + operator is used for concatenation when talking about strings.

Concatenation involves creating a new string by combining the two operands, which requires allocating new memory and copying both strings into it.

On the other hand, the += operator is used specifically for concatenation. It works by taking the left operand (a string), appending the right operand to it, and then assigning the resulting string back to the left operand. This can be more efficient than using the + operator for concatenation, because it avoids the need to allocate new memory and copy the strings. Instead, it modifies the existing string in-place.

I'm pretty sure using something like node-clinic could provide some information about it, but I've never used it, so I can't tell for sure.

@broofa (Member) commented Jan 9, 2023

struggled with some performance issues on a Node application

I'd be curious to know how this change impacts perf on non-Node (V8) runtimes.

@Cadienvan (Author)

I have to be honest: I tried some popular browsers with small benchmarks out of curiosity and couldn't find anything relevant. Whichever engine is involved, the worst that can happen is that both operations create a new string; performance always got better or stayed equal, never went down.

@Rush (Contributor) commented Jan 9, 2023

@broofa I will share my thoughts, since I was mentioned and I did make performance contributions in the past :)

Software is not made in a vacuum, and it does not run on abstract machines but on real hardware; in the case of JS, it runs on larger software stacks (V8, Mozilla's SpiderMonkey).

Just like the Linux kernel optimizes its code to run more efficiently on real, tangible hardware, it is our responsibility as JS developers to write code that runs efficiently on the JS engines people use in real life.

So yes, an 8% improvement, if confirmed on multiple hardware/JS engine setups, sounds meaningful enough to invest in.

@LinusU (Member) commented Jan 10, 2023

It seems like this is ~27% slower on Safari 16.2 on an M1 MacBook Air. On Chrome 108.0.5359.124 (Official Build) (arm64) on the same computer, it's ~8% slower.

I've created a test case that anyone can run on JSBench:

https://jsbench.me/j6lcpzr2u3/1

@Cadienvan (Author)

That isn't the case on my M1 Max (Chrome 108.0.5359.124, same as yours), where the new version is faster, but of course I'll accept whatever you decide.

@broofa (Member) commented Jan 10, 2023

Thanks @LinusU. I ran your test case on a variety of browsers using BrowserStack. Results below.

Note: JSBench and the uuid benchmark test both use BenchmarkJS.

Edge, Chrome, Yandex, and Opera all use Chromium (and therefore V8, I assume?). Firefox uses SpiderMonkey. Safari uses Nitro.

The new code is consistently slower on all platforms in this benchmark. Which is ... interesting, since uuid's built-in test shows it as being faster. FWIW, I swapped the tests around on JSBench to make sure run order wasn't a factor (it wasn't).

I gotta get to work here, but I'd like to understand why we seem to be getting different results between JSBench and the built-in test before moving forward with this.


Edge: 4.9% slower

Firefox: 5.4% slower

Chrome: 3.8% slower

Safari: 26.8% slower. Tested on my laptop because BrowserStack's Safari refused to load the JSBench site for some reason.

Yandex: didn't load on BrowserStack, and I'm not about to install it on my laptop. Uses Chromium though, so 🤷

Opera: 2% slower; not providing a screenshot as I don't consider it a supported platform. Uses Chromium, too, so 🤷

@Cadienvan (Author)

I guess Node acts differently from the V8 engine itself; it's the only thing I can think of. Now the question is: how impactful is browser usage for such a library?

@LinusU (Member) commented Jan 11, 2023

btw. I don't think that this is correct:

This can be more efficient than using the + operator for concatenation, because it avoids the need to allocate new memory and copy the strings. Instead, it modifies the existing string in-place.

I'm personally surprised that the new code is faster (if it is), since I'd have assumed it'd be easier for the compiler/runtime to optimize the older code. Not that it shouldn't be able to optimize both versions into the same machine instructions...

That it isn't a question of modifying in place or not can be seen in this simple example:

> a = "hello"
'hello'
> b = a
'hello'
> b += ', world!'
'hello, world!'
> a
'hello'
> b
'hello, world!'

Strings are treated as "primitives" in JavaScript, which means that the runtime will have to use a "copy on write" scheme and keep track of which strings can be modified in place, and which must first be copied to a new location in the memory.

That being said, I believe that the only way to know is to measure so it doesn't really matter 😄


When seeing the initial post I was already a bit skeptical about the results/our benchmarks, since some numbers that shouldn't have changed did change (e.g. parse), and some things that should have changed didn't (e.g. v3 and v5, which got marginally slower). Maybe this is normal fluctuation, though, but I wanted to measure just the stringifying.

So I brought out my favourite benchmark tool, hyperfine, and pasted the code into two separate files:

v1.js

// Code from "Setup JavaScript"

for (let i = 0; i < 4_000_000; i++) {
  unsafeStringifyV1(arr0)
  unsafeStringifyV1(arr1)
  unsafeStringifyV1(arr2)
  unsafeStringifyV1(arr3)
  unsafeStringifyV1(arr4)
  unsafeStringifyV1(arr5)
  unsafeStringifyV1(arr6)
  unsafeStringifyV1(arr7)
  unsafeStringifyV1(arr8)
  unsafeStringifyV1(arr9)
}

v2.js

// Code from "Setup JavaScript"

for (let i = 0; i < 4_000_000; i++) {
  unsafeStringifyV2(arr0)
  unsafeStringifyV2(arr1)
  unsafeStringifyV2(arr2)
  unsafeStringifyV2(arr3)
  unsafeStringifyV2(arr4)
  unsafeStringifyV2(arr5)
  unsafeStringifyV2(arr6)
  unsafeStringifyV2(arr7)
  unsafeStringifyV2(arr8)
  unsafeStringifyV2(arr9)
}

Then I ran this with Node.js v18.12.1 on my arm64 M1 MacBook Air:

$ hyperfine --warmup 1 'node v1.js' 'node v2.js'
Benchmark 1: node v1.js
  Time (mean ± σ):      5.424 s ±  0.202 s    [User: 5.264 s, System: 0.055 s]
  Range (min … max):    5.313 s …  5.993 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: node v2.js
  Time (mean ± σ):      5.964 s ±  0.007 s    [User: 5.794 s, System: 0.058 s]
  Range (min … max):    5.953 s …  5.974 s    10 runs
 
Summary
  'node v1.js' ran
    1.10 ± 0.04 times faster than 'node v2.js'

Note that there were statistical outliers even with a warmup. I believe that this is because I'm running this on my laptop on battery 😅

I did run this multiple times (changing 4_000_000 to different values; I started at 100_000 but the difference was too small, and 4_000_000 was the longest I had time to wait for right now) and v1 was always faster.


Now the question is: how impactful is browser usage for such a library?

It would be great to have some numbers, but I think there are many, many people using this in browsers. In fact, since all LTS versions of Node.js now have crypto.randomUUID built in (UUID v4, which I believe is the most popular), maybe even more people use it in browsers than in Node.js.


I also ran the npm run test:benchmark script on my computer, and these are the results:

Name                             Before change                After change
uuid.stringify()                 4,494,323 ops/sec ±0.14%     4,478,745 ops/sec ±0.16%
uuid.parse()                     4,021,155 ops/sec ±0.05%     3,997,423 ops/sec ±0.15%
uuid.v1()                        4,612,946 ops/sec ±0.18%     4,499,286 ops/sec ±0.16%
uuid.v1() fill existing array   10,069,463 ops/sec ±0.13%    10,048,499 ops/sec ±0.16%
uuid.v4()                       22,898,259 ops/sec ±1.66%    22,940,018 ops/sec ±1.73%
uuid.v4() fill existing array    8,156,566 ops/sec ±0.23%     7,539,778 ops/sec ±0.19%
uuid.v3()                          560,691 ops/sec ±0.28%       643,689 ops/sec ±0.44%
uuid.v5()                          665,117 ops/sec ±0.43%       671,356 ops/sec ±0.42%

Stringify is marginally slower, but then "uuid.v4() fill existing array" is a lot slower so that's a bit strange 🤔

And then v3 is a lot faster, and v5 is marginally faster, so that's very interesting 🤔

@Cadienvan (Author)

This is starting to become a scientific paper, ahah, I like it! It seems like even specific CPUs or combinations of specs influence this benchmark, as running those again on my PC still seems to show the changed version is faster.

@LinusU (Member) commented Jan 11, 2023

This is starting to become a scientific paper, ahah, I like it! It seems like even specific CPUs or combinations of specs influence this benchmark, as running those again on my PC still seems to show the changed version is faster.

Haha, yeah, it's amazing!

Would you be able to run the hyperfine test that I specified above? I'm really interested in seeing what it yields on your computer!


I also found another possible code change whilst looking at this, and some very interesting behaviour regarding toLowerCase and Regex#test on V8! 👉 #677

@broofa (Member) commented Jan 11, 2023

Thanks @LinusU. As always, your insight is very helpful.

Note: I think you mixed up the output in your "simple example", above:

> a
'hello, world!' // <-- should be 'hello'
> b
'hello' // <-- should be 'hello, world!'

@LinusU (Member) commented Jan 11, 2023

Note: I think you mixed up the output in your "simple example", above:

Argh, yeah, I did; thanks for pointing it out. I originally wrote it wrong in my terminal, and then figured I'd just fix it by hand, but I should have re-run it! 😅

I've updated the comment!

@Cadienvan (Author)

Would you be able to run the hyperfine test that I specified above? I'm really interested in seeing what it yields on your computer!

@LinusU I'd love to, are you able to provide the exact gist you tested so we can do some tests?

@jeremyfiel

@mcollina this would be a great stream topic!

@broofa (Member) commented Oct 11, 2023

@Cadienvan, my apologies for letting this stagnate. Having just had to revert a prior change to this code due to the impact on memory allocation, I'm going to say that the code as it stands is "good enough" and try to avoid further bit-swizzling on it.

@broofa broofa closed this Oct 11, 2023
@Cadienvan Cadienvan deleted the perf/stringify-improved branch October 12, 2023 09:09