
src: improve base64 encoding performance #39701

Closed

Conversation

@mscdex (Contributor) commented Aug 8, 2021

Benchmark results on a Core i7-3770K:

                                                   confidence improvement accuracy (*)   (**)  (***)
buffers/buffer-base64-encode.js n=32 len=67108864        ***     15.94 %       ±0.11% ±0.14% ±0.18%

I was tempted to run the benchmark on ARM out of curiosity, but at the moment I don't have a machine with a new enough build environment, nor one whose OS I can easily swap, and trying to cross-compile node.js with a Linaro aarch64 toolchain is impossible.

@mscdex mscdex added the performance Issues and PRs related to the performance of Node.js. label Aug 8, 2021
@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Aug 8, 2021
@mscdex mscdex removed the needs-ci PRs that need a full CI run. label Aug 8, 2021

@benjamingr (Member) left a comment:

The improvement (sure you tested with -O3 in both cases right?) is somewhat surprising but sure.

@mscdex (Contributor, Author) commented Aug 8, 2021

> sure you tested with -O3 in both cases right?

-O3 is the default in node builds, which I did not touch.

> is somewhat surprising

I'm not as surprised, considering this is reading more bytes per loop iteration (and in one go) than before, which is not something I'm sure a compiler would necessarily know to do.
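The idea described here can be sketched with a toy example (hypothetical helper names, not the PR's actual code): process the input in 8-byte chunks with one wide load per iteration instead of one load per byte. `std::memcpy` is used for the load because it is well-defined regardless of alignment:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte-at-a-time baseline: one load per input byte.
uint64_t sum_bytes_scalar(const unsigned char* src, size_t n) {
  uint64_t total = 0;
  for (size_t i = 0; i < n; ++i) total += src[i];
  return total;
}

// Chunked variant: one 64-bit load covers 8 input bytes per iteration.
uint64_t sum_bytes_chunked(const unsigned char* src, size_t n) {
  uint64_t total = 0;
  size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    uint64_t dword;
    std::memcpy(&dword, src + i, sizeof(dword));  // well-defined even if unaligned
    for (int b = 0; b < 8; ++b) total += (dword >> (8 * b)) & 0xFF;
  }
  for (; i < n; ++i) total += src[i];  // handle the remaining tail bytes
  return total;
}
```

This is only a sum rather than base64 encoding, but the structural point is the same: the wide-load loop touches memory once per 8 bytes, which a compiler will not always derive from the byte-at-a-time form on its own.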

@benjamingr (Member) commented:

Yeah, I of course believe you and assumed it was run with -O3 (otherwise I wouldn't have LGTM'd). I'm just surprised about:

> which is not something I'm sure a compiler would necessarily know to do.

It sounds like a pretty safe and straightforward form of loop unrolling, which is a fairly simple/standard optimisation.


```cpp
// Read in chunks of 8 bytes for as long as possible
while (i < n64) {
  const uint64_t dword = *reinterpret_cast<const uint64_t*>(src + i);
```
A reviewer (Member) commented:

Some architectures don't support unaligned 64-bit access, so this could end up being somewhat slower on those. But still an optimization on most systems!
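One common way to sidestep this concern (an illustration, not what the PR does) is to route the wide load through `std::memcpy` instead of a `reinterpret_cast` dereference: on architectures with unaligned access, compilers typically fold the `memcpy` into a single load, and elsewhere it degrades to safe byte loads.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative helper: a portable 64-bit load that is well-defined even for
// unaligned pointers. A reinterpret_cast-based dereference like the one in
// the diff is undefined behavior when the pointer is not suitably aligned;
// memcpy avoids that while usually compiling to the same single instruction.
inline uint64_t load_u64(const unsigned char* p) {
  uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}
```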

@mscdex (Contributor, Author) replied:

That's why I was curious to check on ARM. However, from what I'm seeing, ARMv7 and newer support unaligned reads, with ARMv7 supposedly not supporting them for only 2 instructions?

A reviewer (Member) commented:

Have you tried to start by reading 1/4 bytes until 8-byte alignment is reached?
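The suggested prologue could look roughly like this (hypothetical helper name, a sketch only): compute how many single-byte steps are needed before the source pointer is 8-byte aligned, process those narrowly, then switch to the wide loop so that every 64-bit load is aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the suggested alignment prologue: returns how many single-byte
// steps are needed before src is 8-byte aligned, so the main loop can use
// aligned 64-bit loads even on architectures without unaligned access.
size_t bytes_until_aligned(const unsigned char* src) {
  uintptr_t addr = reinterpret_cast<uintptr_t>(src);
  return (8 - (addr & 7)) & 7;  // 0 if already aligned
}
```

For base64 specifically the input is consumed 3 bytes at a time, so a real prologue would also have to keep the 3-byte grouping intact, which is part of why this is only a sketch.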

@mscdex (Contributor, Author) commented Aug 11, 2021

Alright, after more wrangling than I wanted, I managed to get some ARMv7 binaries compiled and ran the benchmarks on a Pi 2B before and after the changes in this PR:

                                                  confidence improvement accuracy (*)   (**)  (***)
buffers/buffer-base64-encode.js n=32 len=67108864        ***      2.20 %       ±0.39% ±0.52% ±0.68%

I don't have an armv8 board available to test on.

@Mesteery (Contributor) commented:

I can run the benchmark on a Pi 4B. Should I compare with master or 16.6.1?

@Mesteery (Contributor) commented:

Compared with 16.6.1:

                                                  confidence improvement accuracy (*)   (**)  (***)
buffers/buffer-base64-encode.js n=32 len=67108864        ***      5.12 %       ±0.22% ±0.30% ±0.39%

@fhinkel (Member) commented Aug 17, 2021

Can you link the issue where this is too slow in the real world? Not sure 15% perf improvement on a synthetic benchmark warrants this change.

@mscdex (Contributor, Author) commented Aug 17, 2021

> Not sure 15% perf improvement on a synthetic benchmark warrants this change.

"Synthetic" benchmarks are all we have at the moment, after the benchmarking WG was dechartered/decommissioned. IMO base64 encoding/decoding is not something that I'd exactly call an uncommon task, especially within the realm of web applications.

In general, any sensible changes that provide a measurable improvement should be welcomed, especially considering the number of performance hits we continue to take on over time as features get added and as V8 evolves. The improvements all add up.

Anyway, as far as base64 encoding goes, I also have an alternative PR here that you may be more interested in. 🤷‍♂️

@mscdex (Contributor, Author) commented Jul 10, 2022

Closing in favor of #39775

@mscdex mscdex closed this Jul 10, 2022
@mscdex mscdex deleted the base64-perf branch July 14, 2022 15:47