
Drop base64-js dependency, and optimise Buffer.write #349

Open
chjj wants to merge 6 commits into master from optimize-string-decoding
Conversation

@chjj (Contributor) commented Feb 4, 2024

Most string write functions (with the exception of hexWrite) were allocating arrays before copying them onto the buffer. This resulted in significant performance overhead.

This PR modifies said functions to write directly to the buffer. In addition, it drops the dependency on base64-js and adds support for the base64url encoding.
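
As a rough illustration of the direct-write approach (a sketch only, not the PR's exact code; the function name is made up), the simplest case looks like this:

// Sketch: write the string straight into the target Uint8Array instead of
// building an intermediate byte array and copying it afterwards.
function asciiWriteDirect (buf, string, offset, length) {
  const max = Math.min(length, buf.length - offset, string.length)
  for (let i = 0; i < max; i++) {
    buf[offset + i] = string.charCodeAt(i) & 0xff // one pass, no temp array
  }
  return max // bytes written
}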

The numbers are as follows:

$ node perf/write-base64.js
OldBuffer.byteLength(8192, "base64") x 24,009 ops/sec ±0.55% (93 runs sampled)
NewBuffer.byteLength(8192, "base64") x 9,548,372 ops/sec ±0.18% (95 runs sampled)

OldBuffer#write(8192, "base64") x 19,789 ops/sec ±0.49% (95 runs sampled)
NewBuffer#write(8192, "base64") x 22,646 ops/sec ±0.04% (95 runs sampled)

OldBuffer#toString(8192, "base64") x 6,686 ops/sec ±0.39% (87 runs sampled)
NewBuffer#toString(8192, "base64") x 10,735 ops/sec ±0.35% (90 runs sampled)

$ node perf/write-binary.js
OldBuffer#write(8192, "binary") x 13,662 ops/sec ±1.37% (84 runs sampled)
NewBuffer#write(8192, "binary") x 46,397 ops/sec ±0.64% (91 runs sampled)

$ node perf/write-ucs2.js
OldBuffer#write(8192, "ucs2") x 3,537 ops/sec ±0.70% (86 runs sampled)
NewBuffer#write(8192, "ucs2") x 40,422 ops/sec ±1.14% (93 runs sampled)

$ node perf/write-utf8.js
OldBuffer.byteLength(2898, "utf8") x 15,304 ops/sec ±0.87% (85 runs sampled)
NewBuffer.byteLength(2898, "utf8") x 58,197 ops/sec ±0.05% (86 runs sampled)

OldBuffer#write(2898, "utf8") x 13,565 ops/sec ±0.82% (85 runs sampled)
NewBuffer#write(2898, "utf8") x 32,716 ops/sec ±0.25% (93 runs sampled)

There are some massive speedups here, particularly with asciiWrite, ucs2Write, and utf8Write.

I was disappointed to see that base64 write only gets a marginal speedup; however, there are significant speedups for base64Slice and byteLength('base64'). Likewise, byteLength('utf8') also received a significant speedup.

@chjj (Contributor, PR author) commented Feb 4, 2024

Added a new commit to further optimize base64Write. The logic is basically how the node.js C++ code works internally: it requires no string preprocessing (str.replace, str.split, str.indexOf, etc.), which gives it an extra perf boost.
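
Roughly, the idea is a table-driven scan that emits bytes as it goes (a sketch under my own naming, not the actual diff):

// Sketch only: decode base64/base64url in one pass, writing into `buf`.
const TABLE = new Int8Array(128).fill(-1)
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
for (let i = 0; i < ALPHABET.length; i++) TABLE[ALPHABET.charCodeAt(i)] = i
TABLE[0x2b] = 62; TABLE[0x2f] = 63 // '+' and '/'
TABLE[0x2d] = 62; TABLE[0x5f] = 63 // '-' and '_' (base64url)

function base64WriteDirect (buf, str, offset) {
  let acc = 0
  let bits = 0
  let j = offset
  for (let i = 0; i < str.length && j < buf.length; i++) {
    const c = str.charCodeAt(i)
    const v = c < 128 ? TABLE[c] : -1
    if (v < 0) break // '=' padding or an invalid character ends the parse
    acc = (acc << 6) | v
    bits += 6
    if (bits >= 8) {
      bits -= 8
      buf[j++] = (acc >>> bits) & 0xff
    }
  }
  return j - offset // number of bytes written
}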

New Numbers:

OldBuffer#write(8192, "base64") x 19,789 ops/sec ±0.49% (95 runs sampled)
NewBuffer#write(8192, "base64") x 24,542 ops/sec ±0.02% (97 runs sampled)

@dcousens (Collaborator) commented Feb 4, 2024

@chjj can you break this into two pull requests?
The addition of all the new benchmarks (easy merge), and then an isolated pull request for the actual behavioural changes?

@chjj force-pushed the optimize-string-decoding branch from b6021f4 to 42bc6ba on February 5, 2024 03:06
@chjj (Contributor, PR author) commented Feb 5, 2024

Done. Force pushed so this is now the isolated PR.

edit: I can rebase on master and force push again once you merge #352 if need be.

@dcousens changed the title from "Optimize string decoding (among other things)" to "Drop base64-js dependency, and optimise Buffer.write" on Feb 5, 2024
@dcousens self-requested a review on February 5, 2024 04:22
@chjj force-pushed the optimize-string-decoding branch from 42bc6ba to 5be2b22 on February 5, 2024 12:39
@chjj (Contributor, PR author) commented Feb 5, 2024

New base64 implementation using atob/btoa pushed.

@chjj force-pushed the optimize-string-decoding branch from 5be2b22 to 61582b4 on February 5, 2024 12:53
Comment on lines +75 to +87
if (typeof global !== 'undefined' && global && global.Math === Math &&
typeof global.Buffer === 'function' && global.Buffer.prototype &&
typeof global.Buffer.prototype.asciiSlice === 'function') {
const NodeBuffer = global.Buffer

_atob = function atob (str) {
return NodeBuffer.from(str, 'base64').toString('binary')
}

_btoa = function btoa (str) {
return NodeBuffer.from(str, 'binary').toString('base64')
}
}
@chjj (Contributor, PR author) commented on the diff, Feb 5, 2024
This shim leads to some untested code paths when testing in node.js (e.g. atob won't throw on a base64url string like it should).

Unfortunately, tape has no way to set a global or otherwise signal to this piece of code that we are running tests and that it should use the atob/btoa functions provided by node.js instead of shimming them.

I see a few solutions:

  1. Create a tape-wrapper.js which does something like global.BUFFER_TEST = true; require('tape/bin/tape'); and add && !global.BUFFER_TEST to the already-gnarly if statement. Run tests with ./tape-wrapper.js instead of tape (a sketch follows this list).
  2. Rewrite the tests in mocha, which allows you to assign globals on boot.
  3. Only shim atob/btoa if typeof atob === 'undefined'. This fixes the tests and keeps things working on pre-16.0.0 nodes, but creates a weird situation where older versions of node are actually faster than newer ones.
  4. Remove this entire thing and always use atob/btoa. Tests will be fixed, older nodes will break, newer nodes will be slow, and benchmarks run inside node.js will be inaccurate.
  5. Beseech the node.js devs to fix atob/btoa, which would eventually do away with all the problems I just listed.
  6. Switch back to the custom base64 implementation, which is almost as fast as atob in browsers and replicates the exact node.js behavior with no downgrade in perf on base64url etc.
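
For option 1, a minimal sketch of what that wrapper could look like (hypothetical file; flag name as above):

#!/usr/bin/env node
// tape-wrapper.js (sketch): set a flag before the test runner boots so the
// library can tell it is under test and skip the atob/btoa shim.
global.BUFFER_TEST = true
require('tape/bin/tape')

// ...and the shim's if statement would gain one extra clause:
//   && !global.BUFFER_TEST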

@chjj (Contributor, PR author) commented on the diff, Feb 5, 2024
It just occurred to me that we could do something hacky like adding:

&& global.process.argv[1].slice(-4) !== 'tape'

to that if statement, to exclude the test run from the shims.

@chjj (Contributor, PR author) commented Feb 5, 2024

After thinking about it, all of this atob/btoa inconsistency is really bugging me. I'm considering switching back to the custom base64 implementation. Although harder to review, it provides consistent behavior that we know works in every environment, with predictable performance.

@chjj (Contributor, PR author) commented Feb 7, 2024

@dcousens, I think I will switch this PR back to my custom base64 implementation.

The more I think about it, the more I don't like atob/btoa for the following reasons:

  1. It's hard to benchmark due to the node.js issue. The numbers may vary significantly in different browsers as well.
  2. It screws up the tests if we want to shim in atob/btoa for node.js.
  3. The node.js shim also causes issues because node.js has a history of breaking its base64 parser.
  4. With regard to point 3, the same could be true of atob in any javascript environment: we don't know whether that environment has some kind of bug or inconsistency in its base64 parser, and we don't know the performance properties of its base64 decoder.
  5. With atob, buf.write('base64') is basically guaranteed to be slower on a lot of base64url strings or a string with invalid characters because it has to run base64clean and retry parsing.
  6. Using btoa is just straight up slower than our custom implementation in any environment (probably).
  7. There are environments you could bundle for that do not have atob/btoa. An example of this is quickjs. If I want to write some javascript code and compile it to a binary that runs everywhere, quickjs combined with cosmopolitan is the answer. By using atob/btoa, we break Buffer in that environment.
  8. It goes against the whole spirit of this PR which is to write bytes directly to the buffer during decoding.

Number 7 is one I recently thought of. I really really like the idea of compiling bundles for quickjs (I'm thinking of adding a quickjs target to my bundler to convert the quickjs api to the node.js api, but that's a different story).

The benefits of a custom implementation are as follows:

  1. Predictable performance.
  2. Predictable behavior.
  3. Faster toString.
  4. Supports any javascript environment, including things like quickjs and the d8 shell.
  5. Less fussing over all of the stuff above.

The downsides:

  1. Slower buf.write(str, 'base64') in environments like chrome (but again, chrome would only be significantly faster if the string is well-formed enough such that atob can parse it).
  2. Larger code size, harder to review.

@dcousens (Collaborator) commented Feb 7, 2024

I understand your plight for each of the points, but I don't know if this library should be a reference for this kind of functionality.
This library should be using whatever underlying native functions exist, as that is what users expect.

The primary reason this library exists is to support cross-compatibility of the native APIs, not to be an improved implementation of their shared functionality.

With atob, buf.write('base64') is basically guaranteed to be slower on a lot of base64url strings or a string with invalid characters because it has to run base64clean and retry parsing.

I don't understand this; we should know that the input string is base64url, no?

This pull request is difficult to quickly review in the limited time I have, when you have touched and moved the utf8Write and hexWrite and ucs2Write and... can we reduce that diff?

@chjj (Contributor, PR author) commented Feb 7, 2024

This library should be using whatever underlying native functions exist, as that is what users expect.

Fair enough.

I don't understand this; we should know that the input string is base64url, no?

We can use that as a hint, but Buffer.from(str, 'base64') in node has accepted the base64url character set for a long time. It's not guaranteed that a user will pass base64 rather than base64url.
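
For example (illustrative only; exact behavior may vary by node.js version):

// Recent node.js accepts base64url characters under the 'base64' encoding,
// so the encoding name alone doesn't say which alphabet the caller used.
const a = Buffer.from('_-8=', 'base64') // base64url characters
const b = Buffer.from('/+8=', 'base64') // classic base64 characters
console.log(a.equals(b)) // true: both decode to the same two bytes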

This pull request is difficult to quickly review in the limited time I have, when you have touched and moved the utf8Write and hexWrite and ucs2Write and... can we reduce that diff?

Yeah, I know I've been bugging you a lot lately. Sorry about that. Thank you for your attentiveness this past week.

I find it's best to look at the diff of each individual commit, but I see your point. I'll try to get it looking better.
