Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(NODE-5363): defer byte slicing to utf8 decoding API in nodejs #585

Merged
merged 15 commits into from
Jul 3, 2023

Conversation

W-A-James
Copy link
Contributor

@W-A-James W-A-James commented Jun 22, 2023

Description

What is changing?

ByteUtils
  • Updated signature of ByteUtils.toUTF8
    • toUTF8: (buffer: Uint8Array) => string -> toUTF8: (buffer: Uint8Array, start: number, end: number) => string
  • Updated node implementation of ByteUtils.toUTF8 to take start and end parameters and pass them into Buffer.toString directly
  • Updated node implementation of ByteUtils.toUTF8 to take start and end parameters and pass them into Uint8Array.slice
deserializer
  • Removed unnecessary call to Buffer.subarray which was one source of performance degradation
Benchmarks
  • Added int32 and Long deserialization benchmarks
Performance

This PR is meant to address a performance regression that was introduced in v4.6.0 during the introduction of our ByteUtils functions.

The benchmark used to assess performance is the deserialize a large batch of documents each with an array of many strings benchmark in etc/benchmarks/main.js. The benchmark was run 100 times on v4.6, v5.3(latest as at time of writing) and with the changes made in this PR.

The chart below shows the percent increase in runtime (lower is better) over v4.6. It clearly shows the difference in performance between v5.3 and v4.6 with the local column displaying marked improvement over v5.3. Note that there are still performance gains to be made before we are once again on par with v4.6.

NODE-5358-performance

Is there new documentation needed for these changes?

No

What is the motivation for this change?

Increasing bson and driver performance

Release Highlight

Improved BSON UTF8 Decoding Performance

In the v5 major release of BSON we internally abstracted the different byte manipulation APIs used based on whether the library is running in Node.js or in a browser. This abstraction required us to create a subarray before invoking the environment's UTF8 decoding API. Creating the subarray before invoking Node.js' Buffer.prototype.toString API turns out to cause an unnecessary slow down (See #585 for our research).

Double check the following

  • Ran npm run lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

Base automatically changed from improve-benchmark-tooling to main June 23, 2023 10:53
@W-A-James W-A-James marked this pull request as ready for review June 23, 2023 20:19
@billouboq
Copy link
Contributor

Hi @W-A-James,

Is there any way we can help you on those performance changes ?
Like is there any other things to look for ?

@durran durran self-assigned this Jun 27, 2023
@durran durran added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Jun 27, 2023
src/utils/node_byte_utils.ts Outdated Show resolved Hide resolved
@W-A-James
Copy link
Contributor Author

Is there any way we can help you on those performance changes ?
Like is there any other things to look for ?

Hi @billouboq, thanks for your interest in this work! The work in this PR specifically is focused on string deserialization, but we are interested in investigating more about bson performance in general. We would be interested in investigating:

  1. string serialization performance
  2. ObjectId serialization and deserialization performance
  3. Int32 serialization and deserialization performance
  4. Double serialization and deserialization performance
  5. Long serialization and deserialization performance
  6. Nested document serialization and deserialization performance

Any investigation into any one of the areas listed above would be very useful. If you identify any new areas that could be improved and have suggestions on how to do so, filing a ticket on the NODE project with your findings would be the best way to inform us about your findings and allow us to take action on them.

As for the benchmarks we currently have, they live in etc/benchmarks/main.mjs. We currently have benchmarks for

  • string deserialization (see the 'deserialized a large batch of documents each with an array of many
    strings' benchmark)
  • Double deserialization (see the 'Many Doubles deserialization' benchmark)

This pr adds similar benchmarks for Int32 and Long deserialization.
The workflow (which itself could use improvement) for running these benchmarks is as follows:

  1. Set the skip flag to false on the benchmarks you are interested in running
  2. Set the iteration count variable in etc/benchmarks/main.mjs to an appropriate value (we default
    to 100)
  3. Check etc/benchmarks/bson_versions.json to ensure that the bson versions you are interested in
    comparing are present. If only interested in testing a local copy, then you can remove the
    entries from the array.
  4. Run ./etc/benchmarks/install_bson_versions.sh
  5. Run node etc/benchmarks/main.mjs | tee <results file> to put the statistics collected into a file. Note that this will also emit cpu profiles for each of the version-test pairs specified
  6. Use etc/benchmarks/convert_to_csv.sh to convert the results to a csv

src/utils/byte_utils.ts Outdated Show resolved Hide resolved
src/binary.ts Outdated Show resolved Hide resolved
@W-A-James W-A-James requested a review from durran June 30, 2023 19:42
@W-A-James W-A-James changed the title fix(NODE-5358): implement bson perf improvements fix(NODE-5363): implement bson perf improvements Jun 30, 2023
@durran durran requested a review from nbbeeken July 3, 2023 11:22
@durran durran removed the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Jul 3, 2023
@durran durran added the Team Review Needs review from team label Jul 3, 2023
@nbbeeken nbbeeken added the Blocked Blocked on other work label Jul 3, 2023
@nbbeeken nbbeeken removed the Blocked Blocked on other work label Jul 3, 2023
@nbbeeken nbbeeken changed the title fix(NODE-5363): implement bson perf improvements fix(NODE-5363): defer byte slicing to utf8 decoding API in nodejs Jul 3, 2023
@durran durran merged commit e087042 into main Jul 3, 2023
@durran durran deleted the NODE-5358/implement-bson-perf-improvements branch July 3, 2023 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team Review Needs review from team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants