src,lib: reducing C++ calls of esm legacy main resolve #48325

Merged
merged 2 commits into from
Jul 3, 2023

Member

@H4ad H4ad commented Jun 3, 2023

This PR is related to nodejs/performance#73

Legacy Main Resolve

@anonrig suggested that we could benefit from implementing legacyMainResolve on the C++ side, so here are the results so far:

                                                                                                                                                                confidence improvement accuracy (*)    (**)   (***)
esm/esm-legacyMainResolve.js resolvedFile='node_modules/non-exist' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000              ***     34.86 %       ±1.41%  ±1.88%  ±2.45%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/non-exist' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                        ***     34.92 %       ±1.52%  ±2.02%  ±2.64%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.js' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000          ***     34.53 %       ±1.22%  ±1.62%  ±2.12%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.js' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                    ***     35.39 %       ±1.57%  ±2.09%  ±2.74%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.json' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000        ***     34.78 %       ±1.30%  ±1.74%  ±2.27%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.json' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                  ***     35.39 %       ±1.15%  ±1.53%  ±1.99%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.node' packageConfigMain='./index.js' packageJsonUrl='node_modules/test/package.json' n=10000        ***     34.75 %       ±1.43%  ±1.91%  ±2.48%
esm/esm-legacyMainResolve.js resolvedFile='node_modules/test/index.node' packageConfigMain='' packageJsonUrl='node_modules/test/package.json' n=10000                  ***     35.84 %       ±1.33%  ±1.77%  ±2.31%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 8 comparisons, you can thus
expect the following amount of false-positive results:
  0.40 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.08 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)
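
The expected counts in this note appear to be the number of comparisons multiplied by the accepted risk. As a worked check for the 8 comparisons above, assuming none of them has a real difference:

```latex
\mathbb{E}[\text{false positives}] = m \cdot \alpha, \qquad m = 8:
\quad 8 \times 0.05 = 0.40, \quad 8 \times 0.01 = 0.08, \quad 8 \times 0.001 = 0.008 \approx 0.01
```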

To be able to run the benchmark, first build the old node using this branch.

About the benchmark: I needed to create a separate benchmark just for this function to be able to see the improvements, since the results of esm-defaultResolver.js were not reflecting the real impact of this change.

Potential optimizations

legacyMainResolve also calls fileURLToPath, which had no C++ counterpart, so I needed to rewrite that function in C++ as well.

I haven't benchmarked the JS version yet, but I think we can add some benchmarks and see whether the C++ version can replace it. I've tried to keep the code easy to expose to the JS side in the future.

Tasks

  • Rewrite legacyMainResolve
  • Rewrite fileURLToPath
  • Add benchmark for legacyMainResolve
  • Add tests for legacyMainResolve
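
A minimal sketch of the idea behind the first task, assuming the resolver probes a fixed, ordered list of candidate paths: a single native function performs every probe and returns only the index of the first hit, so the caller crosses the JS/C++ boundary once. The candidate list, `ResolveLegacyMainIndex`, and the injectable `is_file` probe are illustrative assumptions, not the code in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

// Candidates probed in order. Assumption: this list is illustrative only.
// Entries 0..6 are appended to the package.json "main" value; entries 7..9
// are fallbacks relative to the package directory.
constexpr std::array<std::string_view, 10> kCandidates = {
    "",           ".js",          ".json",       ".node",
    "/index.js",  "/index.json",  "/index.node",
    "./index.js", "./index.json", "./index.node"};
constexpr std::size_t kMainDerivedCount = 7;

using IsFileFn = std::function<bool(const std::string&)>;

// Default probe: one stat per candidate; any error counts as "not a file".
inline bool IsRegularFile(const std::string& path) {
  std::error_code ec;
  return std::filesystem::is_regular_file(path, ec);
}

// Returns the index into kCandidates of the first candidate that is a file,
// or std::nullopt when none matches. The caller can rebuild the selected
// path from that single number.
std::optional<int> ResolveLegacyMainIndex(
    const std::string& package_dir,
    const std::string& main_field,
    const IsFileFn& is_file = IsRegularFile) {
  for (std::size_t i = 0; i < kCandidates.size(); ++i) {
    const bool uses_main = i < kMainDerivedCount;
    // Without a "main" value only the fallback entries apply.
    if (uses_main && main_field.empty()) continue;
    std::string candidate = package_dir + "/";
    if (uses_main) candidate += main_field;
    candidate.append(kCandidates[i]);
    if (is_file(candidate)) return static_cast<int>(i);
  }
  return std::nullopt;
}
```

Passing the probe as a parameter is only a convenience for testing the ordering without touching the disk; a real binding would stat the file system directly.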

Acknowledgments

Huge thanks to @anonrig for the support and early review of this code and @RafaelGSS for the hints about the benchmark and permission model.

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/loaders
  • @nodejs/modules

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. esm Issues and PRs related to the ECMAScript Modules implementation. fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. labels Jun 3, 2023
@H4ad H4ad force-pushed the perf/legacy-main-resolve branch from 95e1aa7 to 55dd4d9 Compare June 3, 2023 19:43
Member

@RafaelGSS RafaelGSS left a comment

IMO the new implementation is prone to memory leaks and it's quite complex to read

src/node_file.cc Outdated
THROW_IF_INSUFFICIENT_PERMISSIONS(
env, permission::PermissionScope::kFileSystemRead, file_path);

if (UNSAFE_FileExist(env, file_path)) return args.GetReturnValue().Set(0);
Member

Wouldn't it block the event loop?

@H4ad H4ad force-pushed the perf/legacy-main-resolve branch 2 times, most recently from 232d48c to 072c7d2 Compare June 4, 2023 03:21
src/node_file.cc Outdated
#endif
}

FilePathIsFileReturnType FilePathIsFile(Environment* env,
Member

This function seems to be the same as

static void InternalModuleStat(const FunctionCallbackInfo<Value>& args) {

Maybe we could DRY the implementation of both here? (This is not really blocking, just an observation.)

Member Author

Initially I was thinking of doing that, but I cannot merge the two functions because I don't know how to represent the outcome of THROW_IF_INSUFFICIENT_PERMISSIONS. When the user doesn't have permission, we should stop executing the function, so I need to return not only an int but also an extra state that tells the caller the user lacks permission.

Because of this, I prefer to return an enum from FilePathIsFile rather than overcomplicate both implementations.
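
A small sketch of the trade-off described here, under stated assumptions: a three-state enum lets the helper report "permission denied" separately from "not a file", so the caller can stop immediately. `FileCheckResult`, `CheckFile`, `FirstFileIndex`, and the two predicates are hypothetical names for illustration, not the declarations used in src/node_file.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result type: a plain int or bool cannot carry the third state.
enum class FileCheckResult {
  kIsFile,
  kIsNotFile,
  kPermissionDenied,  // The caller must stop and surface an error.
};

using Predicate = std::function<bool(const std::string&)>;

// Checks the permission first, then whether the path is a file.
FileCheckResult CheckFile(const std::string& path,
                          const Predicate& may_read,
                          const Predicate& is_file) {
  if (!may_read(path)) return FileCheckResult::kPermissionDenied;
  return is_file(path) ? FileCheckResult::kIsFile
                       : FileCheckResult::kIsNotFile;
}

// Returns the index of the first candidate that is a file, -1 if none is,
// or -2 as soon as a permission check fails (no further candidates are tried).
int FirstFileIndex(const std::vector<std::string>& candidates,
                   const Predicate& may_read,
                   const Predicate& is_file) {
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    switch (CheckFile(candidates[i], may_read, is_file)) {
      case FileCheckResult::kIsFile:
        return static_cast<int>(i);
      case FileCheckResult::kPermissionDenied:
        return -2;
      case FileCheckResult::kIsNotFile:
        break;  // Keep probing the remaining candidates.
    }
  }
  return -1;
}
```

The early return on `kPermissionDenied` mirrors the requirement that execution stops once a permission check fails.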

Member

@RafaelGSS might help

@aduh95
Contributor

aduh95 commented Jun 4, 2023

legacyMainResolve also calls fileURLToPath, which had no C++ counterpart, so I needed to rewrite that function in C++ as well.

Maybe we should first test/benchmark and land the fileURLToPath rewrite, which would simplify this PR. wdyt?

@H4ad H4ad force-pushed the perf/legacy-main-resolve branch from 072c7d2 to dbacd5f Compare June 4, 2023 15:33
@H4ad
Member Author

H4ad commented Jun 4, 2023

@aduh95 Here's the benchmark for the rewrite of fileURLToPath:

IMPORTANT: These numbers are not related to this PR; they apply only to the branch linked in this comment.

                                                                                         confidence improvement accuracy (*)    (**)   (***)
url/file-url-to-path.js n=100000 url='(url) file:///home/user/test/index.js'                    ***    -95.88 %       ±3.25%  ±4.44%  ±6.03%
url/file-url-to-path.js n=100000 url='(url) file:///home/user/test%20index.js'                  ***    -95.19 %       ±2.98%  ±4.07%  ±5.52%
url/file-url-to-path.js n=100000 url='(url) file:///home/user/test%2Findex.js'                  ***      8.66 %       ±2.71%  ±3.67%  ±4.90%
url/file-url-to-path.js n=100000 url='(url) file://google.com/home/user/test%2Findex.js'                 0.68 %       ±2.75%  ±3.72%  ±4.99%
url/file-url-to-path.js n=100000 url='(url) http://google.com/home/test.js'                     ***    -11.02 %       ±2.27%  ±3.05%  ±4.03%
url/file-url-to-path.js n=100000 url='file:///home/user/test/index.js'                          ***     97.59 %       ±7.74% ±10.45% ±13.92%
url/file-url-to-path.js n=100000 url='file:///home/user/test%20index.js'                        ***     86.08 %       ±7.51% ±10.15% ±13.52%
url/file-url-to-path.js n=100000 url='file:///home/user/test%2Findex.js'                        ***    195.47 %       ±3.95%  ±5.35%  ±7.14%
url/file-url-to-path.js n=100000 url='file://google.com/home/user/test%2Findex.js'              ***    166.80 %       ±3.19%  ±4.29%  ±5.66%
url/file-url-to-path.js n=100000 url='http://google.com/home/test.js'                           ***    145.18 %       ±4.26%  ±5.76%  ±7.68%
url/file-url-to-path.js n=100000 url='not-even-a-url'                                           ***    168.09 %       ±7.25%  ±9.87% ±13.32%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 11 comparisons, you can thus
expect the following amount of false-positive results:
  0.55 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.11 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

Branch: main...h4ad-forks:node:perf/file-url-to-path

In summary, if we want to expose the C++ function, we should require callers to always pass a string instead of a URL.
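
For readers unfamiliar with the string-only path, here is a simplified, POSIX-only sketch of converting a `file:` URL string to a path. It is an assumption-laden illustration, not the implementation from the linked branch: `FileUrlStringToPath` and `HexValue` are hypothetical names, and the sketch rejects any non-empty host (including `localhost`) and does not handle query strings or fragments.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Value of a hexadecimal digit, or -1 if `c` is not one.
int HexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Converts a "file:" URL string to a POSIX path.
// Returns std::nullopt for every input this sketch does not accept.
std::optional<std::string> FileUrlStringToPath(std::string_view url) {
  constexpr std::string_view kPrefix = "file://";
  if (url.substr(0, kPrefix.size()) != kPrefix) return std::nullopt;
  url.remove_prefix(kPrefix.size());
  // Anything before the first '/' would be a host, which is rejected here.
  if (url.empty() || url.front() != '/') return std::nullopt;

  std::string path;
  path.reserve(url.size());
  for (std::size_t i = 0; i < url.size(); ++i) {
    if (url[i] != '%') {
      path.push_back(url[i]);
      continue;
    }
    // A percent escape needs exactly two hex digits after the '%'.
    if (i + 2 >= url.size()) return std::nullopt;
    const int hi = HexValue(url[i + 1]);
    const int lo = HexValue(url[i + 2]);
    if (hi < 0 || lo < 0) return std::nullopt;
    const char decoded = static_cast<char>(hi * 16 + lo);
    // An encoded '/' would silently change the path structure, so reject it.
    if (decoded == '/') return std::nullopt;
    path.push_back(decoded);
    i += 2;
  }
  return path;
}
```

Working on the raw string avoids constructing a URL object per call, which is the cost the `(url)` rows of the benchmark above appear to be paying.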

@aduh95
Contributor

aduh95 commented Jun 4, 2023

@H4ad I meant to have it as a separate PR.

@H4ad
Member Author

H4ad commented Jun 4, 2023

@aduh95 Yeah, my idea was to show you the performance numbers so you can decide whether you really want to do that.

To be honest, in its current form, without refactoring the whole ESM implementation to avoid new URL and manipulate the data directly as strings, we will not see a performance improvement; it will be the complete opposite.

I think it is better to go the other way: merge this one and then evaluate the benefits of exposing fileURLToPath. As I said, that will probably lead to refactoring not only the function itself but also its call sites so they take the fastest path, which is manipulating the string directly.

@GeoffreyBooth
Member

url/file-url-to-path.js n=100000 url='(url) file:///home/user/test/index.js' *** -95.88 % ±3.25% ±4.44% ±6.03%

This seems to show that in the most common case of a file:// URL, this is 96% slower?

@H4ad
Member Author

H4ad commented Jun 6, 2023

@GeoffreyBooth Exactly. Given the way fileURLToPath is used today, it simply isn't worth exposing the C++ version.

@GeoffreyBooth
Member

@GeoffreyBooth Exactly. Given the way fileURLToPath is used today, it simply isn't worth exposing the C++ version.

Okay, so then should we close this PR? Is there something else of value being proposed here?

@H4ad
Member Author

H4ad commented Jun 6, 2023

This PR is not about fileURLToPath; it is about LegacyMainResolve.

The improvement for LegacyMainResolve is about 60~70%.

The rewrite of fileURLToPath was only done because LegacyMainResolve needed it. Aduh suggested that maybe we could create a separate PR for fileURLToPath but, as we've seen, it's not worth it. That's it.

Also, the metrics in #48325 (comment) are not related to this PR itself; they are related to the branch I linked in that comment. Those numbers don't reflect this PR, since here we have no overhead from calling a C++ function when the caller is already on the C++ side.

@H4ad H4ad force-pushed the perf/legacy-main-resolve branch from 9301722 to 42f276c Compare June 6, 2023 13:43
@H4ad
Member Author

H4ad commented Jun 7, 2023

The CI job for Test ASan is failing due to a memory issue:

==88318==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd9ad085f8 at pc 0x00000431f995 bp 0x7ffd9ad07570 sp 0x7ffd9ad07568
READ of size 16 at 0x7ffd9ad085f8 thread T0
    #0 0x431f994 in ada::url_aggregator ada::parser::parse_url<ada::url_aggregator>(std::basic_string_view<char, std::char_traits<char> >, ada::url_aggregator const*) (/home/runner/work/node/node/out/Release/node+0x431f994)
    #1 0x4312e13 in tl::expected<ada::url_aggregator, ada::errors> ada::parse<ada::url_aggregator>(std::basic_string_view<char, std::char_traits<char> >, ada::url_aggregator const*) (/home/runner/work/node/node/out/Release/node+0x4312e13)
    #2 0xfc3b97 in node::fs::LegacyMainResolve(v8::FunctionCallbackInfo<v8::Value> const&) (/home/runner/work/node/node/out/Release/node+0xfc3b97)
    #3 0x1779bab in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, unsigned long*, int) (/home/runner/work/node/node/out/Release/node+0x1779bab)
    #4 0x17779b4 in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) (/home/runner/work/node/node/out/Release/node+0x17779b4)
    #5 0x3402e35 in Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit (/home/runner/work/node/node/out/Release/node+0x3402e35)

Address 0x7ffd9ad085f8 is located in stack of thread T0 at offset 1304 in frame
    #0 0xfc364f in node::fs::LegacyMainResolve(v8::FunctionCallbackInfo<v8::Value> const&) (/home/runner/work/node/node/out/Release/node+0xfc364f)

Ref: https://github.com/nodejs/node/actions/runs/5189308542/jobs/9354128411?pr=48325

Does anyone know what could be the cause?

I tried to reproduce it locally but I couldn't, so I have no idea how to fix it.
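
The thread does not state the root cause, so the following is not a diagnosis of this PR. It is only a generic sketch of one common way to trigger a stack-use-after-scope report when a parser takes a `std::string_view`: the view outlives the object that owns the bytes. `CountSlashes`, `MakeUrl`, and `CountSlashesSafely` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Stand-in for a parser that accepts a non-owning view of its input.
std::size_t CountSlashes(std::string_view input) {
  std::size_t count = 0;
  for (char c : input) {
    if (c == '/') ++count;
  }
  return count;
}

// Builds an owning string; the returned value must be kept alive by the caller.
std::string MakeUrl(const std::string& dir) {
  return "file://" + dir + "/package.json";
}

// Unsafe shape, left as a comment because executing it is undefined behavior:
//
//   std::string_view view;
//   {
//     std::string owned = MakeUrl(dir);
//     view = owned;       // `view` points into `owned`
//   }                     // `owned` is destroyed at the end of this scope
//   CountSlashes(view);   // reads storage of an object that no longer exists

// Safe shape: the owning string lives at least as long as every use of the view.
std::size_t CountSlashesSafely(const std::string& dir) {
  const std::string owned = MakeUrl(dir);
  const std::string_view view = owned;
  return CountSlashes(view);
}
```

Sanitizer builds can flag the unsafe shape even when an ordinary local build appears to work, which may be one reason such failures are hard to reproduce.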

@H4ad
Member Author

H4ad commented Sep 11, 2023

@ruyadorno Hey, can I help with something?

@RaisinTen
Contributor

@H4ad
Member Author

H4ad commented Sep 11, 2023

@RaisinTen Thanks, I'll try to backport it by the end of the day.

ruyadorno pushed a commit that referenced this pull request Sep 13, 2023
PR-URL: #48664
Refs: #48325
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
H4ad added a commit to H4ad/node that referenced this pull request Sep 14, 2023
Instead of many C++ calls, now we make only one C++ call
to return a enum number that represents the selected state.

Backport-PR-URL: nodejs#48325
H4ad added a commit to H4ad/node that referenced this pull request Sep 14, 2023
H4ad added a commit to H4ad/node that referenced this pull request Sep 14, 2023
Instead of many C++ calls, now we make only one C++ call
to return a enum number that represents the selected state.

Backport-PR-URL: nodejs#48325
H4ad added a commit to H4ad/node that referenced this pull request Sep 14, 2023
H4ad added a commit to H4ad/node that referenced this pull request Sep 14, 2023
Instead of many C++ calls, now we make only one C++ call
to return a enum number that represents the selected state.

Backport-PR-URL: nodejs#48325
H4ad added a commit to H4ad/node that referenced this pull request Sep 14, 2023
@RaisinTen RaisinTen added backport-open-v18.x Indicate that the PR has an open backport. and removed backport-requested-v18.x PRs awaiting manual backport to the v18.x-staging branch. labels Sep 14, 2023
ruyadorno pushed a commit that referenced this pull request Sep 17, 2023
PR-URL: #48664
Refs: #48325
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
H4ad added a commit to H4ad/node that referenced this pull request Sep 18, 2023
Instead of many C++ calls, now we make only one C++ call
to return a enum number that represents the selected state.

Backport-PR-URL: nodejs#48325
H4ad added a commit to H4ad/node that referenced this pull request Sep 18, 2023
H4ad added a commit to H4ad/node that referenced this pull request Sep 18, 2023
nodejs-github-bot pushed a commit that referenced this pull request Sep 29, 2023
PR-URL: #49688
Refs: #48325
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
alexfernandez pushed a commit to alexfernandez/node that referenced this pull request Nov 1, 2023
PR-URL: nodejs#49688
Refs: nodejs#48325
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
targos pushed a commit that referenced this pull request Nov 11, 2023
PR-URL: #49688
Refs: #48325
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
H4ad added a commit to H4ad/node that referenced this pull request Dec 2, 2023
Instead of many C++ calls, now we make only one C++ call
to return a enum number that represents the selected state.

Backport-PR-URL: nodejs#48325
H4ad added a commit to H4ad/node that referenced this pull request Dec 2, 2023
H4ad pushed a commit to H4ad/node that referenced this pull request Dec 3, 2023
PR-URL: nodejs#48664
Refs: nodejs#48325
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>

Backport-PR-URL: nodejs#48664
@richardlau richardlau removed the backport-open-v18.x Indicate that the PR has an open backport. label Mar 18, 2024
debadree25 pushed a commit to debadree25/node that referenced this pull request Apr 15, 2024
PR-URL: nodejs#49688
Refs: nodejs#48325
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. esm Issues and PRs related to the ECMAScript Modules implementation. fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run.
10 participants