Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: significantly improve loader performance #26970

Closed
wants to merge 5 commits into from

Conversation

BridgeAR
Copy link
Member

@BridgeAR BridgeAR commented Mar 28, 2019

This is a significant performance boost for require in case a module is loaded more than once. This came up as a botteneck for @jasnell and @mcollina as far as I know.

It is now possible to use require lazily without sacrificing much performance. Loading files with relative paths is now up to 4-5x faster when loaded frequently!

Loading relative or absolute paths now has the same performance profile. The native startup will likely not be impacted.

I introduced a new cache which sits on top of the former main cache and caches the filename of already loaded modules. This is then used to check for lazy require calls and for identical require calls from different files in the same directory.

Refs: #25362

Benchmark: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/311/

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot
Copy link
Collaborator

@BridgeAR Sadly, an error occurred when I tried to trigger a build. :(

@nodejs-github-bot
Copy link
Collaborator

@BridgeAR BridgeAR added module Issues and PRs related to the module subsystem. benchmark Issues and PRs related to the benchmark subsystem. labels Mar 28, 2019
@BridgeAR
Copy link
Member Author

@nodejs/modules PTAL

Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only concern here is cases like the following where you have say x.json:

require('./x'); // resolves x.json
fs.writeFileSync('./x.js', 'module.exports = "new"');
require('./x'); // should resolve x.js

as surely the above case would get the JSON file again instead of the x.js contents?

No one will actually do that though, so this is fine, but in theory that is a break from semantics right?

Similarly if the relative resolution was running through package.json "main" or a directory index check, these would also be cached, with similar risks to the above right?

If this is done for package resolutions eg require('y') then we're basically just caching the resolver, which may not be intended. The restriction of the cache to the local directory sort of restricts the timeframe to make it feel like its correct, when in theory you could load a subpath of a package later. For example say require('pkg/index.js') loads require('dep') where require('pkg/index2.js') is then loaded much later might expect a different value for require('dep'). At least according to the current semantics that would be valid for require('dep') to be something else if the package.json changed. Again, no one would do that, but it's still not quite the same.

So for those sorts of reasons it worries me I must admit... it doesn't feel like this performance comes completely for free, although it certainly is close enough to free to feel like it could be worthwhile.

lib/internal/modules/cjs/loader.js Outdated Show resolved Hide resolved
lib/internal/modules/cjs/loader.js Outdated Show resolved Hide resolved
@guybedford
Copy link
Contributor

Correction - my mentions to package.json changes above don't apply due to the package.json cache, but everything else from a statting perspective does apply (extensions checks, node_modules lookups).

Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, the edge cases are so well designed in this approach that they will never be hit. There are some liberties, but they seem to work out. The perf definitely seems worth it. Sorry I can't be more enthusiastic, but I've been stuck on semantics too long.

@BridgeAR
Copy link
Member Author

@guybedford

everything else from a statting perspective does apply (extensions checks, node_modules lookups)

We also have the stat cache and that should cause the already resolved file from the first require call to be loaded.

the edge cases are so well designed in this approach that they will never be hit.

Thanks, that was my intention.

@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Mar 29, 2019
@nodejs-github-bot
Copy link
Collaborator

@guybedford
Copy link
Contributor

guybedford commented Mar 29, 2019

We also have the stat cache and that should cause the already resolved file from the first require call to be loaded.

Ok, so if the example was rather:

  • node_modules/dep does exist
  • lib/node_modules/dep does not exist
let dep = require('dep');
fs.mkdirSync('./lib/node_modules/dep');
fs.writeFileSync('./lib/node_modules/dep/index.js');
exports.reloadDep = function () {
  dep = require('dep');
}
setTimeout(() => exports.reloadDep());

where there was an expectation that lib/node_modules/dep could be written and then reloaded as above. So lazy require cache I guess as you say.

var cachedModule = Module._cache[filename];
if (cachedModule) {
const cachedModule = Module._cache[filename];
if (cachedModule !== undefined) {
updateChildren(parent, cachedModule, true);
return cachedModule.exports;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done at the Module._resolveFilename level?
A cache key could be something like

cacheKey =
  request + "\0" +
  dirPath + "\0" +
  (isMain ? "1" : "")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would not be beneficial as far as I can tell. I would also rather not move any parts around since the loader is highly monkey patched and every subtle change can cause side-effects.

isMain should not have any impact on the key as it's only about the very first require call and that is not required for the cache key.

Add more benchmark options to properly verify the gains.

This makes sure the benchmark also tests requiring the same module
again instead of only loading each module only once.
Moving `try / catch` into separate functions is not necessary
anymore due to V8 optimizations.
This adds the `path` property to the module object. It contains the
current directory as path. That is necessary to add an extra caching
layer.

It also makes sure the `id` uses a default in case it's not set.
Otherwise the `path.dirname(id)` command could fail.
This adds an extra modules caching layer that operates on the parent's
`path` property and the current require argument. That together can
be used as unique identifier to speed up loading the same module more
than once. It is a cache on top of the current modules cache.

It has the nice feature that this cache does not only work in the same
file but it works for the whole current directory. So if the same file
is loaded in any other file from the same directory, it will also hit
this cache instead of having to resolve the file again.

To keep it backwards compatible with the old modules cache, it detects
invalidation of that cache.
@BridgeAR BridgeAR force-pushed the improve-loader-performance branch from c7c034d to 9137c56 Compare March 31, 2019 10:59
@BridgeAR
Copy link
Member Author

Rebased due to conflicts.

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on green CITGM

@BridgeAR
Copy link
Member Author

BridgeAR commented Apr 3, 2019

BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 4, 2019
PR-URL: nodejs#26970
Refs: nodejs#25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 4, 2019
Moving `try / catch` into separate functions is not necessary
anymore due to V8 optimizations.

PR-URL: nodejs#26970
Refs: nodejs#25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 4, 2019
This adds the `path` property to the module object. It contains the
current directory as path. That is necessary to add an extra caching
layer.

It also makes sure the `id` uses a default in case it's not set.
Otherwise the `path.dirname(id)` command could fail.

PR-URL: nodejs#26970
Refs: nodejs#25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 4, 2019
This adds an extra modules caching layer that operates on the parent's
`path` property and the current require argument. That together can
be used as unique identifier to speed up loading the same module more
than once. It is a cache on top of the current modules cache.

It has the nice feature that this cache does not only work in the same
file but it works for the whole current directory. So if the same file
is loaded in any other file from the same directory, it will also hit
this cache instead of having to resolve the file again.

To keep it backwards compatible with the old modules cache, it detects
invalidation of that cache.

PR-URL: nodejs#26970
Refs: nodejs#25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@BridgeAR
Copy link
Member Author

BridgeAR commented Apr 4, 2019

Thanks for the reviews.

Landed in 3b04496...0f190a5 🎉

@BridgeAR BridgeAR closed this Apr 4, 2019
Trott added a commit to Trott/io.js that referenced this pull request Apr 4, 2019
A recent commit broke test-benchmark-module. This fixes it.

Culprit is nodejs#26970.
@Trott Trott mentioned this pull request Apr 4, 2019
2 tasks
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
Add more benchmark options to properly verify the gains.

This makes sure the benchmark also tests requiring the same module
again instead of only loading each module only once.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
Moving `try / catch` into separate functions is not necessary
anymore due to V8 optimizations.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
This adds the `path` property to the module object. It contains the
current directory as path. That is necessary to add an extra caching
layer.

It also makes sure the `id` uses a default in case it's not set.
Otherwise the `path.dirname(id)` command could fail.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
This adds an extra modules caching layer that operates on the parent's
`path` property and the current require argument. That together can
be used as unique identifier to speed up loading the same module more
than once. It is a cache on top of the current modules cache.

It has the nice feature that this cache does not only work in the same
file but it works for the whole current directory. So if the same file
is loaded in any other file from the same directory, it will also hit
this cache instead of having to resolve the file again.

To keep it backwards compatible with the old modules cache, it detects
invalidation of that cache.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
@BethGriggs BethGriggs mentioned this pull request Apr 9, 2019
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
Add more benchmark options to properly verify the gains.

This makes sure the benchmark also tests requiring the same module
again instead of only loading each module only once.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
Moving `try / catch` into separate functions is not necessary
anymore due to V8 optimizations.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
This adds the `path` property to the module object. It contains the
current directory as path. That is necessary to add an extra caching
layer.

It also makes sure the `id` uses a default in case it's not set.
Otherwise the `path.dirname(id)` command could fail.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
This adds an extra modules caching layer that operates on the parent's
`path` property and the current require argument. That together can
be used as unique identifier to speed up loading the same module more
than once. It is a cache on top of the current modules cache.

It has the nice feature that this cache does not only work in the same
file but it works for the whole current directory. So if the same file
is loaded in any other file from the same directory, it will also hit
this cache instead of having to resolve the file again.

To keep it backwards compatible with the old modules cache, it detects
invalidation of that cache.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
@BridgeAR BridgeAR added the performance Issues and PRs related to the performance of Node.js. label May 8, 2019
@BridgeAR BridgeAR deleted the improve-loader-performance branch January 20, 2020 11:59
aduh95 added a commit to aduh95/node that referenced this pull request May 9, 2020
@aduh95 aduh95 mentioned this pull request May 9, 2020
3 tasks
addaleax pushed a commit that referenced this pull request May 19, 2020
Refs: #26970
Fixes: #33270

PR-URL: #33323
Reviewed-By: Anna Henningsen <anna@addaleax.net>
codebytere pushed a commit that referenced this pull request May 21, 2020
Refs: #26970
Fixes: #33270

PR-URL: #33323
Reviewed-By: Anna Henningsen <anna@addaleax.net>
codebytere pushed a commit that referenced this pull request Jun 7, 2020
Refs: #26970
Fixes: #33270

PR-URL: #33323
Reviewed-By: Anna Henningsen <anna@addaleax.net>
codebytere pushed a commit that referenced this pull request Jun 9, 2020
Refs: #26970
Fixes: #33270

PR-URL: #33323
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. benchmark Issues and PRs related to the benchmark subsystem. module Issues and PRs related to the module subsystem. performance Issues and PRs related to the performance of Node.js.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants