-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
module: significantly improve loader performance #26970
Conversation
@BridgeAR Sadly, an error occurred when I tried to trigger a build. :( |
@nodejs/modules PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the only concern here is cases like the following where you have say x.json
:
require('./x'); // resolves x.json
fs.writeFileSync('./x.js', 'module.exports = "new"');
require('./x'); // should resolve x.js
as surely the above case would get the JSON file again instead of the x.js contents?
No one will actually do that though, so this is fine, but in theory that is a break from semantics right?
Similarly if the relative resolution was running through package.json "main" or a directory index check, these would also be cached, with similar risks to the above right?
If this is done for package resolutions eg require('y')
then we're basically just caching the resolver, which may not be intended. The restriction of the cache to the local directory sort of restricts the timeframe to make it feel like its correct, when in theory you could load a subpath of a package later. For example say require('pkg/index.js') loads require('dep') where require('pkg/index2.js') is then loaded much later might expect a different value for require('dep'). At least according to the current semantics that would be valid for require('dep') to be something else if the package.json changed. Again, no one would do that, but it's still not quite the same.
So for those sorts of reasons it worries me I must admit... it doesn't feel like this performance comes completely for free, although it certainly is close enough to free to feel like it could be worthwhile.
Correction - my mentions to package.json changes above don't apply due to the package.json cache, but everything else from a statting perspective does apply (extensions checks, node_modules lookups). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, the edge cases are so well designed in this approach that they will never be hit. There are some liberties, but they seem to work out. The perf definitely seems worth it. Sorry I can't be more enthusiastic, but I've been stuck on semantics too long.
We also have the stat cache and that should cause the already resolved file from the first require call to be loaded.
Thanks, that was my intention. |
Ok, so if the example was rather:
let dep = require('dep');
fs.mkdirSync('./lib/node_modules/dep');
fs.writeFileSync('./lib/node_modules/dep/index.js');
exports.reloadDep = function () {
dep = require('dep');
}
setTimeout(() => exports.reloadDep()); where there was an expectation that |
var cachedModule = Module._cache[filename]; | ||
if (cachedModule) { | ||
const cachedModule = Module._cache[filename]; | ||
if (cachedModule !== undefined) { | ||
updateChildren(parent, cachedModule, true); | ||
return cachedModule.exports; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be done at the Module._resolveFilename
level?
A cache key could be something like
cacheKey =
request + "\0" +
dirPath + "\0" +
(isMain ? "1" : "")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would not be beneficial as far as I can tell. I would also rather not move any parts around since the loader is highly monkey patched and every subtle change can cause side-effects.
isMain
should not have any impact on the key as it's only about the very first require call and that is not required for the cache key.
Add more benchmark options to properly verify the gains. This makes sure the benchmark also tests requiring the same module again instead of only loading each module only once.
Moving `try / catch` into separate functions is not necessary anymore due to V8 optimizations.
This adds the `path` property to the module object. It contains the current directory as path. That is necessary to add an extra caching layer. It also makes sure the `id` uses a default in case it's not set. Otherwise the `path.dirname(id)` command could fail.
This adds an extra modules caching layer that operates on the parent's `path` property and the current require argument. That together can be used as unique identifier to speed up loading the same module more than once. It is a cache on top of the current modules cache. It has the nice feature that this cache does not only work in the same file but it works for the whole current directory. So if the same file is loaded in any other file from the same directory, it will also hit this cache instead of having to resolve the file again. To keep it backwards compatible with the old modules cache, it detects invalidation of that cache.
c7c034d
to
9137c56
Compare
Rebased due to conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on green CITGM
PR-URL: nodejs#26970 Refs: nodejs#25362 Reviewed-By: Guy Bedford <guybedford@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Moving `try / catch` into separate functions is not necessary anymore due to V8 optimizations. PR-URL: nodejs#26970 Refs: nodejs#25362 Reviewed-By: Guy Bedford <guybedford@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
This adds the `path` property to the module object. It contains the current directory as path. That is necessary to add an extra caching layer. It also makes sure the `id` uses a default in case it's not set. Otherwise the `path.dirname(id)` command could fail. PR-URL: nodejs#26970 Refs: nodejs#25362 Reviewed-By: Guy Bedford <guybedford@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
This adds an extra modules caching layer that operates on the parent's `path` property and the current require argument. That together can be used as unique identifier to speed up loading the same module more than once. It is a cache on top of the current modules cache. It has the nice feature that this cache does not only work in the same file but it works for the whole current directory. So if the same file is loaded in any other file from the same directory, it will also hit this cache instead of having to resolve the file again. To keep it backwards compatible with the old modules cache, it detects invalidation of that cache. PR-URL: nodejs#26970 Refs: nodejs#25362 Reviewed-By: Guy Bedford <guybedford@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Thanks for the reviews. Landed in 3b04496...0f190a5 🎉 |
A recent commit broke test-benchmark-module. This fixes it. Culprit is nodejs#26970.
Add more benchmark options to properly verify the gains. This makes sure the benchmark also tests requiring the same module again instead of only loading each module only once. PR-URL: #26970 Refs: #25362 Reviewed-By: Guy Bedford <guybedford@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
This adds the `path` property to the module object. It contains the current directory as path. That is necessary to add an extra caching layer. It also makes sure the `id` uses a default in case it's not set. Otherwise the `path.dirname(id)` command could fail. PR-URL: #26970 Refs: #25362 Reviewed-By: Guy Bedford <guybedford@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
This adds an extra modules caching layer that operates on the parent's `path` property and the current require argument. That together can be used as unique identifier to speed up loading the same module more than once. It is a cache on top of the current modules cache. It has the nice feature that this cache does not only work in the same file but it works for the whole current directory. So if the same file is loaded in any other file from the same directory, it will also hit this cache instead of having to resolve the file again. To keep it backwards compatible with the old modules cache, it detects invalidation of that cache. PR-URL: #26970 Refs: #25362 Reviewed-By: Guy Bedford <guybedford@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
Add more benchmark options to properly verify the gains. This makes sure the benchmark also tests requiring the same module again instead of only loading each module only once. PR-URL: #26970 Refs: #25362 Reviewed-By: Guy Bedford <guybedford@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
This adds the `path` property to the module object. It contains the current directory as path. That is necessary to add an extra caching layer. It also makes sure the `id` uses a default in case it's not set. Otherwise the `path.dirname(id)` command could fail. PR-URL: #26970 Refs: #25362 Reviewed-By: Guy Bedford <guybedford@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
This adds an extra modules caching layer that operates on the parent's `path` property and the current require argument. That together can be used as unique identifier to speed up loading the same module more than once. It is a cache on top of the current modules cache. It has the nice feature that this cache does not only work in the same file but it works for the whole current directory. So if the same file is loaded in any other file from the same directory, it will also hit this cache instead of having to resolve the file again. To keep it backwards compatible with the old modules cache, it detects invalidation of that cache. PR-URL: #26970 Refs: #25362 Reviewed-By: Guy Bedford <guybedford@gmail.com> Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Signed-off-by: Beth Griggs <Bethany.Griggs@uk.ibm.com>
Refs: nodejs#26970 Fixes: nodejs#33270
This is a significant performance boost for require in case a module is loaded more than once. This came up as a botteneck for @jasnell and @mcollina as far as I know.
It is now possible to use require lazily without sacrificing much performance. Loading files with relative paths is now up to 4-5x faster when loaded frequently!
Loading relative or absolute paths now has the same performance profile. The native startup will likely not be impacted.
I introduced a new cache which sits on top of the former main cache and caches the filename of already loaded modules. This is then used to check for lazy require calls and for identical require calls from different files in the same directory.
Refs: #25362
Benchmark: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/311/
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes