APIs for libraries/frameworks/tools to control on-disk compilation cache (NODE_COMPILE_CACHE) #53639
Comments
cc @merceyz @jakebailey @H4ad from #47472
Why? What sort of flexibility would the library/framework need that the environment variable doesn't provide?
This sounds great; supporting a default location in a reasonable location is super helpful. Is
All in all, I'm not sure how I feel about being unable to use this without using CJS or TLA; if an executable wants to enable caching of itself, it needs an extra entrypoint which only serves to enable the caching and then load the other code. Or fork, which is slow. Not sure that one can do better, though. The call has to happen somewhere... I guess this is exactly how v8-compile-cache works? (Not familiar with its implementation, but I guess it must have the same restriction...)
If you want to enable caching today, you have to set the environment variable. This means that applications which want to enable it for themselves have to fork a new process to enable it, defeating the speedup.
I used |
You could also
Yeah, I think there is generally no way for libraries to "define something to be run before everything else, without the use of command line flags, or environment variables". It was also raised in the module loader hooks discussion (#52219 (comment)). IMO we need to figure out a way to allow developers/users to specify code that needs to be preloaded for every/some process/worker. But some configuration needs to happen. Perhaps some magic field in package.json is a good place for it, but that would probably be a separate topic.
I have a WIP at https://github.com/joyeecheung/node/tree/compile-cache-api - still needs to finish the tests and docs. Locally with this wrapper:

    const { enableCompileCache } = require('module');
    if (enableCompileCache) {
      enableCompileCache();
    }
    require('./test/fixtures/snapshot/typescript.js');

I get the following numbers:
I think for the use case of TypeScript, a trampoline entrypoint like this is still needed to 1. enable code cache and 2. load the actual lib. If the lib part is ESM, this trampoline entrypoint must either be CJS that does EDIT: actually you can do it in an ESM trampoline, just that the lib itself still cannot be imported statically, but you can have

    import { createRequire, enableCompileCache } from 'node:module'; // Or use process.getBuiltinModule()
    const require = createRequire(import.meta.url); // You can call this importSync if you want ;)
    if (enableCompileCache) {
      enableCompileCache();
    }
    require('./test/fixtures/snapshot/typescript.js');
Something that I think we should add along with the API - an environment variable to disable caching, as an escape hatch for users running into bugs (it has helped some people using v8-compile-cache in #51555 - while the built-in cache would be a bit more robust, an escape hatch would still be useful in case there are bugs).
This refactors the compile cache handler in preparation for the JS API, and updates the compile cache storage structure into:

- $NODE_COMPILE_CACHE_DIR
  - $NODE_VERSION-$ARCH-$CACHE_DATA_VERSION_TAG-$UID
    - $FILENAME_AND_MODULE_TYPE_HASH.cache

This also adds a magic number to the beginning of the cache files for verification, and returns the status, compile cache directory and/or error message of enabling the compile cache in a structure, which can be converted to their JS counterparts by the upcoming JS API.

PR-URL: #54291
Refs: #53639
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ethan Arrowood <ethan@arrowood.dev>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
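To make the layout above concrete, here is a rough sketch of how a cache file path with that shape could be assembled. The helper name `sketchCacheFilePath`, the SHA-256 hash, and the `0` placeholder for a missing UID are assumptions for illustration only; the commit message does not say which hash Node.js uses internally, and this is not its implementation.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');
const v8 = require('node:v8');

// Hypothetical helper: mirrors the *shape* of the directory layout only.
function sketchCacheFilePath(cacheDir, filename, moduleType) {
  // process.getuid() does not exist on Windows; 0 is a placeholder there.
  const uid = typeof process.getuid === 'function' ? process.getuid() : 0;

  // $NODE_VERSION-$ARCH-$CACHE_DATA_VERSION_TAG-$UID
  const versionDir = [
    process.version,
    process.arch,
    v8.cachedDataVersionTag(),
    uid,
  ].join('-');

  // $FILENAME_AND_MODULE_TYPE_HASH (SHA-256 is a stand-in, not Node's hash).
  const hash = crypto
    .createHash('sha256')
    .update(`${moduleType}:${filename}`)
    .digest('hex')
    .slice(0, 16);

  return path.join(cacheDir, versionDir, `${hash}.cache`);
}

console.log(sketchCacheFilePath('/tmp/node-compile-cache', '/path/to/foo.js', 'commonjs'));
```

Because the version directory includes the Node.js version, architecture and V8 cache data version tag, caches written by one binary would not be picked up by an incompatible one.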
Opened #54501
This patch adds the following API for tools to enable compile cache dynamically and query its status.

- module.enableCompileCache(cacheDir)
- module.getCompileCacheDir()

In addition this adds a NODE_DISABLE_COMPILE_CACHE environment variable to disable the code cache enabled by the APIs as an escape hatch to avoid unexpected/undesired effects of the compile cache (e.g. less precise test coverage).

When the module.enableCompileCache() method is invoked without a specified directory, Node.js will use the value of the NODE_COMPILE_CACHE environment variable if it's set, or default to `path.join(os.tmpdir(), 'node-compile-cache')` otherwise. Therefore it's recommended for tools to call this method without specifying the directory to allow overrides.

PR-URL: #54501
Fixes: #53639
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
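As a rough sketch of how a tool might call this API defensively: the wrapper name `tryEnableCompileCache` and its return shape are made up for illustration. The `status`, `directory` and `message` fields and `module.constants.compileCacheStatus` follow the Node.js `node:module` documentation for releases that ship this API. On older releases the function is absent, hence the feature check.

```javascript
const nodeModule = require('node:module');

// Hypothetical wrapper: enables the compile cache if the running Node.js
// supports it, and reports the outcome as { enabled, directory, reason }.
function tryEnableCompileCache() {
  if (typeof nodeModule.enableCompileCache !== 'function') {
    return { enabled: false, directory: undefined, reason: 'unsupported' };
  }

  // No argument: lets NODE_COMPILE_CACHE override the default directory.
  const result = nodeModule.enableCompileCache();
  const status = nodeModule.constants.compileCacheStatus;

  switch (result.status) {
    case status.ENABLED:
    case status.ALREADY_ENABLED:
      return { enabled: true, directory: result.directory, reason: 'ok' };
    case status.DISABLED:
      // The NODE_DISABLE_COMPILE_CACHE escape hatch is in effect.
      return { enabled: false, directory: undefined, reason: 'disabled' };
    default:
      return {
        enabled: false,
        directory: undefined,
        reason: result.message ?? 'failed',
      };
  }
}

const outcome = tryEnableCompileCache();
console.log(
  outcome.enabled
    ? `compile cache at ${outcome.directory}`
    : `compile cache not enabled (${outcome.reason})`,
);
```

A failure to enable the cache is reported rather than thrown here, so a tool can treat caching as a best-effort optimization and carry on loading its real entrypoint.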
Spinning off from #52535 (comment)

Currently, the built-in on-disk compilation cache can only be enabled by NODE_COMPILE_CACHE. The end user controls where the cache is stored, so they can also find the cache and clean it up when necessary. That's the simplest enabling mechanism for sure, but judging from the use cases of v8-compile-cache (a package that monkey-patches the CJS loader, which is a capability that we want to sunset, see #47472), it's also common for library/framework authors to want to enable this in a more flexible manner. So this issue is opened to discuss what an API for this should look like and what the directory structure of the cache should look like.

With the global NODE_COMPILE_CACHE the current cache directory structure looks like this:

For reference, v8-compile-cache's cache directory looks like this:

And inside the .BLOB files it maintains a module_filename + sha-1 checksum -> cache_data storage. In the documentation it explains:

In my investigation when implementing NODE_COMPILE_CACHE though, there's actually not much performance difference in reading on a file-by-file basis, at least when it's implemented using native FS calls and when the file only gets loaded when the corresponding module is about to get compiled (so not all the cache is loaded into the process at once even though the module might not be needed by the application at all, which is what v8-compile-cache does).

For third-party tooling (e.g. transpilers, package managers) I think a layout that doesn't distinguish between entrypoints would still be beneficial: as long as the final resolved file path remains the same, its content matches the checksum, and it's still being loaded by the same Node.js version etc., the cache is going to hit. Then if multiple dependencies in the same project try to enable it, we wouldn't be saving multiple caches on disk even though they are effectively caching the code for the same files (e.g. the end user code needs package foo that resolves to /path/to/foo.js, whose cache gets repeatedly stored in the cache enabled by a transpiler and then again in the cache enabled by a package manager that executes a run command).

I wonder if we should just provide the following APIs:

process.getCompileCacheDir() would still allow end users to find and clean stale cache to release disk space. We could probably also add a file to the designated directory with a name that's easy to find (e.g. $CACHE_DIR/node_compile_cache_mark) to facilitate this too.

In most use cases, tooling and libraries should simply call module.enableCompileCache() without passing in an argument so that the cache is stored in tmpdir and can be shared with other dependencies by default, and end users can override the default cache directory location with NODE_COMPILE_CACHE. Some more advanced tooling/frameworks might want more advanced customizations and use their own cache directory; then they can specify it.

Some more powerful APIs are probably needed to allow advanced configuration of the cache storage, but at least the APIs mentioned above would address the use cases of existing v8-compile-cache users. For the more powerful API, it would be difficult to just think of one that works well without some collaboration with adopters, so ideas are welcome regarding what that should look like :)
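The directory precedence proposed here (explicit argument, then NODE_COMPILE_CACHE, then a shared directory under tmpdir) can be sketched roughly as follows. `resolveCompileCacheDir` is a hypothetical helper that only illustrates the lookup order. It is not how Node.js implements it, and it does not model a cache that was already enabled at startup. The `node-compile-cache` default directory name is taken from the commit message of the PR that closed this issue.

```javascript
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: illustrates the lookup order only.
function resolveCompileCacheDir(explicitDir, env = process.env) {
  // 1. An advanced tool/framework passed its own directory.
  if (explicitDir) return explicitDir;
  // 2. The end user overrode the location via the environment variable.
  if (env.NODE_COMPILE_CACHE) return env.NODE_COMPILE_CACHE;
  // 3. Shared default, so several tools in one project reuse one cache.
  return path.join(os.tmpdir(), 'node-compile-cache');
}

console.log(resolveCompileCacheDir(undefined, {}));
console.log(resolveCompileCacheDir(undefined, { NODE_COMPILE_CACHE: '/custom/cache' }));
```

Calling without an argument keeps the end user in control: two tools that both rely on the default end up sharing one directory instead of each storing its own copy of the same cached files.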