Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requiring a dynamically added dependency fails when the package has no index.js file #44663

Open
zkochan opened this issue Sep 15, 2022 · 39 comments
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem.

Comments

@zkochan
Copy link

zkochan commented Sep 15, 2022

Version

v7.1.0 and newer (last working version was v6.17.1)

Platform

Linux

Subsystem

No response

What steps will reproduce the bug?

Require a dependency that doesn't exist. Catch the error. Dynamically add the missing package to the node_modules. It should not contain an index.js file but should use a custom entry point instead. Require the now not missing package again.

Expected result: the modules is loaded.
Actual result: module not found error is thrown.

I have create a repro repo: https://github.com/zkochan/require-issue

How often does it reproduce? Is there a required condition?

No response

What is the expected behavior?

No response

What do you see instead?

On the second require the module is loaded

Additional information

cc @GiladShoham @davidfirst

@DavidWeiss2
Copy link

I found a working workaround:
github fork

@zkochan
Copy link
Author

zkochan commented Sep 15, 2022

@DavidWeiss2 your workaround doesn't seem to work on my PC

image

@DavidWeiss2
Copy link

Can you please increase the timeout time? In my opinion the bug root cause is an i/o opertion of the write. And it worth a shot

@zkochan
Copy link
Author

zkochan commented Sep 15, 2022

Even an increase to 10 seconds doesn't help.

@DavidWeiss2
Copy link

@zkochan I uploaded another workaround at the same fork.
github fork

image

@DavidWeiss2
Copy link

Just published an npm package that solves this issue:
requirsafe

@zkochan
Copy link
Author

zkochan commented Sep 16, 2022

This workaround will work only on direct dependencies, not on subdependencies. I guess we could hook into require to fix it for subdeps.

@zkochan
Copy link
Author

zkochan commented Sep 16, 2022

Alright, this also works:

try {
  console.log(require("pkg"));
} catch (err) {
  console.error("failed require from package json");
  const data = require('pkg/package.json')
  console.log("this works", require('pkg/' + data.main));
}

We can use it to patch require. @DavidWeiss2 thanks for the workaround!

But of course it still needs to be fixed in Node.js.

@aduh95
Copy link
Contributor

aduh95 commented Sep 16, 2022

Why do you use node:child_process instead of node:fs for file system operations? If you switch to using node:fs, it looks like everything works as expected:

mkdir repro && cd repro
echo 'try { console.log(require("pkg")) } catch { console.log("first try failed") }' > entry.js
echo 'const fs = require("node:fs");' >> entry.js
echo 'fs.mkdirSync("node_modules/pkg/lib", { recursive:true });' >> entry.js 
echo 'fs.writeFileSync("node_modules/pkg/lib/main.js", "module.exports = `lib/main was loaded`");' >> entry.js
echo 'fs.writeFileSync("node_modules/pkg/package.json", JSON.stringify({main:"lib/main.js"}));' >> entry.js
echo 'try { console.log(require("pkg")) } catch { console.log("second try failed") }' >> entry.js
node entry.js
cd .. && rm -r repro

If I paste the above in my terminal, it outputs:

first try failed
lib/main was loaded

I'm going to close this as it doesn't seem to be a bug in Node.js, don't hesitate to re-open if you think I missed something or continue the discussion if you have more questions.

@aduh95 aduh95 closed this as not planned Won't fix, can't repro, duplicate, stale Sep 16, 2022
@zkochan
Copy link
Author

zkochan commented Sep 16, 2022

it doesn't matter whether I use child process or use node:fs. It doesn't work. I have updated the repo to use fs. It fails.

don't hesitate to re-open if you think I missed something or continue the discussion if you have more questions.

I can't reopen it.

Your code also doesn't work if I move require two line below:

const fs = require("node:fs");
fs.mkdirSync("node_modules/pkg/lib", { recursive:true });
try { console.log(require("pkg")) } catch { console.log("first try failed") }
fs.writeFileSync("node_modules/pkg/lib/main.js", "module.exports = `lib/main was loaded`");
fs.writeFileSync("node_modules/pkg/package.json", JSON.stringify({main:"lib/main.js"}));
try { console.log(require("pkg")) } catch { console.log("second try failed") }

@aduh95
Copy link
Contributor

aduh95 commented Sep 17, 2022

Ah right, the issue is that Node.js doesn't re-check the FS if there was no package.json found. If you write the package.json before the first require, it doesn't reproduce:

mkdir repro && cd repro
echo 'const fs = require("node:fs");' >> entry.js
echo 'fs.mkdirSync("node_modules/pkg/lib", { recursive:true });' >> entry.js
echo 'fs.writeFileSync("node_modules/pkg/package.json", JSON.stringify({main:"lib/main.js"}));' >> entry.js
echo 'try { console.log(require("pkg")) } catch { console.log("first try failed") }' >> entry.js
echo 'fs.writeFileSync("node_modules/pkg/lib/main.js", "module.exports = `lib/main was loaded`");' >> entry.js
echo 'try { console.log(require("pkg")) } catch { console.log("second try failed") }' >> entry.js
node entry.js
cd .. && rm -r repro

I'm not sure it's worth fixing, having to re-request the FS and re-parse the package.json for each require call would be very inefficient. /cc @nodejs/modules

@aduh95 aduh95 reopened this Sep 17, 2022
@ljharb
Copy link
Member

ljharb commented Sep 17, 2022

I agree - has the use case of dynamically creating requireable files been one node has ever intentionally supported?

@davidfirst
Copy link

@aduh95 , if it's not worth fixing, it would be great to have the option to "clear" this kind-of cache of the non-found package.json files. (similar to the ability to clear require.cache).

@zkochan
Copy link
Author

zkochan commented Sep 17, 2022

It works inconsistently now. If I dynamically create a new package with an index.js file then it works. We have spent days to find this issue due to the inconsistency.

This definitely looks like worth fixing to me. It should be cheap to fix as you only need to reset cache and retry on a MODULE_NOT_FOUND error. For our use case what David suggests is enough but the inconsistency would remain. So either make it not work in all cases (which is a breaking change) or make it work in all cases.

@aduh95
Copy link
Contributor

aduh95 commented Sep 17, 2022

So either make it not work in all cases (which is a breaking change)

That would certainly be a reasonable option (and the one I would prefer personally), but given it's been this way for 13 years old, it's just not worth to break at this point IMHO. FWIW this behavior is already deprecated for ESM (see DEP0151), and CJS just has to live with it at this point.

or make it work in all cases

Or we could state that dynamically modifying files in node_modules at runtime is unsupported and doing so result in undefined behavior.

@zkochan
Copy link
Author

zkochan commented Sep 17, 2022

I am not sure why you are calling it "dynamically modified files". No files are modified. The files don't exist at the time of the first require. Hence, there is no reason to cache anything. Then the package is added to node_modules and it is there when require is executed again. So, it is not like there was a package.json with no main field on the first require and someone modified the package.json and added the main field.

@giltayar
Copy link
Contributor

@aduh95 it's not thirteen years old. IIUC, the specified cache is packageJsonCache (in the CJS loader) and it was added as an optimization when Node.js started reading the package.json-s to determine whether to load a JS file as ESM ("type": "module") or CJS ("type": "commonjs").

So my guess is 2-3 years old. And my guess is that the code does work in Node v10, which is the earliest Node.js that does not have full ESM functionality backported to it.

@giltayar
Copy link
Contributor

Unfortunately, removing this optimization would probably have significant performance disadvantages.

@giltayar
Copy link
Contributor

@zkochan do you care about the same scenario, but in ESM? There the cache is entirely hidden and by design, so only a loader would help here.

@benjamingr
Copy link
Member

When I opened a PR to not keep the stat cache on failure it was rejected in the past #31921

@benjamingr
Copy link
Member

see #31803 for prior discussion

@zkochan
Copy link
Author

zkochan commented Sep 17, 2022

So my guess is 2-3 years old. And my guess is that the code does work in Node v10, which is the earliest Node.js that does not have full ESM functionality backported to it.

This stopped working in Node.js v7.1.0. The last working version is Node.js v6.17.1

@zkochan do you care about the same scenario, but in ESM? There the cache is entirely hidden and by design, so only a loader would help here.

Yes. In bit we have extensions that are installed dynamically. We are in big trouble if this won't work. If this won't be fixed then we either need a way to clear the cache or to hook into require and add our own fix. For now, I was able to add a workaround by hooking into the require function using the fix suggested by @DavidWeiss2.

You do support dynamic imports. Even with esm you support await import(), so it feels counter intuitive that this wouldn't work.

@ljharb
Copy link
Member

ljharb commented Sep 17, 2022

Dynamic requires/imports are for a non-static specifier imo, not for a non-static module map.

@mcollina
Copy link
Member

This stopped working in Node.js v7.1.0. The last working version is Node.js v6.17.1

This is unfortunately ancient and hardly a regression at this point.

I guess the use case you want to support was never "supported" by Node.js - there were no tests to cover it. It just "worked" because Node.js is exceptionally malleable and folks could do all sorts of things with it. Unfortunately, ESM removed some of that flexibility.

What you want to achieve could be easily implemented for ESM dependencies using loader hooks.
What you miss is an implementation for this for CJS, would you like to contribute it? cc @GeoffreyBooth

@matthewp
Copy link

@mcollina I'd love to submit a fix for this, with guidance. Is a new API needed to clear this cache or can we hook on require.cache some how?

@mcollina
Copy link
Member

mcollina commented Nov 1, 2022

@matthewp I'm not that familiar with the commonjs loader to guide you, but I'll be very happy to review any intermediate step before marking the PR ready. I'm also happy to jump on a call.

@matthewp
Copy link

matthewp commented Nov 1, 2022

There's several caches in the CJS loader, how do people feel about a require.clearCache() (bikeshed) or similar method to clear them all?

@jkrems
Copy link
Contributor

jkrems commented Nov 1, 2022

There's several caches in the CJS loader, how do people feel about a require.clearCache() (bikeshed) or similar method to clear them all?

My only opinion on that front would be "rather import {clearRequireCache} from 'nodejs:module';", hanging non-local things on require is always a bit confusing. EDIT: Overall, a function to reset the cache SGTM. But I'll say that it may become confusing with ESM if people expect it to also work there.

@benjamingr
Copy link
Member

@matthewp Feel free to take my code from #31921 and rebase it/copy it over the changed loader code. Happy to answer any questions about it.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Nov 1, 2022

I have hesitations about this. We can’t and possibly won’t ever be able to clear the ESM cache, because the modules are loaded into V8 and V8 doesn’t provide an ability to unload them as far as I know. The CommonJS and ESM caches overlap somewhat, if I remember correctly; I think this “package.json cache” is something that’s shared between the two loaders. It feels like it would be problematic to be able to clear the package.json metadata from our cache for a module that’s in V8 and can’t be removed. At the very least it would introduce extra complexity to both loaders, potentially limiting the performance optimizations that could be achieved, and the use case doesn’t feel very compelling to me to accept those future limitations.

Another way to solve the original problem is to create a tool that analyzes the Node application to be run, tracing all the require and import statements the way bundlers do, assembling a list of all such calls; and try to dynamically require or import them, assembling a list of any calls that fail. Then run npm install or yarn install or whatever to install all the missing dependencies. There probably already exists a tool to find all the dependencies of a Node app, since I know there are tools that analyze apps to tell you of any dependencies that are missing from your package.json; such a library could be extended upon to achieve what you’re looking for here. Anyway once this command-line tool is built, simply run it before running the Node app, and you will have achieved the same result.

@GeoffreyBooth GeoffreyBooth added module Issues and PRs related to the module subsystem. esm Issues and PRs related to the ECMAScript Modules implementation. labels Nov 1, 2022
@mcollina
Copy link
Member

mcollina commented Nov 2, 2022

This feature is going to be commonjs only. I don't think we will ever be able to support it in ESM for the limitations you have described.

Specifically, the cache to clear is - which is very limited to commonjs:

let statCache = null;

Modules that need this fix are downloaded millions of times per week, e.g. import-fresh.

@matthewp
Copy link

matthewp commented Nov 2, 2022

Yes, this is specifically about the CJS cache. Even more specifically the packageJsonCache variable added here 4396beb

Actually I think this bug can be fixed without a new API to clear the cache. @giltayar said:

Unfortunately, removing this optimization would probably have significant performance disadvantages.

We don't need to remove this optimization, we only need to remove it from caching false-reads. Specifically we just need to remove this line:

packageJsonCache.set(jsonPath, false);

In this case when a package.json is found it continues to be used from the cache. Only if one is not found would it do another fs read. This seems fine from a performance perspective.

The commit where this was added 4396beb was not done for performance reasons, so I'm thinking this is an incidental change.

@aduh95
Copy link
Contributor

aduh95 commented Nov 2, 2022

Yes, this is specifically about the CJS cache. Even more specifically the packageJsonCache

The ESM loader also uses the same packageJsonCache, so clearing it may or may not end up violating the spec in some cases, it's worth exploring but we should remain cautious of unforeseen consequences.

In this case when a package.json is found it continues to be used from the cache. Only if one is not found would it do another fs read. This seems fine from a performance perspective.

I don't share your optimism, it would force a FS lookup for every .js file loaded in a package that doesn't have a package.json, I wouldn't be surprised if it had a very noticeable delay for such packages.

@GeoffreyBooth
Copy link
Member

Stepping back for a minute from implementation details, I’d like some consideration given to why this needs to be part of core. In #44663 (comment) I theorized a potential way to implement this “install missing dependencies automatically” feature request via an external script that could be run before the app. Is there some reason why this functionality can’t be achieved without changes to core? Yes the external script would require restarting the Node process instead of just clearing the cache, and presumably clearing the cache would be slightly faster, but anyone who’s trying to install packages on demand presumably isn’t also trying to maximize performance. The external script approach would also have the benefit that it would work for both CommonJS and ESM.

@matthewp
Copy link

matthewp commented Nov 2, 2022

@GeoffreyBooth Aside from that being a very heavy workaround, it doesn't fix the use-case which is dynamic requires. Bundlers do not support truly dynamic requires either.


@aduh95

The ESM loader also uses the same packageJsonCache, so clearing it may or may not end up violating the spec in some cases, it's worth exploring but we should remain cautious of unforeseen consequences.

That cache is not exported from this module. Do you mean that it's used indirectly because it somehow uses the CJS loader internally? I'll take your word for it on that then, we can do an explicit clearRequireCache.

I don't share your optimism, it would force a FS lookup for every .js file loaded in a package that doesn't have a package.json, I wouldn't be surprised if it had a very noticeable delay for such packages.

Given that it was added in a PR not related to performance I think this is an edge-case that people have lived with for a long time.

@GeoffreyBooth
Copy link
Member

it doesn’t fix the use-case which is dynamic requires.

I view that as an implementation detail, to achieve the broader goal of automatic installation of missing dependencies. My point is that you can achieve this end goal in other ways.

@aduh95
Copy link
Contributor

aduh95 commented Nov 2, 2022

That cache is not exported from this module.

It is exposed through getPackageScopeConfig, which relies on packageJsonCache.

Given that it was added in a PR not related to performance I think this is an edge-case that people have lived with for a long time.

Before we introduced "type": "module", package.json was parsed only to check for a "main" file, which is used only when trying to require a repository. So there was no reason to lookup the package.json for each .js file, which we must now do.

@matthewp
Copy link

matthewp commented Nov 3, 2022

@GeoffreyBooth There's always a workaround, no doubt. My use-case is not automatic installation of missing dependencies, however. My use-case is allowing a user to install a dependency after they have started a Node.js server without having to kill the process and start it again.

@hinell
Copy link

hinell commented Jul 2, 2023

The issue can be boiled down into the node.js failing to update its cache of node_nodules when they are added during runtime and immediately requested. Can't image use cases for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem.
Projects
None yet
Development

No branches or pull requests