module: detect ESM syntax by trying to recompile as SourceTextModule #52413

Merged
merged 1 commit into from
Apr 19, 2024

joyeecheung
Member

Instead of using an async function wrapper, just try compiling code with unknown module format as SourceTextModule when it cannot be compiled as CJS and the error message indicates that it's worth a retry. If it can be parsed as SourceTextModule then it's considered ESM.

Also, move shouldRetryAsESM() to C++ completely so that we can reuse it in the CJS module loader for require(esm).

Drive-by: move methods that don't belong to ContextifyContext out as static methods and move GetHostDefinedOptions to ModuleWrap.

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/loaders
  • @nodejs/vm

@nodejs-github-bot added the c++, lib / src, and needs-ci labels Apr 7, 2024
@guybedford
Contributor

This seems like it works. Can we think of any cases where a module that does not use top-level await, import/export, or import.meta syntax, and does not lexically define any of the CJS variables, would parse as ESM but not as a script function body? I can't think of anything, but it would be great if we could fully confirm the edge cases.

@joyeecheung
Member Author

joyeecheung commented Apr 7, 2024

I think no matter what edge cases there are, it's pretty unlikely that reparsing the code with a weird async function wrapper (what the main branch currently does) is any more correct than reparsing it as a SourceTextModule. I'd prefer to just go ahead with this to unblock #52047.

@joyeecheung added the request-ci label Apr 7, 2024
@github-actions bot removed the request-ci label Apr 7, 2024
@nodejs-github-bot
Collaborator

@GeoffreyBooth
Member

There are two reasons for the current behavior:

  1. If we reparse as ESM and that fails too, presumably the initial parse failed for a reason other than ESM syntax and so we want to throw that error, not whatever error is thrown by attempting to parse as ESM.
  2. There's no need to reparse as ESM for any CommonJS parse error, so for efficiency we want to do the reparse only when it might make a difference.

I'm on my phone so I can't tell if these have been addressed, but if so, then sure let's change the algorithm (and update the docs accordingly). I think the first point has an existing test covering it, but if not then we should add one.

@joyeecheung
Member Author

joyeecheung commented Apr 8, 2024

If we reparse as ESM and that fails too, presumably the initial parse failed for a reason other than ESM syntax and so we want to throw that error, not whatever error is thrown by attempting to parse as ESM.

That's not changed. The error rethrown from the catch block in executeUserEntryPoint() is still the error from the CJS loader. Reparse errors are still swallowed. The only difference is that previously the code was reparsed as a wrapped async function, and now it's reparsed as ESM. That's covered by test/fixtures/es-modules/package-without-type/commonjs-wrapper-variables.js, which still passes, as you can see in the CI.

There's no need to reparse as ESM for any CommonJS parse error, so for efficiency we want to do the reparse only when it might make a difference.

That's not changed either. It is still only reparsed when the error is in the throws_only_in_cjs_error_messages set.

@joyeecheung
Member Author

joyeecheung commented Apr 8, 2024

then sure let's change the algorithm (and update the docs accordingly).

I don't think the algorithm as documented changes in any way, because the docs don't mention specifics about how the code is reparsed (it would be somewhat funny to mention the async wrapper, too). The doc only lists the errors used to determine whether the code should be retried as ESM, and those aren't changed by this PR. We are still using the same set to identify syntax errors that definitely indicate ESM syntax and don't need a reparse. We also still use the same set to identify syntax errors that warrant a reparse to double-check.
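Here is a sketch of the two-set classification described above. The second set's name echoes the throws_only_in_cjs_error_messages set mentioned earlier in the thread. The message strings themselves are an assumption modeled on V8's wording. The authoritative lists live in Node's C++ sources and may change between Node/V8 versions. classifyCjsSyntaxError is a hypothetical helper.

```javascript
// Illustrative message sets (assumption): modeled on V8 error messages.

// Errors that on their own indicate ESM syntax, so no reparse is needed.
const ESM_SYNTAX_ERROR_MESSAGES = new Set([
  'Cannot use import statement outside a module',
  "Unexpected token 'export'",
  "Cannot use 'import.meta' outside a module",
]);

// Errors that only occur because of the CommonJS wrapper, so a reparse as
// ESM is needed to confirm the code is actually a module.
const THROWS_ONLY_IN_CJS_ERROR_MESSAGES = new Set([
  "Identifier 'module' has already been declared",
  "Identifier 'exports' has already been declared",
  "Identifier 'require' has already been declared",
  "Identifier '__filename' has already been declared",
  "Identifier '__dirname' has already been declared",
  'await is only valid in async functions and the top level bodies of modules',
]);

// Hypothetical helper: classify the message of a CommonJS compile error.
function classifyCjsSyntaxError(message) {
  if (ESM_SYNTAX_ERROR_MESSAGES.has(message)) return 'esm'; // definitely ESM
  if (THROWS_ONLY_IN_CJS_ERROR_MESSAGES.has(message)) return 'retry'; // reparse to check
  return 'not-esm'; // an ordinary syntax error; keep the CJS error
}
```

The exact-match Set lookups keep the check cheap. Only errors in the "retry" bucket pay for a second parse.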

lib/internal/modules/run_main.js (outdated, resolved)
lib/internal/modules/run_main.js (outdated, resolved)
@joyeecheung force-pushed the refactor-esm-detection branch from 179dabd to dc9f175 April 11, 2024 12:35
@joyeecheung added the request-ci label Apr 11, 2024
@github-actions bot removed the request-ci label Apr 11, 2024
@nodejs-github-bot
Collaborator

@nodejs-github-bot
Collaborator

Comment on lines -181 to -182
const { enrichCJSError } = require('internal/modules/esm/translators');
enrichCJSError(error, source, resolvedMain);
Member

How is removing these lines not causing any tests to break? There are tests that check the output of this “enriched” error that I’d think should be failing without these lines.

Member Author

Those only check part of the sentence because the rest keeps changing (typically it's not a good idea to check all of the message, either). The error is now "enriched" in C++.

Member Author

@joyeecheung Apr 12, 2024

Actually, I was talking about my follow-up. In this branch, this part is already unreachable, even before this patch. If !retryAsESM, the code doesn't contain ESM syntax, so enrichCJSError() won't "enrich" the error anyway (i.e. it already skips it internally).

lib/internal/modules/run_main.js (resolved)
try_catch.ReThrow();
return;
}
}
Member

Does this consolidation mean that we no longer need the try/catch at all? In other words, we could revert #52093 to go back to the previous approach of calling containsModuleSyntax, and then running the CJS entry with the CJS loader, and the performance cost will be negligible because the parse gets cached in V8?

Member Author

@joyeecheung Apr 12, 2024

Yes, that's what my follow-up does: https://github.com/joyeecheung/node/commits/esm-detect-cpp/, which also implements detection support for require(esm). If it's the entry point, the detection flag is enabled, and it looks like ESM, use the path that allows TLA and ignore the exports. Uh, no one needs the module.exports of the entry point... at least not for something that's already ESM, because that would not have been possible anyway? If it's not the entry point, and the require flag is enabled, or the detection flag is enabled and it looks like ESM, do a synchronous require(esm) and add module.exports. That said, I don't think it's a good idea to merge that into this PR: this one is a refactoring, and that one implements new behavior.

Member

if it’s the entry point, and the detection flag is enabled, and it looks like ESM, use the path that allows TLA, and ignore the exports because uh no one needs the module.exports of the entry point

What if the entry point gets required or imported a second time? For example, running node entry.js where entry.js imports foo.js, which in turn imports entry.js? The second import would need the exports from entry.js.

Can we land your follow-up and unflag detect-module while require(esm) is still flagged? Or is require(esm) possibly close enough to being ready to unflag that we could just do them together?

Member Author

@joyeecheung Apr 12, 2024

The second import would need the exports from entry.js.

If entry.js is ESM, then what's returned is the namespace, not module.exports. We'll return the namespace of the cached module in the ESM loader, because the CJS loader does not handle import entry.js when the module job is already cached.

If entry.js is required again, then we can error or warn that it's the entry point, or return the namespace if it's fully evaluated. But now we're talking about require(esm) edge cases in a follow-up, which is pretty off topic for this PR.

Member Author

@joyeecheung Apr 12, 2024

Can we land your follow-up and unflag detect-module while require(esm) is still flagged?

I don't know for sure. Locally, my follow-up does pass test/parallel/test-debugger-exceptions.js even when detect-module is unflagged, if you think that's the last blocker for unflagging detection. require(esm) will need to stay flagged for at least 22. The blockers are __esModule interop and customization hooks support, and there may be more as we get feedback from 22. In general I'm not too comfortable with just unflagging an experimental feature that really needs some battle testing from users at least.

Member

In general I'm not too comfortable with just unflagging an experimental feature that really needs some battle testing from users at least.

Especially for a feature like this one. As soon as it can be used without a flag, people will start to publish packages that depend on it. Then we will be blamed if we break them.

src/node_contextify.cc (outdated, resolved)
@joyeecheung force-pushed the refactor-esm-detection branch from 1b571e1 to 6eec04e April 12, 2024 15:57
@joyeecheung added the request-ci label Apr 12, 2024
@github-actions bot removed the request-ci label Apr 12, 2024
@nodejs-github-bot
Collaborator

@nodejs-github-bot
Collaborator

// In case the entry point is a large file, such as a bundle,
// ensure no further references can prevent it being garbage-collected.
cjsLoader.entryPointSource = undefined;
if (error != null && ObjectGetPrototypeOf(error) === SyntaxErrorPrototype) {
Member

Potential follow-up: Do we control the error being thrown? If so, I'm wondering if we should create a sub-class like MismatchedSyntaxError 🤔
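Introducing a subclass like the suggested MismatchedSyntaxError would affect the check in the snippet above. The exact-prototype comparison would stop matching instances of the subclass. The sketch below shows the difference using the userland equivalents of the primordials (Object.getPrototypeOf and SyntaxError.prototype). MismatchedSyntaxError here is a hypothetical class, not a Node API.

```javascript
// Hypothetical subclass floated in the review; it does not exist in Node.
class MismatchedSyntaxError extends SyntaxError {
  constructor(message) {
    super(message);
    this.name = 'MismatchedSyntaxError';
  }
}

// Same style as the run_main.js check: matches only direct SyntaxError instances.
const isExactSyntaxError = (error) =>
  error != null && Object.getPrototypeOf(error) === SyntaxError.prototype;

// Subclass-aware alternative.
const isAnySyntaxError = (error) => error instanceof SyntaxError;

const plain = new SyntaxError('Unexpected token');
const mismatched = new MismatchedSyntaxError('Unexpected token');

console.log(isExactSyntaxError(plain), isExactSyntaxError(mismatched)); // true false
console.log(isAnySyntaxError(plain), isAnySyntaxError(mismatched)); // true true
```

So a subclass like this would require changing any exact-prototype checks to a subclass-aware form, or to a check for the subclass itself.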

Member Author

@joyeecheung Apr 14, 2024

The follow-up is to just not do the check in JS, and instead check right after parsing in C++ :)

lib/internal/modules/run_main.js (resolved)
@joyeecheung added the request-ci label Apr 14, 2024
@nodejs-github-bot
Collaborator

@nodejs-github-bot
Collaborator

@nodejs-github-bot
Collaborator

@joyeecheung added the request-ci label Apr 17, 2024
@github-actions bot removed the request-ci label Apr 17, 2024
@nodejs-github-bot
Collaborator

@joyeecheung force-pushed the refactor-esm-detection branch from 425d8ee to 14afc5a April 18, 2024 18:36
@nodejs-github-bot
Collaborator

@nodejs-github-bot
Collaborator

@GeoffreyBooth added the author ready label Apr 19, 2024
@joyeecheung added the commit-queue label Apr 19, 2024
@nodejs-github-bot removed the commit-queue label Apr 19, 2024
@nodejs-github-bot merged commit 651fa04 into nodejs:main Apr 19, 2024
41 checks passed
@nodejs-github-bot
Collaborator

Landed in 651fa04

aduh95 pushed a commit that referenced this pull request Apr 29, 2024

PR-URL: #52413
Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
Reviewed-By: Jacob Smith <jacob@frende.me>
@marco-ippolito added the backport-blocked-v20.x label Jul 19, 2024
@targos added the dont-land-on-v20.x label and removed the backport-blocked-v20.x label Sep 21, 2024
jkleinsc pushed a commit to electron/electron that referenced this pull request Nov 4, 2024
* module: detect ESM syntax by trying to recompile as SourceTextModule

nodejs/node#52413

Labels: author ready, c++, dont-land-on-v20.x, lib / src, needs-ci

9 participants