-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
esm: loader chaining #33812
esm: loader chaining #33812
Conversation
0e5b5ba
to
bff7c2d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm somewhat opposed to this kind of chaining API. It encourages writing loaders that all reinvent their own filtering. Composition becomes a lot harder in that kind of ecosystem. Is there any kind of write-up for how this chaining would work in practice?
Should we mark this as WIP? There's a lot here that needs discussion. I also think that #31229 should probably land before this. I'm a fan of loader chaining; the current hook signatures were designed with chaining in mind. That's why I'm a bit confused that the signatures here are changed seemingly arbitrarily. The current signatures were settled upon after a lot of debate, so any changes to them would be best in their own PR. Can this PR please only have the changes that are necessary for chaining, and any further API redesign can be its own PR? In particular, many loaders need access to the |
I'm not a huge fan of chaining either but the modules team adding state to module resolution has really backed us into a corner. Loaders now have to make decisions about whether they are the right loader to handle a specific import based on "conditions" and self-resolution and whatnot. So when node wants to select a loader to handle an import, it has to explicitly ask the loader if it can handle it (call each hook until one gives an answer). I would prefer a more declarative api (maybe an extension registry that loaders can use) but that would require at least getting rid of type: module and I doubt that will ever happen. This PR isn't WIP, i intend to land it as it is (modulo general review). If someone is willing to land loader threads right now I will certainly wait, but otherwise I'd like to do it in this order. If there are certain behaviours of loaders that you wish for node to preserve I would suggest making in-tree tests for them so we know when we break them.
Indeed, that's something I am planning to deal with after this. I have enough concerns about that api that I don't want to wedge it into this patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please separate out the hook API changes into a separate PR. That warrants its own discussion and is a separate matter than just introducing chaining.
@GeoffreyBooth for future reference, please provide concrete concerns when leaving a -1 so people can actually address them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is something that's meant to be landable short-term, I'll add an explicit change request. Making loader hooks effectively untestable and un-reusable as code (by relying on import.meta
injection) isn't an API trade-off I'm comfortable with.
@jkrems can you explain what "untestable" because of import.meta means? We have lots of tests that involve import.meta. |
I'm not sure I get where state makes writing loader hooks harder and how it relates to state. E.g. conditions isn't actual state. Self-resolution isn't using state. Maybe I'm just missing what you mean by "state" here? If you could identify where there are issues with the module system that makes writing good loader hooks hard, we could attempt to address them.
I'm not sure why it would require getting rid of |
@devsnek If I'm writing a loader hook, I can't import the hook and call the functions. If I want to reuse a hook function, I can't import it from another file and call it. The hooks module becomes a super special file, basically its own file format, that can only be loaded with a lot of setup. import {resolve} from './loader.mjs';
import assert from 'assert';
// Oops, can't write a unit test.
// I would have to write an entire module loader and/or spawn a new node process.
assert.equal(null, await resolve('./foo.mjs', 'file:///x/y.z')); |
@jkrems they kind of are special, since they will be hoisted into threads. I think directly importing one would be a bad idea. As for the import.meta testing situation, it sounds like you'd have to avoid import.meta ever having any context specific information for the rest of time? That doesn't really seem like a good direction to me. I'd be happy to help you find ways to improve your testing setup to not have that limitation, i'm always around on irc if you want to discuss it. |
There may be a misunderstanding. This isn't really about any particular test I wrote. It's more about the general idea of separating logic from wiring (e.g. https://en.wikipedia.org/wiki/Inversion_of_control). If you want more examples for why this design can be problematic, consider this: export {resolve} from './resolve.mjs';
export {transformSource} from './transformSource.mjs'; Relatively "normal" refactorings that worked fine with the previous APIs are now completely impossible. Let me know if you want more examples! :) |
I'm sorry, I thought I had done so. Here they are more concisely:
You're right that we should include tests for things like https://github.com/GeoffreyBooth/node-loaders/blob/master/coffeescript-loader/loader.js#L33-L43; the example loaders in our current tests are a little too trivial. That's not a blocker on this PR, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me. I think it's very close to what we'd all like to see here for loader chaining and it would be amazing to get this merged soon - the sooner we can end churn on the loader APIs the better, and chaining is a huge part of that.
Just the one comment wrt import.meta.parentLoader
versus other ways of carrying this information through.
87663bf
to
26dd7d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Really like how all the fixtures are shrinking because of the new delegation sugar.
I'd still like to revisit the term "parent" for the next loader in the chain but that doesn't have to happen in this PR.
@GeoffreyBooth are you still against this landing before threads? This is pretty much ready to land. |
My two cents as one of the two people who have been contributing to the threads PR: I don't think there's anybody who has committed to getting it ready to land soon™ and it has been active only sporadicly. I don't think it's reasonable to block other PRs on it landing, especially since @devsnek may be the most likely person to actually put in the work to get threading in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devsnek Thanks for putting in the effort to make this work. I'm fine with letting this land before #31229, my concern there was that that seemed ready and I didn't want to push a lot of work on @bmeck to refactor that to account for these changes. But since that PR might need a fair bit more work, and chaining is more important to us anyway, I'm fine with landing this first if @bmeck is.
My other notes about the API changes still stand. I've only re-reviewed the docs so far, but there seem to be a some changes that don't feel like improvements. Please see my notes and once we come to agreement on the docs (and therefore the intended API) I'll review the code, which I'm sure will be fine.
Please explain how the chaining pattern in this PR differs from the previous one, and why it's better. I've read the paragraph in the docs, and I don't see an answer to this question. More broadly, what design patterns of how to chain loaders together were considered, and why is this particular pattern the best choice? Please also explain why different hooks might behave differently than others. What would be lost functionality-wise if we unified the pattern (i.e. if the UX got simpler, I assume there would be some tradeoff in functionality)? |
@GeoffreyBooth do you mean, why the transformSource hook was changed? |
What do you mean by the previous one here? Do you mean in comparison to the current implementation or @bmeck's approach? I'm not clear on the details of @bmeck's previous PR personally.
When we originally discussed the
Are you referring to the transform loader as in (3) above here?
Can you explain in more detail what you mean by unified the pattern here? |
@devsnek it sounds like that is the main concern here to me too. Perhaps you could explain this in more detail again? |
As compared to the current implementation. And yes, why don't all hooks follow the same rules regarding chaining. What is gained by one or more hooks having unique patterns, when what's lost is a simpler UX. |
I changed it because I was planning to change it in a future PR (as is mentioned in the OP) and someone didn't want to merge this with it using the nextHook model. transformSource doesn't use the nextHook model because it's inherently the wrong model for it. nextHook describes a sort of "searching for an answer" model where the first hook to respond is what is used as the result. transformSource isn't like that, every registered transformSource hook should be visited, because the entire point of transformSource is to be a transformation, and you can't do that with a "first response" model. |
This is the part I'm unsure about. Why not make all the hooks like Like for example imagine two theoretical |
@GeoffreyBooth that would be a new type of hook model since all hooks would then need an argument corresponding to the type of their output. Eg |
Yes, but that's what chaining is. It seems like this PR implements chaining for The main use cases we've been discussing as going unfulfilled by the current loaders implementation are stubs/mocks and instrumentation. It seems to me that a branching model wouldn't satisfy those needs. For example, an instrumentation loader that wants to log all resolved URLs would have no way of doing so; it could see what URLs Node would resolve, by calling I'm sorry that @devsnek went to a lot of effort implementing this PR without discussing the goals or API design first, it feels very late to discuss this now, but unfortunately no issue was opened ahead of time with a proposal for what this PR was intended to be. |
If I understand this correctly, and I had the following 5 loaders chained… node --expirimental-loader=./fetch-loader.mjs \
--experimental-loader=./repl-loader.mjs \
--experimental-loader=./typescript-loader.mjs \
--experimental-loader=./import-map-resolve.mjs \
--experimental-loader=./remote-resolve.mjs ./app.mjs FIRST the THEN the THEN the THEN the THEN the
|
I prefer to use the terminology here that there are multiple hook models, but all hook models support chaining in some form. In terms of defining terminology for hook models, perhaps we can call this an "explicit" call hook model (loader branching as you seem to call it) whereas the transformSource / Webpack / RollupJS hooks follow a sort of "implicit" call hook model.
Branching or explicit call models provide a little more delegation flexibility to implicit hook models because eg a loader at the end of the chain can (a) know the original specifier input (b) call the full chain to see what would be returned by the other loaders and (c) use custom inputs when doing those calls so it is able to eg change the parent URL or conditions list for the subcall and use that information as appropriate. So the explicit resolver model has a lot more power and flexibility. It sounds like you're proposing a hook model something like: export function resolve (specifier, context) {
} where This removes a lot of information - subsequent loaders don't know the original specifier. Also if the Node.js resolver always runs last this completely removes the ability to support custom schemes as the Node.js resolver throws for custom schemes.
Not at all - an instrumentation loader that is first in the chain can simply do: export function resolve (specifier, context, nextResolve) {
const resolved = await nextResolve(specifier, context);
doLogOperation(resolved);
return resolved;
} It would be a requirement of such an instrumentation loader that it comes first in the chain though yes.
Again this is simply not the case, because you can all the next chain with the new specifier, exactly because of the extra control. Say the loader that will load Preact's development build for any specifier referencing export function resolve (specifier, context, nextResolve) {
if (specifier === 'react') specifier = 'preact';
return nextResolve(specifier, context, nextResolve);
} The delegation model supports chaining completely the same.
Agreed the design process here happening in the PR is very broken and that a shared collaboration would have been a better approach, but we do the best with what we have given each of our own limited resources to contribute. |
I would like to note I am not convinced that we should commit to either delegation pattern as a final design even if we ship one currently.
I am not convinced that power and flexibility are the main goals here. The most power and flexibility would be to try and expose the entire loader chain to application code, but it comes at costs. Having chattier or re-entrant hooks does mean more computation and GC for those more flexible patterns. It also means that any given loader cannot be assured that it was given data in a controlled manner as it may come for a different loader that a user intended to have less authority than the one above the current loader. As I've stated above (a while ago alas), doing this grouping would remove any authority model between hooks and leave hooks as a cluster of equal authority. It goes even further though, that having any loader able to call any other loader instead forms a graph with an entry point instead of a chain effectively and the re-entrancy allows things like a loader to do timing checks on loaders closer to the entry point.
Yes, I actually see this as a boon.
This is not true with the delegation model that we currently with Overall I think both are valid and both have vastly different trade-offs. I do not see one as having vastly superior feature sets though as the trade offs of those features are visible as long as we think of individual loaders as having some kind of authority. If we didn't see loaders as having some kind of authority / ability to preempt others that would likely lean towards the grouping styles of being able to call anything from anywhere, but doing so gets complicated as loaders need to know the next loader to call in a more intimate and fragile manner from what I've experiences in places like webpack. |
@devsnek I'm sorry I haven't been able to devote the time to this that it deserves, but I hope to soon; but I need to follow up with #33416 first. In the meantime, it doesn't seem like all the concerns that have already been raised on this thread have been addressed, at least in comments; in particular @DerekNonGeneric's feedback in #33812 (comment). Since this thread is becoming unmanageable I almost wonder if we should open a new thread in the modules repo where you can propose what the API should be and we can hash out any concerns there? Separate from the implementation. |
I think we should be careful about not moving goalposts too much here. There's nothing wrong with raising concerns and wanting to have a discussion. But after more than a week where there's a lack of clarity about which specific concerns need to still be addressed, we need to be able to move forward eventually. Especially when you say yourself that this wouldn't even be the next PR you'd look into. Are there specific issues that would be impossible to address in future PRs that should be worked out before landing this? And if so, do you have at least a rough timeline for when you'd be able to comment on them? @devsnek has been doing a lot of work explaining his thoughts behind the APIs and decisions during the lifetime of this PR. I don't think it's quite fair to say "can @devsnek please put in even more work to repeat everything he said above" - especially combined with a "and then I may potentially one day lift my request for changes but I can't say when" ;). |
I'm pretty sure Guy has addressed my concerns in his comments. I think @DerekNonGeneric's last comment in #33812 (comment) has some concerns that should be addressed, either via code changes or in an explanation for why the PR is fine as is. That can happen now, and once that's done if I can't find time for my own review I'll release my block. But I'm still hoping to give this a proper review soon. |
@DerekNonGeneric some prose was added to the docs about hook order and such, can you let me know if it's sufficient? |
// Defer to Node.js for all other specifiers. | ||
return defaultResolve(specifier, context, defaultResolve); | ||
// Defer to the next loader for all other specifiers. | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this function doesn't return anything? The same as return null
?
if (Math.random() > 0.5) { // Some condition. | ||
// For some or all URLs, do some custom logic for modifying the source. | ||
// Always return an object of the form {source: <string|buffer>}. | ||
return { | ||
source: '...', | ||
}; | ||
} | ||
// Defer to Node.js for all other sources. | ||
return defaultTransformSource(source, context, defaultTransformSource); | ||
return { source }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not return null
?
I would think that since loaders need not define all available hooks, a hook that doesn't return anything would produce the same result as a hook that returned null or that wasn't defined by the loader.
lib/internal/modules/esm/loader.js
Outdated
// !!! CRITICAL SECTION !!! | ||
// NO AWAIT OPS BETWEEN HERE AND SETTING JOB IN MODULE MAP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain why this rule is here? Ideally as a comment in the code explaining what bad things will happen if one were to disregard the warning and add an await
in the critical section.
lib/internal/process/esm_loader.js
Outdated
|
||
const importedLoaders = []; | ||
for (let i = 0; i < userLoaders.length; i += 1) { | ||
const ns = await loader.importLoader(userLoaders[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ns
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind renaming it to namespace
then, or adding a comment? I didn't guess what ns
meant.
This patch adds the ability to chain loaders together. Hooks still need auditing on the best way to behave in the context of chaining, and that will be addressed in future patches.
@devsnek, any way we can get a rebase? The docs have been updated now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Friendly ping.
To summarize my concerns with this PR so that it can move forward:
If the PR can already do all of the above, that would be great! Then I think we just need some tests to cover these cases. The only new test I see added in this PR is |
I'm pretty sure it does all that, but my motivation here reached zero a while ago. |
This is stage 1 of what I'm calling "Project: Make Loaders Usable". In future stages I will be working on such fun things as:
context
parameter of loader hookscc @nodejs/modules-active-members
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes