
Any way to access loaders with "inline code" (vm.Script, eval, etc.) #116

Open
tolmasky opened this issue Oct 28, 2022 · 10 comments

Comments

@tolmasky

If you have the serialized "content" of a specific loader format (say, for example, TypeScript), it would be nice to be able to use the existing loader functionality to evaluate it "inline". One might, for example, expect that vm.Script, given that it allows you to specify a filename option, would automatically choose the right loader:

// Imagine this was run with --experimental-loader typescript-loader.mjs
new vm.Script("const x: Number = 5;", { filename: "test.ts" });

As far as I'm aware though, there is no way to do this currently (please correct me if I am mistaken!). I tried seeing if I could call load() directly to go through the loading machinery and just passing in the source property, but I also couldn't find an API to do that:

doLoad({ source: "some typescript...", url: "blah.ts" });
@aduh95
Contributor

aduh95 commented Oct 28, 2022

I think the loader itself needs to export some function, as the loaders API only deals with transforming URLs to a source string – or you could do import('data:text/typescript,some%20typescript...') if the loader supports data: URLs.
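For illustration, here is what that data: URL pattern looks like with text/javascript, which Node.js supports without any loader; with a TypeScript-capable loader registered that handles data: URLs, the same pattern would apply to text/typescript (a sketch, not a statement about any particular loader):

```javascript
// Evaluate inline source through the module system via a data: URL.
// text/javascript works out of the box in Node.js; a loader that handles
// data: URLs could do the same for text/typescript.
const source = 'export default 6 * 7;';
const url = `data:text/javascript,${encodeURIComponent(source)}`;

import(url).then((mod) => {
  console.log(mod.default); // 42
});
```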

@tolmasky
Author

tolmasky commented Oct 28, 2022

Right, but that forces one to replicate the entire machinery. The vm.Script example above shows why this would be frustrating: Node already knows the right transpilation for ".ts" files, yet I'd have to hope the TypeScript loader's author took the care to provide a separate function, and then I'd have to apply it manually.

The mime-type data URL you provided does a great job of highlighting the core problem and suggesting an excellent solution. The current system couples two largely unrelated tasks: fetching (as in "loading from storage") and evaluation (as in "loading into the runtime"). It is weird that it should be the TypeScript loader's job to itself handle data URLs. There should instead be a composable system where a single data URL "fetching" loader merely worries about taking a data URL and turning it into a { mimeType, source } structure that can then be handled by an unrelated TypeScript (or whatever else) source-string "evaluating" loader.

As a thought experiment, imagine that instead of the current resolve() and load(), we had fetch and evaluate. The job of the first is, given an import specifier, to grab the contents, determine the mime type, and return both as a { mimeType, source } pair. It can also, of course, simply modify the specifier and pass it off to nextFetch, the way resolve can pass on to nextResolve.

The evaluate loader then registers itself for a mime type (evaluate no longer needs to concern itself with determining whether it should handle the content in question). It is also "storage type independent": at this phase, it makes no difference whether the contents came from a data URL, base64, the filesystem, or a network request. All the TypeScript transpiler needs to do is transpile, then pass off to nextEvaluate with the mimeType changed to "text/javascript":

typescript-loader.js

import { compile } from "typescript-compiler";

// This module has no need for a `resolve` function at all. HTTP/HTTPS requests that return
// mime-type: text/typescript work out of the box, data URLs that specify text/typescript work
// out of the box, and files with the file extension .ts work out of the box. If anyone ever
// implements a different storage loader (like ftp:// or something), that will also just work.
export const MimeType = "text/typescript";
export const FileExtensions = ["ts"];

export default function evaluate({ mimeType, source }, nextEvaluate) {
    return nextEvaluate({ mimeType: "text/javascript", source: compile(source) });
}
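To make the hand-off concrete, here is a toy chain runner for the proposed hooks. Everything here is hypothetical (none of it is a real Node.js API), and the "compiler" is a stand-in that just strips type annotations, but it shows how evaluate hooks keyed on mimeType would compose:

```javascript
// Toy sketch of the proposed evaluate chain; none of this is a real Node.js API.
// Stand-in for a real TypeScript compile step: strip `: Type` annotations.
const fakeCompile = (source) => source.replace(/:\s*\w+/g, '');

// A transpiler hook: rewrite the mime type and source, then defer to the next hook.
const typescriptEvaluate = ({ mimeType, source }, nextEvaluate) =>
  mimeType === 'text/typescript'
    ? nextEvaluate({ mimeType: 'text/javascript', source: fakeCompile(source) })
    : nextEvaluate({ mimeType, source });

// Terminal hook: just return what it was given.
const terminalEvaluate = ({ mimeType, source }) => ({ mimeType, source });

// Wire the hooks together: each hook's `next` is the composition of the hooks after it.
const runChain = (hooks, input) =>
  hooks.reduceRight(
    (next, hook) => (payload) => hook(payload, next),
    (payload) => payload
  )(input);

const result = runChain(
  [typescriptEvaluate, terminalEvaluate],
  { mimeType: 'text/typescript', source: 'const x: number = 5;' }
);
console.log(result); // { mimeType: 'text/javascript', source: 'const x = 5;' }
```

The point of the sketch is that the TypeScript hook never inspects file extensions or URLs; it only matches on mimeType and passes the rest along.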

Similarly, the https loader example from the docs now makes more sense in my opinion:

https-loader.js

import { get } from 'node:https';

const isHTTPS = specifier => specifier && specifier.startsWith('https://');

export function fetch(specifier, context, nextFetch) {
  const { parentURL = null } = context;
  const HTTPSURL =
    isHTTPS(specifier) ? new URL(specifier) :
    isHTTPS(parentURL) ? new URL(specifier, parentURL) :
    false;

  if (!HTTPSURL)
    return nextFetch(specifier, context);

  // Normally Node.js would error on specifiers starting with 'https://', but here
  // we can load the data ourselves.
  return new Promise((resolve, reject) => {
    get(HTTPSURL, (res) => {
      let data = '';
      res.on('data', chunk => data += chunk);
      res.on('end', () => resolve({
        mimeType: res.headers['content-type'],
        shortCircuit: true,
        source: data,
      }));
    }).on('error', reject);
  });
}

As you can see, the HTTPS loader now only needs one function, and doesn't have to make assumptions about the format of the returned code. Similarly, a future CoffeeScript transpiler doesn't have to make an extraneous early recursive call to nextLoad that incorrectly hints that the code is a module, just so it can ignore the result and transpile afterward.

Just to demonstrate parity with the current API's abilities, a hypothetical "github:" loader still only needs to modify the specifier and pass it on to the next loader:

github-loader.js

export function fetch(specifier, context, nextFetch) {
  if (!specifier.startsWith("github:"))
    return nextFetch(specifier, context);

  return nextFetch(`https://github.com/${specifier.slice("github:".length)}`, context);
}
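The rewrite itself is plain string manipulation, so it can be sketched standalone (stripping the github: prefix before building the URL, which is presumably the intent; the function name here is made up for illustration):

```javascript
// Standalone sketch of the github: specifier rewrite; the name
// rewriteGithubSpecifier is hypothetical, chosen for this example only.
const rewriteGithubSpecifier = (specifier) =>
  specifier.startsWith('github:')
    ? `https://github.com/${specifier.slice('github:'.length)}`
    : specifier;

console.log(rewriteGithubSpecifier('github:nodejs/node/some/file.ts'));
// → https://github.com/nodejs/node/some/file.ts
console.log(rewriteGithubSpecifier('./local.js'));
// → ./local.js (untouched, passed through to the next loader)
```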

This now allows you to load from github:nodejs/node/some/file.ts, and it composes fine with the TypeScript loader, with neither needing to know anything about the other, and both only needing to implement one function.

Back to the original issue though: this now provides a good foundation for "loader" support (a bit of a loaded term, since loaders do everything from loading to transpiling) anywhere code might show up, whether in vm.Script() or even when parsing markdown and compiling code blocks whose code fence specifies a language.

@tolmasky
Author

A quick note: I would not necessarily call the two functions fetch and evaluate; I just chose those names to avoid confusion from reusing load, for example. The basic idea is that there is something that "gets the data", and then something that "does stuff to the data".

@GeoffreyBooth
Member

The basic Idea is that there is something that “gets the data”, and then something that “does stuff to the data”.

An earlier version of the Loaders API had four hooks: resolve, getFormat, load and transform. These got collapsed into just resolve and load, with the responsibility for determining the format left ambiguous (it could be decided by either hook, so long as load returns it) and transformations, such as transpiling TypeScript, happening within load.

We did this to make chaining more practical. If there are separate load and transform hooks, for example, a transpilation loader could do its work in either one, and then all the other loaders in the chain would need to know what to expect from each hook. Collapsing them into one hook makes chaining simpler and less error-prone, relying less on community-determined standards and expectations.

We’re approaching stable status for the API, so reevaluating the hooks is not something I would welcome at this point. Refactoring them into fetch (really resolve plus load) and evaluate (really transform) would weight the API heavily toward the transpilation use case at the expense of all others. There are lots of use cases for loaders that do only resolution, like the example HTTPS loader in the docs, and it’s unnecessary for all of those loaders to need to handle data retrieval in addition to rewriting resolution strings.

All this said, your original question of having a way for loaders to apply to vm.Script and similar is a valid one. At the moment vm.Script means CommonJS and there’s a separate vm.Module for ESM code, so while loaders are ESM-only they would apply only to the latter. I think in the short term the easiest way to achieve this is if there were an API that exposed the possibly-loaders-enhanced resolve and load hooks for use anywhere; like import { resolve, load } from 'module' or somesuch, that would allow passing in input from user code to get output as it would have come from the loaders workflow. You could use such APIs to transform your strings before passing them into vm.Script or vm.Module. Some of the items under https://github.com/nodejs/loaders#milestone-2-usability-improvements point at an API such as this.
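A rough sketch of the workflow described above. Since no `load()` export exists today, the transformed source is hard-coded here; imagine it is the output of loading "test.ts" through the loader chain via such a hypothetical API:

```javascript
// Sketch only: pretend `source` came from a hypothetical loaders-aware API like
//   const { source } = await load('file:///test.ts', context);
// Once you have the transformed string, vm.Script can evaluate it.
import('node:vm').then(({ Script }) => {
  const source = 'const x = 5; x * 2;'; // stand-in for the loaders-workflow output

  const script = new Script(source, { filename: 'test.ts' });
  console.log(script.runInNewContext()); // 10 (completion value of the last statement)
});
```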

@tolmasky
Author

tolmasky commented Oct 28, 2022

Hi @GeoffreyBooth, thanks for the response and the context! If the reasoning is about not wanting to change the API further, then I totally get that. However, I am wondering if you looked at all of the examples I provided -- the https loader in my system is (I think?) quite a bit simpler and easier to follow than the one in the current docs (linked here again for comparison: https://nodejs.org/api/esm.html#https-loader).

I only bring this up because I don't think this system is weighted towards transpiling at all; it leaves resolution equally simple for specifier/URL changes, and more ergonomic for "just download" cases. The https loader in my example is of course shorter than the one in the Node docs, but more importantly it requires implementing only one function (which I think makes much more sense than having to implement resolve just to early-exit and then do the real work in load), and most importantly it composes automatically with anyone else's transpiler loaders. In contrast, the https example in the Node docs is forced to bring up the fact that it's unclear what to do if the downloaded data isn't an ESM module, probably because the solution is too complicated to fit in an already 50-line example.

I provided the 3-line "github" resolver example in my post to show how you still get super simple resolution with the "fetch/evaluate" system (to reiterate: fetch doesn't force you to actually do the downloading if you just need to change the specifier/URL; you can still send that off to nextFetch). It's short enough to just paste back in here:

export function fetch(specifier, context, nextFetch) {
  if (!specifier.startsWith("github:"))
    return nextFetch(specifier, context);

  return nextFetch(`https://github.com/${specifier.slice("github:".length)}`, context);
}

I think the two big ideas are that 1) to your point, neither use case should be favored, and thus in most instances both should require implementing just one function; and 2) we should adopt the existing mime-type standard for handling data, which we'd get for free from any loader that interacts with the internet, as opposed to an implicit system that relies on the code inferring the type from the file extension in both methods. Anyways, again, if the API is done then no worries; I just wanted to make sure my concerns and explanation were clear, and also to point out that this system isn't meant to target transpilation at the expense of resolution -- I think both are simpler here, and their respective roles are clearer (as opposed to both handling identification, for example).

@laverdet

laverdet commented Jul 4, 2023

@tolmasky resolution and fetching are separate though. You need to resolve a specifier into a URL, which allows returning the same module binding when multiple modules import the same module using different specifiers. Imagine:

main.js

import "https://example.com/foo.js";
import "https://example.com/etc.js";

https://example.com/foo.js:

import "./etc.js";

In this case the specifiers "https://example.com/etc.js" and "./etc.js" resolve to the same URL. You wouldn't want to load two instances of etc.js.
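That identity is easy to check with the WHATWG URL resolution Node.js uses for ESM specifiers:

```javascript
// Both specifiers resolve to the same absolute URL, which is why resolution
// must happen before fetching: the module cache is keyed on the resolved URL,
// so the two imports share one module instance.
const direct = new URL('https://example.com/etc.js').href;
const relative = new URL('./etc.js', 'https://example.com/foo.js').href;

console.log(direct === relative); // true — one cache entry, one module instance
```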

I do 100% agree we need a way to programmatically access the loader API. I'm working on a hot-reloadable loader right now. Since Node.js caches modules forever, we eventually run out of memory. I was hoping to move to vm, but then I don't have any way to access other loaders in the chain.

@privatenumber
Contributor

privatenumber commented Nov 17, 2023

If REPL support is included in Milestone 3, I think it would also make sense to include support for node --eval (and --print) as well, which ideally would also extend to eval() calls.

@GeoffreyBooth
Member

Do you want to open an issue to update the readme?

I don't think --print is feasible because we can't add the equivalent of a console.log wrapper around ESM code because import statements must be top level. But the rest should be doable.

@privatenumber
Contributor

Assuming you meant PR, I opened one in #174

RE: ESM in --print

I'm not sure how --print works, but is it really as simple as wrapping console.log around the code? If so, how does it accept if-statements like this: node --print 'if (true) { 2 }'

Since it seems to accept statements, I'm curious what's preventing ESM import statements from working?

@GeoffreyBooth
Member

Since it seems to accept statements, I'm curious what's preventing ESM import statements from working?

I don't know, I've never looked into it. I assumed the current --print somehow took advantage of the CommonJS wrapper but you're right, it must be something else. I assumed there was a reason ESM has never been supported in --print but maybe there isn't. That would need to happen first before hooks could affect it. If you can figure it out, pull requests are welcome.
