feat(builtin): introduce a linker
This is a separate process we run right before node programs.
Its job is to create symlinks so that the node_modules tree exists and
has all the packages we might want to load at runtime.

This will eventually replace the need for a custom module resolver,
custom typescript path mappings, and the existing module_mappings.bzl
support for TS/runtime mappings.
alexeagle committed Sep 4, 2019
1 parent 01e3a55 commit 62037c9
Showing 13 changed files with 504 additions and 0 deletions.
1 change: 1 addition & 0 deletions BUILD.bazel
@@ -89,6 +89,7 @@ npm_package(
"//internal/http-server:package_contents",
"//internal/jasmine_node_test:package_contents",
"//internal/js_library:package_contents",
"//internal/linker:package_contents",
# TODO(alexeagle): distribute separately as @bazel/rollup
"//internal/rollup:package_contents",
"//internal/node:package_contents",
12 changes: 12 additions & 0 deletions internal/linker/BUILD.bazel
@@ -0,0 +1,12 @@
exports_files(["link_node_modules.js"])

filegroup(
name = "package_contents",
srcs = glob([
"*.bzl",
"*.js",
]) + [
"BUILD.bazel",
],
visibility = ["//:__pkg__"],
)
17 changes: 17 additions & 0 deletions internal/linker/README.md
@@ -0,0 +1,17 @@
# node package linker

It's not obvious why a "linker" is needed in nodejs.
After all, programs use dynamic lookups at runtime so we expect no need for static linking.

However, in the monorepo case, you develop a package and also reference it by name in the same repo.
This means you need a workflow like `npm link` to symlink the package from the `node_modules/name` directory to `packages/name` or wherever the sources live.
[lerna] does a similar thing, but at a wider scale: it links together a bunch of packages using a descriptor file to understand how to map from the source tree to the runtime locations.

Under Bazel, we have exactly this monorepo setup, but we want users to have a better experience than with lerna: they shouldn't need to run any tool other than `bazel test` or `bazel run`, and programs should just work, even when they `require()` some local package from the monorepo.

To make this seamless, we run a linker as a separate program inside the Bazel action, right before node.
It does essentially the same job as Lerna: make sure there is a `$PWD/node_modules` tree and that all the semantics from Bazel (such as `module_name`/`module_root` attributes) are mapped to the node module resolution algorithm, so that the node runtime behaves the same way as if the packages had been installed from npm.

In the future, the linker should also generate `package.json` files so that fields like `main` and `typings` are present and reflect the Bazel semantics. That would let us entirely eliminate custom loading and path-mapping logic from the binaries we execute.

[lerna]: https://github.com/lerna/lerna
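
The linking step this README describes can be illustrated outside of Bazel. The following is a minimal sketch, not the linker itself: it builds a throwaway directory (the `packages/a` layout and the `program.js` file are hypothetical), symlinks the package into `node_modules`, and shows that a plain `require('a')` then resolves through node's normal lookup.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch "monorepo" containing one first-party package.
const repo = fs.mkdtempSync(path.join(os.tmpdir(), 'linker-sketch-'));
const pkgDir = path.join(repo, 'packages', 'a');
fs.mkdirSync(pkgDir, {recursive: true});
fs.writeFileSync(path.join(pkgDir, 'index.js'), 'exports.addA = (s) => `${s}_a`;\n');

// Link node_modules/a -> packages/a. The 'junction' type only matters on
// Windows, where it avoids the elevated permissions that symlinks need.
const nodeModules = path.join(repo, 'node_modules');
fs.mkdirSync(nodeModules);
fs.symlinkSync(pkgDir, path.join(nodeModules, 'a'), 'junction');

// A program inside the repo can now load the package by its short name.
const programPath = path.join(repo, 'program.js');
fs.writeFileSync(programPath, "module.exports = require('a').addA('hello');\n");
const linkedResult = require(programPath);
console.log(linkedResult);
```

No custom resolver is involved here: once the symlink exists, node's standard `node_modules` lookup finds the package.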
137 changes: 137 additions & 0 deletions internal/linker/link_node_modules.bzl
@@ -0,0 +1,137 @@
"""Helper function and aspect to collect first-party packages.
These are used in node rules to link the node_modules before launching a program.
This supports path re-mapping, to support short module names.
See pathMapping doc: https://github.com/Microsoft/TypeScript/issues/5039
This reads the module_root and module_name attributes from rules in
the transitive closure, rolling these up to provide a mapping to the
linker, which uses the mappings to link a node_modules directory for
runtimes to locate all the first-party packages.
"""

load("@build_bazel_rules_nodejs//internal/common:node_module_info.bzl", "NodeModuleSources")

def _debug(vars, *args):
if "VERBOSE_LOGS" in vars.keys():
print("[link_node_modules.bzl]", *args)

# Arbitrary name; must be chosen to globally avoid conflicts with any other aspect
_ASPECT_RESULT_NAME = "link_node_modules__aspect_result"

# Traverse 'srcs' in addition so that we can go across a genrule
_MODULE_MAPPINGS_DEPS_NAMES = ["deps", "srcs"]

def register_node_modules_linker(ctx, args, inputs):
"""Helps an action to run node by setting up the node_modules linker as a pre-process
Args:
ctx: Bazel's starlark execution context, used to get attributes and actions
args: Arguments being passed to the program; a linker argument will be appended
inputs: inputs being passed to the program; a linker input will be appended
"""

mappings = {}
node_modules_root = ""

# Look through data/deps attributes to find...
for dep in getattr(ctx.attr, "data", []) + getattr(ctx.attr, "deps", []):
# ...the root directory for the third-party node_modules; we'll symlink the local "node_modules" to it
if NodeModuleSources in dep:
possible_root = "/".join([dep[NodeModuleSources].workspace, "node_modules"])
if not node_modules_root:
node_modules_root = possible_root
elif node_modules_root != possible_root:
fail("All npm dependencies need to come from a single workspace. Found '%s' and '%s'." % (node_modules_root, possible_root))

# ...first-party packages to be linked into the node_modules tree
for k, v in getattr(dep, _ASPECT_RESULT_NAME, {}).items():
if k in mappings and mappings[k] != v:
fail(("conflicting module mapping at %s: %s maps to both %s and %s" %
(dep.label, k, mappings[k], v)), "deps")
_debug(ctx.var, "Linking %s: %s" % (k, v))
mappings[k] = v

# Write the result to a file, and use the magic node option --bazel_node_modules_manifest
# The node_launcher.sh will peel off this argument and pass it to the linker rather than the program.
modules_manifest = ctx.actions.declare_file("_%s.module_mappings.json" % ctx.label.name)
ctx.actions.write(modules_manifest, str({"modules": mappings, "root": node_modules_root}))
args.add("--bazel_node_modules_manifest=%s" % modules_manifest.path)
inputs.append(modules_manifest)

def get_module_mappings(label, attrs, vars, srcs = [], workspace_name = None):
"""Returns the module_mappings from the given attrs.
Collects a {module_name - module_root} hash from all transitive dependencies,
checking for collisions. If a module has a non-empty `module_root` attribute,
all sources underneath it are treated as if they were rooted at a folder
`module_name`.
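Args:
label: label of the target whose mappings are being collected
attrs: attributes of that target; deps, srcs, module_name and module_root are read
vars: Bazel make variables (such as ctx.var); mappings are logged when VERBOSE_LOGS is set
srcs: sources (defaults to [])
workspace_name: workspace name (defaults to None)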
Returns:
The module mappings
"""
mappings = {}

for name in _MODULE_MAPPINGS_DEPS_NAMES:
for dep in getattr(attrs, name, []):
for k, v in getattr(dep, _ASPECT_RESULT_NAME, {}).items():
if k in mappings and mappings[k] != v:
fail(("duplicate module mapping at %s: %s maps to both %s and %s" %
(label, k, mappings[k], v)), "deps")
_debug(vars, "target %s propagating module mapping %s: %s" % (dep, k, v))
mappings[k] = v

if not getattr(attrs, "module_name", None) and not getattr(attrs, "module_root", None):
# No mappings contributed here, short-circuit with the transitive ones we collected
_debug(vars, "No module_name or module_root attr for", label)
return mappings

mn = getattr(attrs, "module_name", label.name)
mr = label.package

if workspace_name:
mr = "%s/%s" % (workspace_name, mr)
elif label.workspace_root:
mr = "%s/%s" % (label.workspace_root, mr)

if mn in mappings and mappings[mn] != mr:
fail(("duplicate module mapping at %s: %s maps to both %s and %s" %
(label, mn, mappings[mn], mr)), "deps")
_debug(vars, "target %s adding module mapping %s: %s" % (label, mn, mr))
mappings[mn] = mr
return mappings

# When building a mapping for use at runtime, we need paths to be relative to
# the runfiles directory. This requires the workspace_name to be prefixed on
# each module root.
def _module_mappings_aspect_impl(target, ctx):
if target.label.workspace_root:
# We need the workspace_name for the target being visited.
# Skylark doesn't have this - instead they have a workspace_root
# which looks like "external/repo_name" - so grab the second path segment.
# TODO(alexeagle): investigate a better way to get the workspace name
workspace_name = target.label.workspace_root.split("/")[1]
else:
workspace_name = ctx.workspace_name

# Use a dictionary to construct the result struct
# so that we can reference the _ASPECT_RESULT_NAME variable
return struct(**{
_ASPECT_RESULT_NAME: get_module_mappings(
target.label,
ctx.rule.attr,
ctx.var,
workspace_name = workspace_name,
),
})

module_mappings_aspect = aspect(
_module_mappings_aspect_impl,
attr_aspects = _MODULE_MAPPINGS_DEPS_NAMES,
)
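
The conflict check and the manifest written by `register_node_modules_linker` can be sketched in JavaScript, the language of the consumer of that manifest. This is an illustration only: `mergeMappings` is a hypothetical helper that mirrors the Starlark loops above, and the workspace name `my_wksp` and package paths are made-up values. The `modules` and `root` keys are the ones the Starlark code writes.

```javascript
// Merge the mappings contributed by each dependency, failing when the same
// module name is mapped to two different roots (as the Starlark code does).
function mergeMappings(perDepMappings) {
  const merged = {};
  for (const mappings of perDepMappings) {
    for (const [name, root] of Object.entries(mappings)) {
      const seen = Object.prototype.hasOwnProperty.call(merged, name);
      if (seen && merged[name] !== root) {
        throw new Error(
            `conflicting module mapping: ${name} maps to both ${merged[name]} and ${root}`);
      }
      merged[name] = root;
    }
  }
  return merged;
}

// Illustrative values only. Repeating an identical mapping is allowed.
const exampleManifest = JSON.stringify({
  modules: mergeMappings([
    {a: 'my_wksp/packages/a'},
    {a: 'my_wksp/packages/a'},
    {b: 'my_wksp/packages/b'},
  ]),
  root: 'npm/node_modules',
});
console.log(exampleManifest);
```

The linker later reads this file with `JSON.parse` and destructures `root` and `modules` from it.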
156 changes: 156 additions & 0 deletions internal/linker/link_node_modules.js
@@ -0,0 +1,156 @@
/**
* @fileoverview Creates a node_modules directory in the current working directory
* and symlinks in the node modules needed to run a program.
* This replaces the need for custom module resolution logic inside the process.
*/
const fs = require('fs');
const path = require('path');

const VERBOSE_LOGS = !!process.env['VERBOSE_LOGS'];

function log_verbose(...m) {
// Messages are tagged with a hard-coded file name so linker output is easy to spot
if (VERBOSE_LOGS) console.error('[link_node_modules.js]', ...m);
}

function symlink(target, path) {
if (fs.existsSync(path)) {
// We assume here that the path is already linked to the correct target.
// Could add some logic that asserts it here, but we want to avoid an extra
// filesystem access so we should only do it under some kind of strict mode.
return;
}
log_verbose(`symlink( ${path} -> ${target} )`);
// Use junction on Windows since symlinks require elevated permissions
// we only link to directories so junctions work for us.
fs.symlinkSync(target, path, 'junction');
}

/**
* The runfiles manifest maps from short_path
* https://docs.bazel.build/versions/master/skylark/lib/File.html#short_path
* to the actual location on disk where the file can be read.
*
* In a sandboxed execution, it does not exist. In that case, runfiles must be
* resolved from a symlink tree under the runfiles dir.
* See https://github.com/bazelbuild/bazel/issues/3726
*/
function loadRunfilesManifest(manifestPath) {
log_verbose(`using runfiles manifest ${manifestPath}`);

// Build a map from each runfiles path to its real location on disk.
const runfilesEntries = new Map();
const input = fs.readFileSync(manifestPath, {encoding: 'utf-8'});

for (const line of input.split('\n')) {
if (!line) continue;
const [runfilesPath, realPath] = line.split(' ');
runfilesEntries.set(runfilesPath, realPath);
}

return runfilesEntries;
}

function lookupDirectory(dir, runfilesManifest) {
for (const [k, v] of runfilesManifest) {
// Entry looks like
// k: npm/node_modules/semver/LICENSE
// v: /path/to/external/npm/node_modules/semver/LICENSE
// calculate l = length(`/semver/LICENSE`)
if (k.startsWith(dir)) {
const l = k.length - dir.length;
return v.substring(0, v.length - l);
}
}
throw new Error(`Internal failure, please report an issue.
RunfilesManifest has no key for ${dir}
`);
}

/**
* Resolve a root directory string to the actual location on disk
* where node_modules was installed
* @param root a string like 'npm/node_modules'
*/
function resolveRoot(root, runfilesManifest) {
// create a node_modules directory if no root
// this will be the case if only first-party modules are installed
if (!root) {
log_verbose('no third-party packages; mkdir node_modules in ', process.cwd());
fs.mkdirSync('node_modules');
return 'node_modules';
}

// If we got a runfilesManifest map, look through it for a resolution
if (runfilesManifest) {
return lookupDirectory(root, runfilesManifest);
}

// Account for Bazel --legacy_external_runfiles
// which look like 'my_wksp/external/npm/node_modules'
if (fs.existsSync(path.join('external', root))) {
log_verbose('Found legacy_external_runfiles, switching root to', path.join('external', root));
return path.join('external', root);
}

// The repository should be laid out in the parent directory
// since Bazel sets our working directory to the repository where the build is happening
return path.join('..', root);
}

function main(args, runfilesManifestPath) {
if (!args || args.length < 1)
throw new Error('link_node_modules.js requires one argument: modulesManifest path');

const [modulesManifest] = args;
let {root, modules} = JSON.parse(fs.readFileSync(modulesManifest));
modules = modules || {};
log_verbose(
'read module manifest, node_modules root is', root, 'with first-party packages', modules);

const runfilesManifest =
runfilesManifestPath ? loadRunfilesManifest(runfilesManifestPath) : undefined;
const rootDir = resolveRoot(root, runfilesManifest);
log_verbose('resolved root', root, 'to', rootDir);

// Create the execroot/my_wksp/node_modules directory that node will resolve from
symlink(rootDir, 'node_modules');

// Typically, cwd=foo, root=external/npm/node_modules, so we want links to be
// ../../../../foo/path/to/package
const symlinkRelativeTarget = path.relative(rootDir, '..');
process.chdir(rootDir);

// Now add symlinks to each of our first-party packages so they appear under the node_modules tree
for (const m of Object.keys(modules)) {
const target = runfilesManifest ? lookupDirectory(modules[m], runfilesManifest) :
path.join(symlinkRelativeTarget, modules[m]);
symlink(target, m);
}

return 0;
}

exports.main = main;

if (require.main === module) {
// If Bazel sets a variable pointing to a runfiles manifest,
// we'll always use it.
// Note that this has a slight performance implication on Mac/Linux
// where we could use the runfiles tree already laid out on disk
// but this just costs one file read for the external npm/node_modules
// and one for each first-party module, not one per file.
const runfilesManifestPath = process.env['RUNFILES_MANIFEST_FILE'];
// Under --noenable_runfiles (in particular on Windows)
// Bazel sets RUNFILES_MANIFEST_ONLY=1.
// When this happens, we need to read the manifest file to locate
// inputs
if (process.env['RUNFILES_MANIFEST_ONLY'] === '1' && !runfilesManifestPath) {
log_verbose(`Workaround https://github.com/bazelbuild/bazel/issues/7994
RUNFILES_MANIFEST_FILE should have been set but wasn't.
falling back to using runfiles symlinks.
If you want to test runfiles manifest behavior, add
--spawn_strategy=standalone to the command line.`);
}
process.exitCode = main(process.argv.slice(2), runfilesManifestPath);
}
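
To make the two path computations in this file concrete, here is a small worked sketch using the sample values from the code comments. `parseManifest` and `lookupDir` are simplified, hypothetical stand-ins for `loadRunfilesManifest` and `lookupDirectory` (no file I/O, no logging), and the `/execroot/foo` paths are assumed for illustration. `path.posix` is used so the results do not depend on the host platform.

```javascript
const path = require('path');

// Parse "runfilesPath realPath" lines into a Map, skipping empty lines.
function parseManifest(text) {
  const entries = new Map();
  for (const line of text.split('\n')) {
    if (!line) continue;
    const [runfilesPath, realPath] = line.split(' ');
    entries.set(runfilesPath, realPath);
  }
  return entries;
}

// Find any entry under `dir` and strip the same trailing portion from its
// real path, yielding the real location of the directory itself.
function lookupDir(dir, entries) {
  for (const [k, v] of entries) {
    if (k.startsWith(dir)) {
      const suffixLength = k.length - dir.length;  // length of '/semver/LICENSE'
      return v.substring(0, v.length - suffixLength);
    }
  }
  throw new Error(`no manifest entry for ${dir}`);
}

const entries = parseManifest(
    'npm/node_modules/semver/LICENSE /path/to/external/npm/node_modules/semver/LICENSE\n');
const resolvedRoot = lookupDir('npm/node_modules', entries);

// Without a manifest: cwd is /execroot/foo and the root is external/npm/node_modules,
// so links created inside the root must climb four levels to reach /execroot.
const relativeTarget =
    path.posix.relative('/execroot/foo/external/npm/node_modules', '/execroot');
const linkTarget = path.posix.join(relativeTarget, 'foo/path/to/package');

console.log(resolvedRoot);  // /path/to/external/npm/node_modules
console.log(linkTarget);    // ../../../../foo/path/to/package
```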
7 changes: 7 additions & 0 deletions internal/linker/test/BUILD.bazel
@@ -0,0 +1,7 @@
load("@npm_bazel_jasmine//:index.from_src.bzl", "jasmine_node_test")

jasmine_node_test(
name = "unit_tests",
srcs = glob(["*.js"]),
data = ["//internal/linker:link_node_modules.js"],
)
32 changes: 32 additions & 0 deletions internal/linker/test/integration/BUILD.bazel
@@ -0,0 +1,32 @@
load(":rule.bzl", "linked")

linked(
name = "example",
deps = [
"//internal/linker/test/integration/pkg_a",
"@npm//semver",
],
)

# Use the node binary supplied by the bazel toolchain
genrule(
name = "replace_node_path",
srcs = [":test.sh"],
outs = ["test_with_node.sh"],
cmd = "sed s#NODE_PATH#$(NODE_PATH)# $< > $@",
toolchains = ["@build_bazel_rules_nodejs//toolchains/node:toolchain"],
)

sh_test(
name = "test",
srcs = ["test_with_node.sh"],
data = [
":example",
":program.js",
"//internal/linker:link_node_modules.js",
"@bazel_tools//tools/bash/runfiles",
"@build_bazel_rules_nodejs//toolchains/node:node_bin",
# TODO: we shouldn't need to repeat this here. There's a bug somewhere
"@npm//semver",
],
)
9 changes: 9 additions & 0 deletions internal/linker/test/integration/pkg_a/BUILD.bazel
@@ -0,0 +1,9 @@
load("//internal/js_library:js_library.bzl", "js_library")

package(default_visibility = ["//internal/linker/test:__subpackages__"])

js_library(
name = "pkg_a",
srcs = ["index.js"],
module_name = "a",
)
5 changes: 5 additions & 0 deletions internal/linker/test/integration/pkg_a/index.js
@@ -0,0 +1,5 @@
function addA(str) {
return `${str}_a`;
}

exports.addA = addA;
6 changes: 6 additions & 0 deletions internal/linker/test/integration/program.js
@@ -0,0 +1,6 @@
// First-party package from ./pkg_a
const a = require('a');
// Third-party package installed in the root node_modules
const semver = require('semver');

console.log(a.addA(semver.clean(' =v1.2.3 ')));