Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow artifacts (or maybe runfiles?) to be marked non-symlinkable #10299

Open
jmillikin opened this issue Nov 24, 2019 · 14 comments
Open

Allow artifacts (or maybe runfiles?) to be marked non-symlinkable #10299

jmillikin opened this issue Nov 24, 2019 · 14 comments
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts type: feature request

Comments

@jmillikin
Copy link
Contributor

Description of the problem / feature request:

Node.js, a popular implementation of the MySpace theming language as a general-purpose programming environment, uses symlinks as part of its import resolution logic. This doesn't compose well with Bazel's use of symlinks in the execroot, especially the runfiles symlink forest.

It would be nice if certain parts of the runfiles tree could be marked "non-symlinkable", so that they always get written as regular files. I'm not sure whether this would be better at the File or runfiles level. I'm using library runfiles so the two are equivalent in my case.

Feature requests: what underlying problem are you trying to solve with this feature?

Given a runfiles tree containing a crafted node_modules/ directory, I'd like to have some files in there be symlinks (like today), and some be regular files. This would allow fine-grained control over the Node.js import resolver.

@laurentlb laurentlb added P3 We're not considering working on this, but happy to review a PR. (No assignee) type: feature request and removed untriaged labels Dec 16, 2019
@flokli
Copy link

flokli commented Feb 14, 2020

I'm having a similar problem with python binaries. I have some packaging process outside bazel, which basically packages things from bazel-out.

Of course, I can dereference symlinks before packaging, still, I find it quite scary these files are just symlinks, and would have hoped bazel would copy things during build time to ensure hermiticity of the artifacts.

@brandjon
Copy link
Member

In general, the idea of copying files rather than symlinking them sounds pretty dangerous for performance. You could easily explode your disk usage with a quadratic number of copies.

We do have a mode where instead of emitting symlinks, we emit just a manifest file that points back into the source and output trees. The language-specific runfiles libraries abstract over that so you can write scripts that are agnostic to how the runfiles are laid out. Maybe something there can be extended for the purpose you're describing.

@brandjon brandjon added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Build-Language and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Starlark labels Feb 19, 2021
@jmillikin
Copy link
Contributor Author

In general, the idea of copying files rather than symlinking them sounds pretty dangerous for performance. You could easily explode your disk usage with a quadratic number of copies.

My use case is for writing stub entry points for js_binary targets, which are typically a few kilobytes at most. I think that it would be reasonable to allow users to make the tradeoff.

@brandjon
Copy link
Member

Depending on how this feature were designed, there could still be a hazard whereby a downstream target aggregates your js_binary targets, copying those files every step of the way. I'm not saying your use case isn't benign, but we're generally wary of features that can easily lead to unscalable builds. Just something to keep in mind if this feature is pursued.

@pauldraper
Copy link
Contributor

In general, the idea of copying files rather than symlinking them sounds pretty dangerous for performance. You could easily explode your disk usage with a quadratic number of copies.

There is hardlinking, but that only works if output root and source root are on the same device, and it doesn't work for directories.

@dgoldstein0
Copy link

I am also interested in this feature (and similar features), for similar reasons as the original request. I work on JS build tooling with a few other folks at my company, and we support >100 frontend JS developers.

The general problem is that the JS ecosystem only occasionally acknowledges the existence of symlinks, and gets all the defaults wrong. E.g. the node.js default is to follow symlinks, and then resolve require("../foo") and import "../foo" relative to the destination of the symlink instead of relative to the source; this can be worked around via --preserve-symlinks and --preserve-symlinks-main, but not all users remember to use that, to varying results. webpack, typescript, jest, babel, rollup, etc. all make similar assumptions, but the real impossibility is the enormous plugin ecosystems and huge long tail of less-popular packages that may not even offer options to change symlink behavior. Usually the result is things that just work for the ecosystem take hours or days to figure out all the issues in bazel, either if we go down the route of "patch all the logic that handles symlinks differently than we want" or "find all the symlink-related options and flip them on". It's just not sustainable for us to keep up with the variety of tooling our internal customers want to use, and they aren't willing to spend the time on solving these problems in bazel - preferring to avoid bazel instead.

my last straw recently - I was debugging a build of storybook in bazel, which was using @storybook/addon-docs, which itself uses babel-loader, which was mis-resolving the @babel/preset-env to follow outside of our symlinks and then crashing because it needed the symlinked version of a dependency and had escaped the runfiles tree and so couldn't find it. A possible solution was to change the custom storybook preset which had used @storybook/addon-docs to pass babelOptions so that babel-loader could use presets: [require.resolve("@babel/preset-env")] but that took a few hours to figure out, and then I rage-quit and just added a cp -RL step when the next error was some even more inscrutable issue where fs couldn't be found by webpack (which is directly used by storybook). the copy took 10 seconds of extra build time in this case, but that's better than extra days of my time.

anyhow some morals of that story:

  • the complexity of the ecosystem doing symlinks wrongs compounds with nested dependencies on these things
  • I prefer something that works but is a little slow if it means I get to deliver value to my users in minutes instead of days or weeks. with the latter, we cannot hope to keep up with the JS ecosystem.

anyhow we're likely going to figure out working copying into our system, whether it becomes a core bazel feature or not. the dev velocity of the alternative has proven over the last ~5.5 years to be too big of a handicap.

@pauldraper
Copy link
Contributor

pauldraper commented Jan 18, 2023

For JS, I created and use http://github.com/rivethealth/rules_javascript which shims require.resolve, ESM, and fs module to behave nicely with symlinks.

https://github.com/rivethealth/rules_javascript/blob/main/docs/module.md#solution

@pauldraper
Copy link
Contributor

pauldraper commented Jan 18, 2023

Note that copying can be achieved today by using a rule that outputs a copied directory/files.

Bazel will still create symlinks (or a manifest) for runfiles, but you can resolve it to the bazel-out location created by that rule and everything will work.

@dgoldstein0
Copy link

copying to bazel-out can help. but wouldn't it then be possible for e.g. require() to find files that wouldn't be in the runfiles?

e.g. protobufjs has some code like try{long = require("long")}catch(e){} if my memory serves - where it effectively makes a dependency optional, and then if I'm trying to opt out it can be important that it doesn't find it.

@pauldraper
Copy link
Contributor

It's always possible to find outside files (unless it's being used in a sandboxed build, but even then situation doesn't change).

@pauldraper
Copy link
Contributor

pauldraper commented Jan 20, 2023

My suggestion is create a rule

some_bin(
  name = "bin0",
)

copy_bin(
  name = "bin",
  bin = ":bin0",
)

copy_bin copies the entire runfile tree. And resolves the executable location to bazel-out.

It's not efficient, but it's not really particularly less efficient than the proposal of building it into Bazel (and it could still use hard links, if that's an option you want).

@mvgijssel
Copy link

This would fix this particular issue as well encode/starlette#1083 (comment) where the used Python library is unable to deal with symlinks in the runfiles directory (accurately).

Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Oct 26, 2024
@jmillikin
Copy link
Contributor Author

I continue to hope for this feature every time I debug JavaScript rules.

@github-actions github-actions bot removed the stale Issues or PRs that are stale (no activity for 30 days) label Oct 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts type: feature request
Projects
None yet
Development

No branches or pull requests

9 participants