-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow runfiles to be easily/efficiently passed as action inputs #15486
Comments
Either P2 or P4. It's a pain point for existing rules, that used this natively. I'm split between making things easier, but also maintaining a new thing that was so far not possible in Starlark and rules implemented from scratch in Starlark didn't need that. |
I am a little confused on the expected semantics, and also not entirely clear on the possibilities with FilesToRunProvider though #7187 makes it sound like the latter isn't really an option for starlark-written rules anyway. Anyhow I have a use case to throw into the mix here: rules for javascript bundlers (think webpack, rollup, esbuild, rspack, turbopack, etc.). JS bundlers are typically a lot more like linkers from other languages - take in lots of files, which could be compiled js (e.g. from typescript, jsx, etc.) or source files, and transform into either a single output or perhaps a directory (when code splitting is used to output multiple files); however unlike e.g. c++ linking, the filesystem layout of the inputs matters a lot. One of the big challenges of this use case today is that JS bundlers typically want to believe that all inputs - generated or not - are part of the same source tree - or else relative imports in them tend to not resolve correctly. Runfiles seem to be the right abstraction for this - they let us hide the difference between the source and generated files - making the File.short_path visible to the bundler rather than the full File.path. The problem: the rules I've written for js bundlers currently are constructing a binary of the bundler plus all the inputs, and then executing that. which is inefficient - every input change results in rebuilding this binary before doing the actual build - and also makes bazel think that all of our input code is actually part of the build tool (so bzl query --notool_deps gives the wrong answers). If there were some way to pass runfiles objects as inputs to the action, I suspect these rules would be more efficient? But at the very least, it'd let bazel query understand these are inputs and not tool dependencies. And it's probably a blocker for having a useful implementation of persistent workers for these tools. An alternative is copying all the source input files, but copying is noticeably slower than constructing symlinks. |
This topic came up again and I came up with a more concrete API proposal:
The I think this is about the most minimal API necessary. To do something useful, e.g. put all runfiles into a zip, you have to create a manifest file manually (there are efficient user space ways to do this using A couple other ideas that just occurred to me:
Or, model it as a TreeArtifact. Something like:
|
I like the tree artifact idea, as that answers the "how do we translate the runfiles to an equivalent directory structure on disk". Though I wonder how performant that'll be, it'll definitely solve the That said it's also plausible that we have two separate use cases here - possibly not all actions that want to consume runfiles need the runfiles organization of the files materialized in the filesystem? So we may want two different apis, rather than one. For the case of not materializing the filesystem directly, one option two new methods:
E.g. something like def _runfile_to_arg(runfilePath, file):
return "--path={}:{}".format(runfilePath, file.path)
...
args = ctx.actions.args()
args.add_runfiles(runfiles, format_each=_runfile_to_arg)
ctx.actions.run(inputs=[runfiles.get_all_files()], args=[args]) I think this most clearly competes to your manifest idea. Here you can put the manifest information into your Args object, whereas there, the manifest would presumably have a predetermined format and just be another file fed to the action. |
Yeah, given that runfiles are, essentially, representing a directory, using directory artifacts for them feels right. re: re: |
Thought of an alternative to/variation of So just have an adapter that unwraps them. e.g.
I don't know the exact NestedSet API, but I hope the gist is clear. |
Huh, those SymlinkEntry don't seem to be exposed to starlark yet, at least not that I could find. Are there plans for that? |
…2891) After #2887 I started finding similar linker errors in environments that supported runfiles and symlinks but have yet to see this issue in environments without them. I believe my understanding of how the [DefaultInfo.default_runfiles](https://bazel.build/rules/lib/providers/DefaultInfo#default_runfiles) objects worked and in the previous change had attempted to rely on it for a runfiles directory. However, there is no guarantee a directory will be rendered and no way to force it to be rendered for actions either (bazelbuild/bazel#15486). The consequence is that creating a custom runfiles directory is no longer a contingency and explicitly creating it is the correct thing to do to ensure `CARGO_MANIFEST_DIR` linker inputs are always available to the consuming actions. With the propagation of `CARGO_MANIFEST_DIR` (now an action output) into `Rustc` actions, a new flag is also introduced to prune unnecessary files out of the directory to reduce bloat and potential invalidation in downstream actions, `--@rules_rust//cargo/settings:cargo_manifest_dir_retain_list`. This flag accepts a list of path suffixes used to determine what (if anything) in `CARGO_MANIFEST_DIR` should be retained and passed to downstream actions. Note that the failure mode this addresses is hard to observe as it requires a crate which defines a linker path to something in `CARGO_MANIFEST_DIR` and then for a clean build to be done with a change to a dependent of said crate. If the runfiles directory for the build script is never rendered then there will be no file to link into the crate.
…2891) After #2887 I started finding similar linker errors in environments that supported runfiles and symlinks but have yet to see this issue in environments without them. I believe my understanding of how the [DefaultInfo.default_runfiles](https://bazel.build/rules/lib/providers/DefaultInfo#default_runfiles) objects worked and in the previous change had attempted to rely on it for a runfiles directory. However, there is no guarantee a directory will be rendered and no way to force it to be rendered for actions either (bazelbuild/bazel#15486). The consequence is that creating a custom runfiles directory is no longer a contingency and explicitly creating it is the correct thing to do to ensure `CARGO_MANIFEST_DIR` linker inputs are always available to the consuming actions. With the propagation of `CARGO_MANIFEST_DIR` (now an action output) into `Rustc` actions, a new flag is also introduced to prune unnecessary files out of the directory to reduce bloat and potential invalidation in downstream actions, `--@rules_rust//cargo/settings:cargo_manifest_dir_filename_suffixes_to_retain`. This flag accepts a list of path suffixes used to determine what (if anything) in `CARGO_MANIFEST_DIR` should be retained and passed to downstream actions. Note that the failure mode this addresses is hard to observe as it requires a crate which defines a linker path to something in `CARGO_MANIFEST_DIR` and then for a clean build to be done with a change to a dependent of said crate. If the runfiles directory for the build script is never rendered then there will be no file to link into the crate. --------- Co-authored-by: Daniel Wagner-Hall <dawagner@gmail.com>
…2891) After #2887 I started finding similar linker errors in environments that supported runfiles and symlinks but have yet to see this issue in environments without them. I believe my understanding of how the [DefaultInfo.default_runfiles](https://bazel.build/rules/lib/providers/DefaultInfo#default_runfiles) objects worked and in the previous change had attempted to rely on it for a runfiles directory. However, there is no guarantee a directory will be rendered and no way to force it to be rendered for actions either (bazelbuild/bazel#15486). The consequence is that creating a custom runfiles directory is no longer a contingency and explicitly creating it is the correct thing to do to ensure `CARGO_MANIFEST_DIR` linker inputs are always available to the consuming actions. With the propagation of `CARGO_MANIFEST_DIR` (now an action output) into `Rustc` actions, a new flag is also introduced to prune unnecessary files out of the directory to reduce bloat and potential invalidation in downstream actions, `--@rules_rust//cargo/settings:cargo_manifest_dir_filename_suffixes_to_retain`. This flag accepts a list of path suffixes used to determine what (if anything) in `CARGO_MANIFEST_DIR` should be retained and passed to downstream actions. Note that the failure mode this addresses is hard to observe as it requires a crate which defines a linker path to something in `CARGO_MANIFEST_DIR` and then for a clean build to be done with a change to a dependent of said crate. If the runfiles directory for the build script is never rendered then there will be no file to link into the crate. --------- Co-authored-by: Daniel Wagner-Hall <dawagner@gmail.com>
cc @lberki A function to convert a runfiles object to a file that can then be passed as inputs to an action would be very useful for this issue. |
With 78971a9, creating a Starlark function that takes a @comius @c-mita WDYT about making that an official part of the API? (I'm asking Charles because I think he'd have some use for this in deleting some Google-specific warts and in Starlarkifying Google-internal I'm thinking of something like
or splitting it up into the usual The complication is that these artifacts would be somewhat special, for example, building them wouldn't cause them to show up in the local file system, only on the inputs of the actions that depend on them (long story). That's eventually fixable, but it'd definitely be a wart. |
I'd like it much more if it was just like a tree artifact. :D But I can imagine a I'd prefer experimenting a bit with it behind the "internal API allowlisting" we've done for other things before making it "official" though; but I do have a use case or two in mind. |
It can't be a tree artifact because runfiles trees are materially different: tree artifacts have all the files within them and runfiles trees are only symlinks to other files that need to be "dragged along" into the inputs of the action. It's not inconceivable that they'll one day be unified, but I'm afraid I can't commit to that. re: API, fine, you get a |
Make it an allowlisted api. I think that runfiles need and overhaul, however I don’t have yet a good overview of the area. There were some nice suggestions @kmoffett shared with me over a chat. I’m not sure if they are materialized anywhere, but the essence of the idea is to split DefaultInfo into FilesInfo and RunfilesInfo. Separation should cause less confusion for the users. And it could allow more natural handling of runfiles within a rule. |
I won't take sides on what one should do with DefaultInfo (it's mostly orthogonal to whether the pertinent actions should be createable from Starlark), but what I'm hearing from both @c-mita and @comius is that an API to create abitrary symlink trees from Starlark would be useful. /me rolls up his sleeves... |
I missed that the comment about redundancy. Would the
Insert comment that rhymes with "let me kill |
Do I understand correctly that arbitrary symlink trees are enough to kill |
How For my goals, I need a way of producing a symlink tree as an "artifact" that can be transparently returned by a rule or passed to other actions. In my mind this looks as much like a tree artifact as possible (notably, "runfiles" do not look like a directory, but "fileset" artifacts do https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/actions/Artifact.java;l=682). Maybe this is just a naming issue and it should be |
My best educated guess is that The difference used to be meaningful when Filesets were much more complicated than today, but with all your cleanups, I think they are semantically pretty close to each other now, except that Let's do this: I drive the cleanup to the end that removes all the special-casing of Deal? |
Ideally, one would not make the distinction between tree artifacts and symlink trees either and instead be able to handle both symlinks and files in (I also have already done enough cleanups recently to not have the appetite for much more, but that's my personal issue) |
Description of the feature request:
If you have an action that needs to take a collection of runfiles as input, then it's very hard, error prone, and inefficient to correctly do this.
To correctly do this, the inputs passed to the action must include:
runfiles.files
.File
s withinrunfiles.symlinks
. Getting these requires flattening the depset during analysis.File
s withinrunfiles.root_symlinks
. Also requires flattening the depset during analysis.runfiles.symlinks
. These either must be passed as args to the action, or a second file must be created to store their values and then that file must be passed to the action.runfiles.root_symlinks
. Same problem as above.runfiles.empty_filenames
. Accessing this attribute triggers execution of the underlying RunfilesSupplier, which is normally delayed until execution time. Also same problem as with the symlink paths.The above is possible using pure-starlark, but is non-ideal because it has to flatten depsets (extract all the files and pass them as inputs; put the symlink paths and empty_filenames paths into Args objects).
Normally the answer to "i need runfiles passed to an action" is to use
tools
and pass FilesToRunProvider. However, that isn't always possible because:--stamp
behavior, embed build information, and trigger re-generation correctly, a target must pass all its runfiles et al to the action that generates that information; (b) for things like "build a zip file of myself" or some other packaging-esq type of thing.From looking around, I think Java-native rules basically deal with this problem by using SourceManifestAction or middleman files[1]. The both work about the same: create a file and write the runfiles as a manifest, then have that file be an input to another action. All that work gets deferred until execution, time, though, and all analysis phase pays is passing the runfiles and registering an action.
So, a couple half-baked ideas:
runfiles_inputs
arg toctx.actions.run
. This would accept a list of runfiles objects. Likeinputs
, changes to them would re-trigger action execution. Whether they are materialized, and how they are, I don't know.inputs
and andArgs
objects.(2) Seems better overall as it's more flexible. If you have a file, you can then pass it to
inputs
or add it toargs
.[1] Small aside: middleman files don't seem to play well with Starlark implemented rules. From what I could tell, all Starlark rules call some RunfilesSupport helper logic during their Java-phase, which registers various actions and outputs. So interacting with runfiles support in the middle of the Starlark phase of a rule tends to create an action/output conflict with what happens in the Java phase.
What underlying problem are you trying to solve with this feature?
Making it easy to correctly and efficiently pass runfiles information to actions.
Which operating system are you running Bazel on?
Linux
What is the output of
bazel info release
?google build
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.google build
What's the output of
git remote get-url origin; git rev-parse master; git rev-parse HEAD
?Have you found anything relevant by searching the web?
#15164 -- "i want to see the runfiles passed to an action"
Any other information, logs, or outputs that you want to share?
No response
The text was updated successfully, but these errors were encountered: