Allow runfiles to be easily/efficiently passed as action inputs #15486

rickeylev · 2022-05-12T19:45:08Z

Description of the feature request:

If you have an action that needs to take a collection of runfiles as input, then it's very hard, error prone, and inefficient to correctly do this.

To correctly do this, the inputs passed to the action must include:

runfiles.files.
The Files within runfiles.symlinks. Getting these requires flattening the depset during analysis.
The Files within runfiles.root_symlinks. Also requires flattening the depset during analysis.
The symlink path strings of runfiles.symlinks. These either must be passed as args to the action, or a second file must be created to store their values and then that file must be passed to the action.
The symlink path strings of runfiles.root_symlinks. Same problem as above.
The path strings of runfiles.empty_filenames. Accessing this attribute triggers execution of the underlying RunfilesSupplier, which is normally delayed until execution time. Also same problem as with the symlink paths.

The above is possible using pure-starlark, but is non-ideal because it has to flatten depsets (extract all the files and pass them as inputs; put the symlink paths and empty_filenames paths into Args objects).

Normally the answer to "i need runfiles passed to an action" is to use tools and pass FilesToRunProvider. However, that isn't always possible because:

FilesToRun can't be constructed by Starlark
Getting a reference to FilesToRun requires a target, i.e. a macro has to wire together multiple targets
(2) isn't always possible. Some examples: (a) if you want to properly implement --stamp behavior, embed build information, and trigger re-generation correctly, a target must pass all its runfiles et al to the action that generates that information; (b) for things like "build a zip file of myself" or some other packaging-esq type of thing.
FilesToRun is more expensive: it means "materialize this set of paths when the action is run", when the minimum necessary is "make sure any changes to the runfiles re-trigger the action execution" and/or "I just need the underlying inputs available and their symlink path-mapping" (e.g. a manifest)

From looking around, I think Java-native rules basically deal with this problem by using SourceManifestAction or middleman files[1]. The both work about the same: create a file and write the runfiles as a manifest, then have that file be an input to another action. All that work gets deferred until execution, time, though, and all analysis phase pays is passing the runfiles and registering an action.

So, a couple half-baked ideas:

Add a runfiles_inputs arg to ctx.actions.run. This would accept a list of runfiles objects. Like inputs, changes to them would re-trigger action execution. Whether they are materialized, and how they are, I don't know.
Expose some SourceManifestAction-equivalent, i.e. a way to correctly and efficiently transform a runfiles object into a file that can be passed to inputs and and Args objects.

(2) Seems better overall as it's more flexible. If you have a file, you can then pass it to inputs or add it to args.

[1] Small aside: middleman files don't seem to play well with Starlark implemented rules. From what I could tell, all Starlark rules call some RunfilesSupport helper logic during their Java-phase, which registers various actions and outputs. So interacting with runfiles support in the middle of the Starlark phase of a rule tends to create an action/output conflict with what happens in the Java phase.

What underlying problem are you trying to solve with this feature?

Making it easy to correctly and efficiently pass runfiles information to actions.

Which operating system are you running Bazel on?

Linux

What is the output of `bazel info release`?

google build

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

google build

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

n/a

Have you found anything relevant by searching the web?

#15164 -- "i want to see the runfiles passed to an action"

Any other information, logs, or outputs that you want to share?

No response

The text was updated successfully, but these errors were encountered:

comius · 2023-09-14T13:58:41Z

Either P2 or P4. It's a pain point for existing rules, that used this natively. I'm split between making things easier, but also maintaining a new thing that was so far not possible in Starlark and rules implemented from scratch in Starlark didn't need that.

dgoldstein0 · 2024-01-20T09:00:59Z

I am a little confused on the expected semantics, and also not entirely clear on the possibilities with FilesToRunProvider though #7187 makes it sound like the latter isn't really an option for starlark-written rules anyway. Anyhow I have a use case to throw into the mix here: rules for javascript bundlers (think webpack, rollup, esbuild, rspack, turbopack, etc.). JS bundlers are typically a lot more like linkers from other languages - take in lots of files, which could be compiled js (e.g. from typescript, jsx, etc.) or source files, and transform into either a single output or perhaps a directory (when code splitting is used to output multiple files); however unlike e.g. c++ linking, the filesystem layout of the inputs matters a lot. One of the big challenges of this use case today is that JS bundlers typically want to believe that all inputs - generated or not - are part of the same source tree - or else relative imports in them tend to not resolve correctly. Runfiles seem to be the right abstraction for this - they let us hide the difference between the source and generated files - making the File.short_path visible to the bundler rather than the full File.path.

The problem: the rules I've written for js bundlers currently are constructing a binary of the bundler plus all the inputs, and then executing that. which is inefficient - every input change results in rebuilding this binary before doing the actual build - and also makes bazel think that all of our input code is actually part of the build tool (so bzl query --notool_deps gives the wrong answers). If there were some way to pass runfiles objects as inputs to the action, I suspect these rules would be more efficient? But at the very least, it'd let bazel query understand these are inputs and not tool dependencies. And it's probably a blocker for having a useful implementation of persistent workers for these tools. An alternative is copying all the source input files, but copying is noticeably slower than constructing symlinks.

rickeylev · 2024-08-14T18:32:43Z

This topic came up again and I came up with a more concrete API proposal:

runfiles_input = ctx.runfiles_to_action_input(runfiles)
ctx.actions.run(inputs = [runfiles_input], ...)

The runfiles_to_action_input method converts the runfiles to some object that represents all the underlying File objects of the passed-in runfiles. Perhaps a special proxy object, or a custom File implementation based upon e.g. Middleman. Whatever it is, it efficiently ensures the underlying File objects within runfiles are made available as inputs to the action.

I think this is about the most minimal API necessary.

To do something useful, e.g. put all runfiles into a zip, you have to create a manifest file manually (there are efficient user space ways to do this using map_each).

A couple other ideas that just occurred to me:

manifest, runfile_input = ctx.blabla(runfiles) -- generate a manifest for you

Or, model it as a TreeArtifact. Something like:

runfiles_root = ctx.declare_directory("foo_runfiles_root")
ctx.runfiles_to_directory(runfiles, output=runfiles_root)
ctx.actions.run(inputs=[runfiles_root], arguments=[runfiles_root.path])

dgoldstein0 · 2024-08-24T19:38:54Z

I like the tree artifact idea, as that answers the "how do we translate the runfiles to an equivalent directory structure on disk". Though I wonder how performant that'll be, it'll definitely solve the --notool_deps problem with the "build and then execute a binary" strategy that makes the inputs look like part of the build tools.

That said it's also plausible that we have two separate use cases here - possibly not all actions that want to consume runfiles need the runfiles organization of the files materialized in the filesystem? So we may want two different apis, rather than one.

For the case of not materializing the filesystem directly, one option two new methods:

A new function runfiles.get_all_files() that would return all File objects inside the runfiles regardless of which of the fields of runfiles (files, symlinks, root_symlinks, or empty_files) each file comes from. (I suppose this isn't strictly necessary, but it's a pretty substantial simplification)
Args.add_runfiles() could give more flexibility in how runfiles are passed.

E.g. something like

def _runfile_to_arg(runfilePath, file):
  return "--path={}:{}".format(runfilePath, file.path)
...
args = ctx.actions.args()
args.add_runfiles(runfiles, format_each=_runfile_to_arg)
ctx.actions.run(inputs=[runfiles.get_all_files()], args=[args])

I think this most clearly competes to your manifest idea. Here you can put the manifest information into your Args object, whereas there, the manifest would presumably have a predetermined format and just be another file fed to the action.

rickeylev · 2024-08-25T17:19:06Z

Yeah, given that runfiles are, essentially, representing a directory, using directory artifacts for them feels right.

re: runfiles.get_all_files: Yeah, this would simplify. The trouble is that, under the hood, symlinks/root_symlinks are depsets of SymlinkEntry objects (basically a (path, File) pair). So to get just the File objects the depsets have to be flattened. Or flattening has to be deferred, somehow (which puts us back to "special file to represent runfiles")

re: runfiles.add_runfiles: Oh interesting idea. You can already use (add_all + map_each) to convert a runfiles object to e.g. --path=... args. But perhaps a separate function would be more efficient?

rickeylev · 2024-08-29T19:08:44Z

Thought of an alternative to/variation of runfiles.get_all_files. So under the hood, the problem is there's a depset of SymlinkEntry objects, which aren't File objects, but are wrappers around File objects.

So just have an adapter that unwraps them. e.g. runfiles.get_all_files() returns:

class SymlinkArtifactNestedSet implements NestedSet<Artifact> {
   constructor(NestedSet<SymlinkEntry> symlinks) { this.symlinks = symlinks }
   Artifact getNext() {
     symlinks.getNext().getArtifact();
   }
}

I don't know the exact NestedSet API, but I hope the gist is clear.

dgoldstein0 · 2024-08-30T05:31:23Z

Huh, those SymlinkEntry don't seem to be exposed to starlark yet, at least not that I could find. Are there plans for that?

…2891) After #2887 I started finding similar linker errors in environments that supported runfiles and symlinks but have yet to see this issue in environments without them. I believe my understanding of how the [DefaultInfo.default_runfiles](https://bazel.build/rules/lib/providers/DefaultInfo#default_runfiles) objects worked and in the previous change had attempted to rely on it for a runfiles directory. However, there is no guarantee a directory will be rendered and no way to force it to be rendered for actions either (bazelbuild/bazel#15486). The consequence is that creating a custom runfiles directory is no longer a contingency and explicitly creating it is the correct thing to do to ensure `CARGO_MANIFEST_DIR` linker inputs are always available to the consuming actions. With the propagation of `CARGO_MANIFEST_DIR` (now an action output) into `Rustc` actions, a new flag is also introduced to prune unnecessary files out of the directory to reduce bloat and potential invalidation in downstream actions, `--@rules_rust//cargo/settings:cargo_manifest_dir_retain_list`. This flag accepts a list of path suffixes used to determine what (if anything) in `CARGO_MANIFEST_DIR` should be retained and passed to downstream actions. Note that the failure mode this addresses is hard to observe as it requires a crate which defines a linker path to something in `CARGO_MANIFEST_DIR` and then for a clean build to be done with a change to a dependent of said crate. If the runfiles directory for the build script is never rendered then there will be no file to link into the crate.

…2891) After #2887 I started finding similar linker errors in environments that supported runfiles and symlinks but have yet to see this issue in environments without them. I believe my understanding of how the [DefaultInfo.default_runfiles](https://bazel.build/rules/lib/providers/DefaultInfo#default_runfiles) objects worked and in the previous change had attempted to rely on it for a runfiles directory. However, there is no guarantee a directory will be rendered and no way to force it to be rendered for actions either (bazelbuild/bazel#15486). The consequence is that creating a custom runfiles directory is no longer a contingency and explicitly creating it is the correct thing to do to ensure `CARGO_MANIFEST_DIR` linker inputs are always available to the consuming actions. With the propagation of `CARGO_MANIFEST_DIR` (now an action output) into `Rustc` actions, a new flag is also introduced to prune unnecessary files out of the directory to reduce bloat and potential invalidation in downstream actions, `--@rules_rust//cargo/settings:cargo_manifest_dir_filename_suffixes_to_retain`. This flag accepts a list of path suffixes used to determine what (if anything) in `CARGO_MANIFEST_DIR` should be retained and passed to downstream actions. Note that the failure mode this addresses is hard to observe as it requires a crate which defines a linker path to something in `CARGO_MANIFEST_DIR` and then for a clean build to be done with a change to a dependent of said crate. If the runfiles directory for the build script is never rendered then there will be no file to link into the crate. --------- Co-authored-by: Daniel Wagner-Hall <dawagner@gmail.com>

rickeylev · 2024-10-11T15:43:02Z

cc @lberki A function to convert a runfiles object to a file that can then be passed as inputs to an action would be very useful for this issue.

lberki · 2024-10-12T10:03:31Z

With 78971a9, creating a Starlark function that takes a Runfiles object and returns a File that is a runfiles tree containing those runfiles is very possible. In fact, I have a draft change that does that in support of some intricate Google-specific Python functionality and it works well.

@comius @c-mita WDYT about making that an official part of the API? (I'm asking Charles because I think he'd have some use for this in deleting some Google-specific warts and in Starlarkifying Google-internal sh_* rules)

I'm thinking of something like

ctx.actions.create_runfiles_tree(path, sibling, runfiles)

or splitting it up into the usual declare_* / create the action parts that we do for other kinds of files (it's superfluous here because there isn't any other way to create a runfiles tree other than from a Runfiles object but then it'd be symmetric)

The complication is that these artifacts would be somewhat special, for example, building them wouldn't cause them to show up in the local file system, only on the inputs of the actions that depend on them (long story). That's eventually fixable, but it'd definitely be a wart.

c-mita · 2024-10-22T11:07:42Z

I'd like it much more if it was just like a tree artifact. :D

But I can imagine a declare_runfiles_tree and create_runfiles_tree pair, with the hope that one day the former becomes synonymous with declare_directory.

I'd prefer experimenting a bit with it behind the "internal API allowlisting" we've done for other things before making it "official" though; but I do have a use case or two in mind.

lberki · 2024-10-22T11:15:56Z

It can't be a tree artifact because runfiles trees are materially different: tree artifacts have all the files within them and runfiles trees are only symlinks to other files that need to be "dragged along" into the inputs of the action. It's not inconceivable that they'll one day be unified, but I'm afraid I can't commit to that.

re: API, fine, you get a declare_* / create_* pair, provided that you pinkie promise that you'll use it them. Deal?

comius · 2024-10-22T11:16:17Z

Make it an allowlisted api. I think that runfiles need and overhaul, however I don’t have yet a good overview of the area.

There were some nice suggestions @kmoffett shared with me over a chat. I’m not sure if they are materialized anywhere, but the essence of the idea is to split DefaultInfo into FilesInfo and RunfilesInfo. Separation should cause less confusion for the users. And it could allow more natural handling of runfiles within a rule.

lberki · 2024-10-22T11:22:13Z

I won't take sides on what one should do with DefaultInfo (it's mostly orthogonal to whether the pertinent actions should be createable from Starlark), but what I'm hearing from both @c-mita and @comius is that an API to create abitrary symlink trees from Starlark would be useful.

/me rolls up his sleeves...

c-mita · 2024-10-22T11:29:36Z

re: API, fine, you get a declare_* / create_* pair, provided that you pinkie promise that you'll use it them. Deal?

I missed that the comment about redundancy. Would the declare really be a no-op?

an API to create abitrary symlink trees from Starlark would be useful

Insert comment that rhymes with "let me kill SpecialArtifact.FILESET"

lberki · 2024-10-22T12:14:08Z

Do I understand correctly that arbitrary symlink trees are enough to kill SpecialArtifact.FILESET i.e. you don't need the tree artifact / runfiles tree hybrid functionality that you mentioned above?

c-mita · 2024-10-28T15:41:20Z

Do I understand correctly that arbitrary symlink trees are enough to kill SpecialArtifact.FILESET i.e. you don't need the tree artifact / runfiles tree hybrid functionality that you mentioned above?

How SpecialArtifact.FILESET affects things versus SpecialArtifact.TREE and SpecialArtifact.RUNFILES is a question I cannot answer. But I do feel that "FILESET" should not be a special artifact type in Bazel.

For my goals, I need a way of producing a symlink tree as an "artifact" that can be transparently returned by a rule or passed to other actions. In my mind this looks as much like a tree artifact as possible (notably, "runfiles" do not look like a directory, but "fileset" artifacts do https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/actions/Artifact.java;l=682).

Maybe this is just a naming issue and it should be SpecialArtifact.SYMLINK_TREE? But then why is runfiles different?

lberki · 2024-10-28T16:00:09Z

My best educated guess is that SpecialArtifactType.FILESET and SpecialArtifactType.RUNFILES are mostly different now due to their history: there were implementations of two different legacy features inherited from the predecessor of Blaze (the maiden name of Bazel).

The difference used to be meaningful when Filesets were much more complicated than today, but with all your cleanups, I think they are semantically pretty close to each other now, except that SpecialArtifactType.RUNFILES are still special-cased in our source code (look at the call sites for ActionAnalysisMetadata.getActionType() to see where), but that's mostly for historical reasons and I'm busy eliminating that distinction.

Let's do this: I drive the cleanup to the end that removes all the special-casing of SpecialArtifactType.RUNFILES. It should take like two more changes, plus some waiting to see if the first one I'm submitting today causes any trouble. Once that is done, you can implement the Starlark interface to turn a Runfiles object into an artifact that is the materialized version of that runfiles tree. Then that can in turn be hopefully used to implement Filesets.

Deal?

lberki · 2024-10-28T16:03:47Z

Ideally, one would not make the distinction between tree artifacts and symlink trees either and instead be able to handle both symlinks and files in TreeArtifacts, but it's probably not the best decision to spend our energies on that right now because the unification with runfiles trees is enough for Filesets.

(I also have already done enough cleanups recently to not have the appetite for much more, but that's my personal issue)

sgowroji added type: feature request untriaged team-Build-Language labels May 13, 2022

aherrmann mentioned this issue Sep 2, 2022

Add rule to define hermetic POSIX toolchain tweag/rules_sh#34

Merged

fmeum mentioned this issue Oct 27, 2022

Non-runfile files from an executable get included in the input for ctx.actions.run() #16573

Closed

brandjon added team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts and removed team-Build-Language labels Nov 5, 2022

comius added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) and removed untriaged labels Sep 14, 2023

This was referenced Sep 20, 2024

Update cargo_build_script to always render custom runfiles dirs. bazelbuild/rules_rust#2891

Merged

Cargo incompatible_runfiles_cargo_manifest_dir bazelbuild/rules_rust#2898

Open

rickeylev mentioned this issue Sep 24, 2024

Efficient way to relocate files in runfiles #23728

Open

tjgq mentioned this issue Oct 27, 2024

Please allow FilesToRunProvider to be constructed by Starlark #7187

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow runfiles to be easily/efficiently passed as action inputs #15486

Allow runfiles to be easily/efficiently passed as action inputs #15486

rickeylev commented May 12, 2022

comius commented Sep 14, 2023

dgoldstein0 commented Jan 20, 2024

rickeylev commented Aug 14, 2024

dgoldstein0 commented Aug 24, 2024

rickeylev commented Aug 25, 2024

rickeylev commented Aug 29, 2024

dgoldstein0 commented Aug 30, 2024

rickeylev commented Oct 11, 2024

lberki commented Oct 12, 2024 •

edited

Loading

c-mita commented Oct 22, 2024

lberki commented Oct 22, 2024

comius commented Oct 22, 2024

lberki commented Oct 22, 2024

c-mita commented Oct 22, 2024

lberki commented Oct 22, 2024

c-mita commented Oct 28, 2024

lberki commented Oct 28, 2024 •

edited

Loading

lberki commented Oct 28, 2024

Allow runfiles to be easily/efficiently passed as action inputs #15486

Allow runfiles to be easily/efficiently passed as action inputs #15486

Comments

rickeylev commented May 12, 2022

Description of the feature request:

What underlying problem are you trying to solve with this feature?

Which operating system are you running Bazel on?

What is the output of bazel info release?

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

Have you found anything relevant by searching the web?

Any other information, logs, or outputs that you want to share?

comius commented Sep 14, 2023

dgoldstein0 commented Jan 20, 2024

rickeylev commented Aug 14, 2024

dgoldstein0 commented Aug 24, 2024

rickeylev commented Aug 25, 2024

rickeylev commented Aug 29, 2024

dgoldstein0 commented Aug 30, 2024

rickeylev commented Oct 11, 2024

lberki commented Oct 12, 2024 • edited Loading

c-mita commented Oct 22, 2024

lberki commented Oct 22, 2024

comius commented Oct 22, 2024

lberki commented Oct 22, 2024

c-mita commented Oct 22, 2024

lberki commented Oct 22, 2024

c-mita commented Oct 28, 2024

lberki commented Oct 28, 2024 • edited Loading

lberki commented Oct 28, 2024

What is the output of `bazel info release`?

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

lberki commented Oct 12, 2024 •

edited

Loading

lberki commented Oct 28, 2024 •

edited

Loading