py_binary located via manifest-based runfiles lib fails when its own runfiles tree isn't built #11997

pcjanzen · 2020-08-22T22:21:35Z

Description of the problem / feature request:

Consider a Starlark-generated executable target, which depends at runtime on a py_binary by placing it in its data attr, and which locates the path to that py_binary using a runfiles library that uses runfiles_manifest files.

In this scenario, the target will execute the py_binary using the "real" path of the binary inside bazel-bin, not a path relative to the original target's runfiles.

Bazel doesn't necessarily build the py_binary's own runfiles tree, so the python stub, which only looks at its own sys.argv[0], can't find its module space.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Expand Python-binary-failure-example.zip then run bazel run //:exe

What operating system are you running Bazel on?

macosx

What's the output of `bazel info release`?

release 3.4.1

The text was updated successfully, but these errors were encountered:

pauldraper · 2022-01-28T17:21:50Z

so the python stub, which only looks at its own sys.argv[0], can't find its module space.

The usual fix is to require calling executables to set RUNFILES_DIR environment variable, and then use that if it is set.

rickeylev · 2022-01-28T19:03:52Z

A few observations.

$(rlocation) is returning a path that is outside of runfiles. $(rlocation workspace/inner_bin) is returning

/home/<user>/.cache/bazel/_bazel_<user>/<hash>/execroot/<workspace>/bazel-out/<config>/bin/inner_bin (I'm calling this the "canonical" path, since its the real file everything else symlinks to)

not e.g. /<snip>/outer_bin.runfiles/<workspace>/inner_bin. (when invoked like this, it works)

If outer_bin has a data dependency on inner_bin, then the official location (wrt outer bin) of inner_bin is $outer_bin_runfiles_dir/<workspace>/inner_bin. Sure, the one under outer bin's runfiles is a symlink, but this still seems problematic (also, this sort of thing could be viewed as a potential path-escape vulnerability, as you're forced to accept arbitrary paths that may not actually be in runfiles and shouldn't be accessible). FWIW, internally at Google, I think I've only ever seen the runfiles-relative paths used in e.g. shell programs, e.g. `$RUNFILES/inner_bin --whatever", and, similarly, internally, we always use a runfiles-relative path for accessing runfiles data.

Otherwise, to make this work, the runfiles directory of inner_bin needs to be materialized in the blaze-out directory along side the actual binary. When the canonical path is invoked directly, it's lost any context that it's within another binary's runfiles. This would probably be more correct anyways, as it would prevent the outer bin's runfiles from being accessible to the inner bin. That said, that might be a hard change because for the inner binary to find its correct runfiles, it'll have to find its canonical executable (e.g. realpath argv[0]).

(Note I'm talking about a single extra runfiles at the top level where the canonical bin executable is, not a runfiles tree within another runfiles tree (which could then have more runfiles trees within it))

Such a change seems like it is at odds with the existing guidance in this area, though? e.g. that "{outer}.runfiles/inner" is the official location, and that using the outer bin's runfiles for the inner's is the suggested fix/workaround.

The logic in the stub script has logic to match any *.runfiles src; I'm willing to bet that logic is there to make binaries-nested-in-binaries Just Work without additional envvar setup.

The stub script doesn't appear to care about RUNFILES_DIR or RUNFILES_MANIFEST_FILE, at least not during FindModuleSpace(), the function that looks for the runfiles directory. Some later logic does, but they don't appear to influence locating the runfiles for the stub script startup. That sounds like a bug; isn't the point of those envvars to explicitly tell the runfiles location? Making FindModuleSpace() also check those looks like a simple change. This seems like an obvious oversight, though, so it makes me wonder if there's something in the change history that might explain why it doesn't.

pcjanzen · 2022-01-28T20:39:48Z

Right. I tried to avoid assigning blame here, because this bug requires all of the following to be true:

Bazel emits the "canonical path" for inner_bin into the runfiles manifest for outer_bin.
Bazel does not materialize the canonical runfiles for an inner_bin that is a py_binary when building outer_bin. (Apparently, it does materialize it when outer_bin merges the py_binary's data_runfiles, instead of its default_runfiles, but the docs explicitly discourage this).
The runfiles resolution prefers the manifest to the runfiles_dir when both are present.
The python stub script does not honor $RUNFILES_DIR in FindModuleSpace().

and all of these probably make sense in some context.

The stub script doesn't appear to care about RUNFILES_DIR or RUNFILES_MANIFEST_FILE

If the stub script preferred $RUNFILES_MANIFEST_FILE over $RUNFILES_DIR, like current runfiles libraries do, then it would wind up in the same situation. Same with the RunfilesEnvvar function already in the stub.

rickeylev · 2022-01-29T17:45:40Z

Apparently, it does materialize it when outer_bin merges the py_binary's data_runfiles

Oh huh, yes, I didn't notice that. That is weird. They have the same contents (as printed from starlark); I don't see any difference. Additionally, if the contents of the data runfiles are copied into a new runfiles object[1], then it breaks again.

So it's almost like the original, Java created, data_runfiles Runfiles object is carrying some extra bit that later causes the runfiles dir to be created?

[1] like so, though I'll note only .files has any values in my repro (based upon the repro code in issue 14608)

    o_data = ctx.attr._tool[DefaultInfo].data_runfiles
    new_data = ctx.runfiles(
        transitive_files = o_data.files,
        symlinks = {v.path: v.target_file for v in o_data.symlinks.to_list()},
        root_symlinks = {v.path: v.target_file for v in o_data.root_symlinks.to_list()},
    )
    <merge new_data into final runfiles>

pcjanzen · 2022-01-29T18:11:43Z

If i'm reading this correctly, only the data_runfiles depend on this middleman: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/rules/python/PyExecutable.java;l=87

rickeylev · 2022-01-29T18:13:35Z

Maybe this has something to do with the runfiles "middlemen". I was never able to really figure out what these middlemen things are, but the comments suggest something about tying together the owning executable and necessary files:

This appears related to the --experimental_build_transitive_python_runfiles flag. That flag defaults to true. if set to false, then using the original data_runfiles also results in a failure.

So, yeah, definitely seems like that bit of logic is the key piece.

pauldraper · 2022-01-29T20:53:51Z

The stub script doesn't appear to care about RUNFILES_DIR or RUNFILES_MANIFEST_FILE, at least not during FindModuleSpace()...That sounds like a bug; isn't the point of those envvars to explicitly tell the runfiles location? Making FindModuleSpace() also check those looks like a simple change. This seems like an obvious oversight, though, so it makes me wonder if there's something in the change history that might explain why it doesn't.

Agreed.

When invoking a py_binary() through an sh_binary() using the Bash runfiles library, the location of the py_binary() will be resolved from the runfiles manifest file. This means that the argv[0] of the Python stub script will not point to a location under the runfiles directory of the shell script, but to a location in Bazel's execroot. Normally this does not lead to any issues, as argv[0] + ".runfiles" happens to point to be a valid runfiles directory as well. It does become problematic when --nobuild_runfile_links is provided, as in that case only the outer shell script is guaranteed to have a runfiles directory; not the inner Python script. This change extends the Python stub template to also consider RUNFILES_DIR when no runfiles directory can be found. Even though it's not technically correct, we also attempt to derive the runfiles directory from RUNFILES_MANIFEST_FILE. I suspect that this is a necessity as long as py_binary()s cannot operate purely using a manifest file. They currently depend on a concrete instantiation of the runfiles directory. Fixes: bazelbuild#11997

rickeylev · 2022-05-09T20:06:08Z

Maybe this has something to do with the runfiles "middlemen".

Well, I found some more info about this and it seems in the same vein as this bug.

Basically, there is something special about that middleman object that causes an inner binary's runfiles to be updated (on disk) when an outer binary is built. Back in March 2015, lberki attempted to remove the middleman from the data_runfiles. It ended up causing a breakage, so it was added back a couple months later. At some later point, the --experimental_build_transitive_python_runfiles flag was introduced (in a seemingly unrelated CL), and then later it became a guard for adding the middleman to data_runfiles (the change history for this part is a bit murky). On the plus side, I think I now understand what that flag name is trying to communicate: whether the runfiles directory of a transitive binary itself should also be updated.

The breakage in question was pretty obscure: people were building an inner binary, modifying it (e.g. changing deps) (but not rebuilding it), then building an outer binary, and expecting that the outer binary's invocation of the inner binary would work (as if you had re-built inner as well). e.g.

# outer.py basically just does subprocess("$runfiles_dir/inner")
py_binary(outer, srcs=[outer.py], data=[inner])
# inner.py basically just does "import old.py"
py_binary(inner, srcs=[inner.py, old.py])

# build :inner -> after this, inner.runfiles has "old.py" in it

# replace "old.py" with "new.py"; change inner.py to "import new" instead
# build :outer -> after this, outer.runfiles has new.py in it due to runfiles merging
blaze-bin/outer

This would work because, when the outer binary built the inner binary, the middleman object causes the inner's runfiles directory from the first build (where only inner was built) to also be updated (i.e., new.py shows up). Without the middleman, then the inner's runfiles directory isn't updated (i.e new.py doesn't show up) (but the merged runfiles of the outer's is) when outer is built.

This particular use case failing might be specific to our Google build, but haven't checked. Google's python_stub_template.txt will look for a runfiles directory after resolving symlinks, while Bazel's looks for a runfiles directory before resolving symlinks. I think the net effect means, with Bazel, it see's the "outer.runfiles" in the path and uses that, while with Google's build, it "escapes" outside of "outer.runfiles" and then starts looking at the inner.runfiles directory directly.

When invoking a py_binary() through an sh_binary() using the Bash runfiles library, the location of the py_binary() will be resolved from the runfiles manifest file. This means that the argv[0] of the Python stub script will not point to a location under the runfiles directory of the shell script, but to a location in Bazel's execroot. Normally this does not lead to any issues, as argv[0] + ".runfiles" happens to point to be a valid runfiles directory as well. It does become problematic when --nobuild_runfile_links is provided, as in that case only the outer shell script is guaranteed to have a runfiles directory; not the inner Python script. This change extends the Python stub template to also consider RUNFILES_DIR when no runfiles directory can be found. Even though it's not technically correct, we also attempt to derive the runfiles directory from RUNFILES_MANIFEST_FILE. I suspect that this is a necessity as long as py_binary()s cannot operate purely using a manifest file. They currently depend on a concrete instantiation of the runfiles directory. Fixes: bazelbuild#11997

When invoking a py_binary() through an sh_binary() using the Bash runfiles library, the location of the py_binary() will be resolved from the runfiles manifest file. This means that the argv[0] of the Python stub script will not point to a location under the runfiles directory of the shell script, but to a location in Bazel's execroot. Normally this does not lead to any issues, as argv[0] + ".runfiles" happens to point to be a valid runfiles directory as well. It does become problematic when --nobuild_runfile_links is provided, as in that case only the outer shell script is guaranteed to have a runfiles directory; not the inner Python script. This change extends the Python stub template to also consider RUNFILES_DIR when no runfiles directory can be found. Even though it's not technically correct, we also attempt to derive the runfiles directory from RUNFILES_MANIFEST_FILE. I suspect that this is a necessity as long as py_binary()s cannot operate purely using a manifest file. They currently depend on a concrete instantiation of the runfiles directory. Fixes: bazelbuild#11997 Change-Id: Ib559784c1b043deba1dbd329bedbc9250acfd44e

When invoking a py_binary() through an sh_binary() using the Bash runfiles library, the location of the py_binary() will be resolved from the runfiles manifest file. This means that the argv[0] of the Python stub script will not point to a location under the runfiles directory of the shell script, but to a location in Bazel's execroot. Normally this does not lead to any issues, as argv[0] + ".runfiles" happens to point to be a valid runfiles directory as well. It does become problematic when --nobuild_runfile_links is provided, as in that case only the outer shell script is guaranteed to have a runfiles directory; not the inner Python script. This change extends the Python stub template to also consider RUNFILES_DIR when no runfiles directory can be found. Even though it's not technically correct, we also attempt to derive the runfiles directory from RUNFILES_MANIFEST_FILE. I suspect that this is a necessity as long as py_binary()s cannot operate purely using a manifest file. They currently depend on a concrete instantiation of the runfiles directory. Fixes: bazelbuild#11997

When invoking a py_binary() through an sh_binary() using the Bash runfiles library, the location of the py_binary() will be resolved from the runfiles manifest file. This means that the argv[0] of the Python stub script will not point to a location under the runfiles directory of the shell script, but to a location in Bazel's execroot. Normally this does not lead to any issues, as argv[0] + ".runfiles" happens to point to be a valid runfiles directory as well. It does become problematic when --nobuild_runfile_links is provided, as in that case only the outer shell script is guaranteed to have a runfiles directory; not the inner Python script. This change extends the Python stub template to also consider RUNFILES_DIR when no runfiles directory can be found. Even though it's not technically correct, we also attempt to derive the runfiles directory from RUNFILES_MANIFEST_FILE. I suspect that this is a necessity as long as py_binary()s cannot operate purely using a manifest file. They currently depend on a concrete instantiation of the runfiles directory. Fixes: bazelbuild#11997 Closes bazelbuild#14740. PiperOrigin-RevId: 478857199 Change-Id: I8cc6ea014bfd4b9ea2f1672e8e814ba38a5bf471

oquenchil added team-Rules-Python Native rules for Python untriaged type: bug labels Aug 27, 2020

lberki added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) and removed team-Rules-Python Native rules for Python untriaged labels Nov 18, 2020

jin added the team-Rules-Python Native rules for Python label Dec 4, 2020

pcjanzen mentioned this issue Feb 15, 2021

py_binary target inside another Starlark rule's runfiles doesn't get built fully #378

Closed

pcjanzen mentioned this issue Jan 28, 2022

py_binary doesn't work with default_runfiles #14608

Closed

EdSchouten mentioned this issue Feb 7, 2022

Let Python stubs respect RUNFILES_* while finding the module space #14740

Closed

pcjanzen mentioned this issue Sep 19, 2022

incompatible_build_transitive_python_runfiles #16303

Closed

copybara-service bot closed this as completed in c3425fe Oct 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

py_binary located via manifest-based runfiles lib fails when its own runfiles tree isn't built #11997

py_binary located via manifest-based runfiles lib fails when its own runfiles tree isn't built #11997

pcjanzen commented Aug 22, 2020

pauldraper commented Jan 28, 2022

rickeylev commented Jan 28, 2022

pcjanzen commented Jan 28, 2022

rickeylev commented Jan 29, 2022

pcjanzen commented Jan 29, 2022

rickeylev commented Jan 29, 2022

pauldraper commented Jan 29, 2022

rickeylev commented May 9, 2022 •

edited

Loading

py_binary located via manifest-based runfiles lib fails when its own runfiles tree isn't built #11997

py_binary located via manifest-based runfiles lib fails when its own runfiles tree isn't built #11997

Comments

pcjanzen commented Aug 22, 2020

Description of the problem / feature request:

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

What operating system are you running Bazel on?

What's the output of bazel info release?

pauldraper commented Jan 28, 2022

rickeylev commented Jan 28, 2022

pcjanzen commented Jan 28, 2022

rickeylev commented Jan 29, 2022

pcjanzen commented Jan 29, 2022

rickeylev commented Jan 29, 2022

pauldraper commented Jan 29, 2022

rickeylev commented May 9, 2022 • edited Loading

What's the output of `bazel info release`?

rickeylev commented May 9, 2022 •

edited

Loading