Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py_binary located via manifest-based runfiles lib fails when its own runfiles tree isn't built #11997

Closed
pcjanzen opened this issue Aug 22, 2020 · 8 comments
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-Python Native rules for Python type: bug

Comments

@pcjanzen
Copy link
Contributor

Description of the problem / feature request:

Consider a Starlark-generated executable target, which depends at runtime on a py_binary by placing it in its data attr, and which locates the path to that py_binary using a runfiles library that uses runfiles_manifest files.

In this scenario, the target will execute the py_binary using the "real" path of the binary inside bazel-bin, not a path relative to the original target's runfiles.

Bazel doesn't necessarily build the py_binary's own runfiles tree, so the python stub, which only looks at its own sys.argv[0], can't find its module space.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Expand Python-binary-failure-example.zip then run bazel run //:exe

What operating system are you running Bazel on?

macosx

What's the output of bazel info release?

release 3.4.1

@lberki lberki added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) and removed team-Rules-Python Native rules for Python untriaged labels Nov 18, 2020
@jin jin added the team-Rules-Python Native rules for Python label Dec 4, 2020
@pauldraper
Copy link
Contributor

so the python stub, which only looks at its own sys.argv[0], can't find its module space.

The usual fix is to require calling executables to set RUNFILES_DIR environment variable, and then use that if it is set.

@rickeylev
Copy link
Contributor

A few observations.

$(rlocation) is returning a path that is outside of runfiles. $(rlocation workspace/inner_bin) is returning

/home/<user>/.cache/bazel/_bazel_<user>/<hash>/execroot/<workspace>/bazel-out/<config>/bin/inner_bin (I'm calling this the "canonical" path, since its the real file everything else symlinks to)

not e.g. /<snip>/outer_bin.runfiles/<workspace>/inner_bin. (when invoked like this, it works)

If outer_bin has a data dependency on inner_bin, then the official location (wrt outer bin) of inner_bin is $outer_bin_runfiles_dir/<workspace>/inner_bin. Sure, the one under outer bin's runfiles is a symlink, but this still seems problematic (also, this sort of thing could be viewed as a potential path-escape vulnerability, as you're forced to accept arbitrary paths that may not actually be in runfiles and shouldn't be accessible). FWIW, internally at Google, I think I've only ever seen the runfiles-relative paths used in e.g. shell programs, e.g. `$RUNFILES/inner_bin --whatever", and, similarly, internally, we always use a runfiles-relative path for accessing runfiles data.

Otherwise, to make this work, the runfiles directory of inner_bin needs to be materialized in the blaze-out directory along side the actual binary. When the canonical path is invoked directly, it's lost any context that it's within another binary's runfiles. This would probably be more correct anyways, as it would prevent the outer bin's runfiles from being accessible to the inner bin. That said, that might be a hard change because for the inner binary to find its correct runfiles, it'll have to find its canonical executable (e.g. realpath argv[0]).

(Note I'm talking about a single extra runfiles at the top level where the canonical bin executable is, not a runfiles tree within another runfiles tree (which could then have more runfiles trees within it))

Such a change seems like it is at odds with the existing guidance in this area, though? e.g. that "{outer}.runfiles/inner" is the official location, and that using the outer bin's runfiles for the inner's is the suggested fix/workaround.


The logic in the stub script has logic to match any *.runfiles src; I'm willing to bet that logic is there to make binaries-nested-in-binaries Just Work without additional envvar setup.


The stub script doesn't appear to care about RUNFILES_DIR or RUNFILES_MANIFEST_FILE, at least not during FindModuleSpace(), the function that looks for the runfiles directory. Some later logic does, but they don't appear to influence locating the runfiles for the stub script startup. That sounds like a bug; isn't the point of those envvars to explicitly tell the runfiles location? Making FindModuleSpace() also check those looks like a simple change. This seems like an obvious oversight, though, so it makes me wonder if there's something in the change history that might explain why it doesn't.

@pcjanzen
Copy link
Contributor Author

Right. I tried to avoid assigning blame here, because this bug requires all of the following to be true:

  • Bazel emits the "canonical path" for inner_bin into the runfiles manifest for outer_bin.
  • Bazel does not materialize the canonical runfiles for an inner_bin that is a py_binary when building outer_bin. (Apparently, it does materialize it when outer_bin merges the py_binary's data_runfiles, instead of its default_runfiles, but the docs explicitly discourage this).
  • The runfiles resolution prefers the manifest to the runfiles_dir when both are present.
  • The python stub script does not honor $RUNFILES_DIR in FindModuleSpace().

and all of these probably make sense in some context.

The stub script doesn't appear to care about RUNFILES_DIR or RUNFILES_MANIFEST_FILE

If the stub script preferred $RUNFILES_MANIFEST_FILE over $RUNFILES_DIR, like current runfiles libraries do, then it would wind up in the same situation. Same with the RunfilesEnvvar function already in the stub.

@rickeylev
Copy link
Contributor

Apparently, it does materialize it when outer_bin merges the py_binary's data_runfiles

Oh huh, yes, I didn't notice that. That is weird. They have the same contents (as printed from starlark); I don't see any difference. Additionally, if the contents of the data runfiles are copied into a new runfiles object[1], then it breaks again.

So it's almost like the original, Java created, data_runfiles Runfiles object is carrying some extra bit that later causes the runfiles dir to be created?

[1] like so, though I'll note only .files has any values in my repro (based upon the repro code in issue 14608)

    o_data = ctx.attr._tool[DefaultInfo].data_runfiles
    new_data = ctx.runfiles(
        transitive_files = o_data.files,
        symlinks = {v.path: v.target_file for v in o_data.symlinks.to_list()},
        root_symlinks = {v.path: v.target_file for v in o_data.root_symlinks.to_list()},
    )
    <merge new_data into final runfiles>

@pcjanzen
Copy link
Contributor Author

If i'm reading this correctly, only the data_runfiles depend on this middleman: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/rules/python/PyExecutable.java;l=87

@rickeylev
Copy link
Contributor

Maybe this has something to do with the runfiles "middlemen". I was never able to really figure out what these middlemen things are, but the comments suggest something about tying together the owning executable and necessary files:

This appears related to the --experimental_build_transitive_python_runfiles flag. That flag defaults to true. if set to false, then using the original data_runfiles also results in a failure.

So, yeah, definitely seems like that bit of logic is the key piece.

@pauldraper
Copy link
Contributor

The stub script doesn't appear to care about RUNFILES_DIR or RUNFILES_MANIFEST_FILE, at least not during FindModuleSpace()...That sounds like a bug; isn't the point of those envvars to explicitly tell the runfiles location? Making FindModuleSpace() also check those looks like a simple change. This seems like an obvious oversight, though, so it makes me wonder if there's something in the change history that might explain why it doesn't.

Agreed.

EdSchouten added a commit to EdSchouten/bazel that referenced this issue Feb 7, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
EdSchouten added a commit to EdSchouten/bazel that referenced this issue Feb 7, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
EdSchouten added a commit to EdSchouten/bazel that referenced this issue Feb 7, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
EdSchouten added a commit to EdSchouten/bazel that referenced this issue Feb 7, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
EdSchouten added a commit to EdSchouten/bazel that referenced this issue Feb 7, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
@rickeylev
Copy link
Contributor

rickeylev commented May 9, 2022

Maybe this has something to do with the runfiles "middlemen".

Well, I found some more info about this and it seems in the same vein as this bug.

Basically, there is something special about that middleman object that causes an inner binary's runfiles to be updated (on disk) when an outer binary is built. Back in March 2015, lberki attempted to remove the middleman from the data_runfiles. It ended up causing a breakage, so it was added back a couple months later. At some later point, the --experimental_build_transitive_python_runfiles flag was introduced (in a seemingly unrelated CL), and then later it became a guard for adding the middleman to data_runfiles (the change history for this part is a bit murky). On the plus side, I think I now understand what that flag name is trying to communicate: whether the runfiles directory of a transitive binary itself should also be updated.

The breakage in question was pretty obscure: people were building an inner binary, modifying it (e.g. changing deps) (but not rebuilding it), then building an outer binary, and expecting that the outer binary's invocation of the inner binary would work (as if you had re-built inner as well). e.g.

# outer.py basically just does subprocess("$runfiles_dir/inner")
py_binary(outer, srcs=[outer.py], data=[inner])
# inner.py basically just does "import old.py"
py_binary(inner, srcs=[inner.py, old.py])

# build :inner -> after this, inner.runfiles has "old.py" in it

# replace "old.py" with "new.py"; change inner.py to "import new" instead
# build :outer -> after this, outer.runfiles has new.py in it due to runfiles merging
blaze-bin/outer

This would work because, when the outer binary built the inner binary, the middleman object causes the inner's runfiles directory from the first build (where only inner was built) to also be updated (i.e., new.py shows up). Without the middleman, then the inner's runfiles directory isn't updated (i.e new.py doesn't show up) (but the merged runfiles of the outer's is) when outer is built.

This particular use case failing might be specific to our Google build, but haven't checked. Google's python_stub_template.txt will look for a runfiles directory after resolving symlinks, while Bazel's looks for a runfiles directory before resolving symlinks. I think the net effect means, with Bazel, it see's the "outer.runfiles" in the path and uses that, while with Google's build, it "escapes" outside of "outer.runfiles" and then starts looking at the inner.runfiles directory directly.

EdSchouten added a commit to EdSchouten/bazel that referenced this issue Aug 11, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
EdSchouten added a commit to EdSchouten/bazel that referenced this issue Aug 12, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
Change-Id: Ib559784c1b043deba1dbd329bedbc9250acfd44e
EdSchouten added a commit to EdSchouten/bazel that referenced this issue Aug 22, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
Change-Id: Ib559784c1b043deba1dbd329bedbc9250acfd44e
EdSchouten added a commit to EdSchouten/bazel that referenced this issue Sep 26, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
EdSchouten added a commit to EdSchouten/bazel that referenced this issue Sep 26, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
EdSchouten added a commit to EdSchouten/bazel that referenced this issue Sep 26, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997
aiuto pushed a commit to aiuto/bazel that referenced this issue Oct 12, 2022
When invoking a py_binary() through an sh_binary() using the Bash
runfiles library, the location of the py_binary() will be resolved from
the runfiles manifest file. This means that the argv[0] of the Python
stub script will not point to a location under the runfiles directory of
the shell script, but to a location in Bazel's execroot.

Normally this does not lead to any issues, as argv[0] + ".runfiles"
happens to point to be a valid runfiles directory as well. It does
become problematic when --nobuild_runfile_links is provided, as in that
case only the outer shell script is guaranteed to have a runfiles
directory; not the inner Python script.

This change extends the Python stub template to also consider
RUNFILES_DIR when no runfiles directory can be found. Even though it's
not technically correct, we also attempt to derive the runfiles
directory from RUNFILES_MANIFEST_FILE. I suspect that this is a
necessity as long as py_binary()s cannot operate purely using a manifest
file. They currently depend on a concrete instantiation of the runfiles
directory.

Fixes: bazelbuild#11997

Closes bazelbuild#14740.

PiperOrigin-RevId: 478857199
Change-Id: I8cc6ea014bfd4b9ea2f1672e8e814ba38a5bf471
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-Python Native rules for Python type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants