Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sh_test does not work remotely on Windows #4962

Closed
jasharpe opened this issue Apr 3, 2018 · 12 comments
Closed

sh_test does not work remotely on Windows #4962

jasharpe opened this issue Apr 3, 2018 · 12 comments

Comments

@jasharpe
Copy link

jasharpe commented Apr 3, 2018

Description of the problem / feature request:

sh_test rules (and probably others) do not work on Windows when using remote execution, due to a missing runfiles manifest file.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

On a Windows Server 2016 VM with Bazel 0.11.1 installed:

  1. Build and run Local Remote Worker:
$ cd $home
$ git clone https://github.com/bazelbuild/bazel.git
$ cd bazel
$ bazel build //src/tools/remote:worker
$ bazel-out/x64_windows-fastbuild/bin/src/tools/remote/worker.exe --listen_port=8080 --work_path=C:\tmp\lre --debug
  1. Create a Bazel directory with a simple shell test:
$ cd $home
$ mkdir bazel_test
$ cd bazel_test

Copy these files to there:
BUILD: https://gist.github.com/jasharpe/4566c496b222eaf45771ee815b1544e7
.bazelrc: https://gist.github.com/jasharpe/9edadeee13396b0efdddd6d21faeae3a
pass.sh: https://gist.github.com/jasharpe/70df48ca39a799825cc03099e9083086

  1. Demonstrate that it works locally:
$ cd $home\bazel_test
$ bazel --bazelrc=.bazelrc test --test_output=all :pass_test
  1. Demonstrate that it does not work remotely:
$ cd $home\bazel_test
$ bazel --bazelrc=.bazelrc test --test_output=all --config=remote :pass_test

The error printed is something like:

grep: /c/tmp/lre/build-e5526681-466d-4e6b-b5a4-a774ea7e3238/bazel-out/x64_windows-fastbuild/bin/pass_test.exe.runfiles/MANIFEST: No such file or directory
external/bazel_tools/tools/test/test-setup.sh: line 230: : command not found

Indeed if you check this directory, it does not contain a file called MANIFEST:

 ls C:\tmp\lre\build-e5526681-466d-4e6b-b5a4-a774ea7e3238\bazel-out\x64_windows-fastbuild\bin\pass_test.exe.runfiles\


    Directory:
    C:\tmp\lre\build-e5526681-466d-4e6b-b5a4-a774ea7e3238\bazel-out\x64_windows-fastbuild\bin\pass_test.exe.runfiles


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         4/3/2018   6:24 PM                __main__

What operating system are you running Bazel on?

Windows Server 2016 (1709)

What's the output of bazel info release?

release 0.11.1

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

N/A

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

N/A

Have you found anything relevant by searching the web?

Closest thing seems #1593 and #2296 but I'm not sure it's the same problem, and both were closed a while ago.

Any other information, logs, or outputs that you want to share?

Replace these lines with your answer.

If the files are large, upload as attachment or provide link.

@jasharpe
Copy link
Author

jasharpe commented Apr 4, 2018

#3726 is pretty relevant as well, in particular "the manifest is not present in a sandbox", with reply "I'm not sure we have adequately defined / designed a more permanent solution, but I'm sure it isn't making the current runfiles manifests (that contain absolute paths) available in the sandbox."

@laszlocsomor
Copy link
Contributor

My repro attempt failed with this error:

C:\tmp\lre-test>bazel --bazelrc=.bazelrc test --test_output=all --config=remote :all
ERROR: Failed to init auth credentials: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
java.lang.NullPointerException
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:871)
        at com.google.devtools.build.lib.events.Event.<init>(Event.java:52)
        at com.google.devtools.build.lib.events.Event.error(Event.java:165)
        at com.google.devtools.build.lib.events.Event.error(Event.java:200)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:474)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:218)
        at com.google.devtools.build.lib.runtime.CommandExecutor.exec(CommandExecutor.java:58)
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:851)
        at com.google.devtools.build.lib.server.GrpcServerImpl.access$2100(GrpcServerImpl.java:109)
        at com.google.devtools.build.lib.server.GrpcServerImpl$2.lambda$run$0(GrpcServerImpl.java:916)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
java.lang.NullPointerException
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:871)
        at com.google.devtools.build.lib.events.Event.<init>(Event.java:52)
        at com.google.devtools.build.lib.events.Event.error(Event.java:165)
        at com.google.devtools.build.lib.events.Event.error(Event.java:200)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:474)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:218)
        at com.google.devtools.build.lib.runtime.CommandExecutor.exec(CommandExecutor.java:58)
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:851)
        at com.google.devtools.build.lib.server.GrpcServerImpl.access$2100(GrpcServerImpl.java:109)
        at com.google.devtools.build.lib.server.GrpcServerImpl$2.lambda$run$0(GrpcServerImpl.java:916)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Server terminated abruptly (error code: 14, error message: '', log file: 'c:\users\laszlocsomor\appdata\local\temp\_bazel_laszlocsomor\fhlz3hz6/server/jvm.out')

The jvm.out file is empty.

I don't know what file I should set to GOOGLE_APPLICATION_CREDENTIALS.

@jasharpe
Copy link
Author

jasharpe commented Apr 4, 2018

Ah, sorry. This works when using a GCE VM. It may be sufficient to set --auth_enabled=false. This works for me and I'm hopeful it would resolve that problem. I've updated the gist for the .bazelrc to reflect this.

@jasharpe
Copy link
Author

jasharpe commented Apr 4, 2018

Update:

It turns out that remote execution for Windows provides quite a different environment. In fact, a
MANIFEST file is not required at all! I was able to make some hacky (but simple) changes to Bazel and get the sh_test above working.

Because of the nature of remote execution, all the runfiles are actually present in the runfiles tree on the remote worker (this is because you must ship actual files to remote workers on all platforms, so symlinks are resolved before packaging up the input tree). This contrasts to local execution of Windows, where the runfiles tree is empty, and the MANIFEST is used to find actual files.

This means that remote Windows execution looks a lot like remote Linux execution from the point of view of runfiles.

Two places are affected by this for Windows remote execution:

  1. For tests, in test-setup.sh, it is correct to take the !RUNFILES_MANIFEST_ONLY branch (that is, the first one) that has an implementation of rlocation using the TEST_SRCDIR. In particular for tests, this means that this setting of RUNFILES_MANIFEST_ONLY should be conditional on whether we're executing remotely.

    I'm not sure what the best solution is for making the setting of RUNFILES_MANIFEST_ONLY aware of whether execution is local or remote.

  2. For tests and non-tests, launchers need to be aware somehow of whether they are executing locally or remotely, and use or not use the manifest as appropriate.

    I believe that when executing remotely, the launcher implementation of rlocation should skip the manifest lookup, and instead just return basically $RUNFILES_DIR + "/" + path.

    I'm not sure what the best solution is for the launcher knowing whether or not it is executing locally or remotely.

I will prepare an example pull request containing my hacky change for illustration purposes.

@meteorcloudy
Copy link
Member

meteorcloudy commented Apr 4, 2018

Thanks for investigating!
The runfile library can already switch to a directory based runfile lookup if RUNFILES_MANIFEST_ONLY isn't set to 1
See

* <p>If {@code env} contains "RUNFILES_MANIFEST_ONLY" with value "1", this method returns a
* manifest-based implementation. The manifest's path is defined by the "RUNFILES_MANIFEST_FILE"
* key's value in {@code env}.
*
* <p>Otherwise this method returns a directory-based implementation. The directory's path is
* defined by the value in {@code env} under the "RUNFILES_DIR" key, or if absent, then under the
* "JAVA_RUNFILES" key.

But we do need to make change to the launcher

@laszlocsomor
Copy link
Contributor

Yun, I guess the launcher can use the runfiles library when it's ready. Unfortunately I'm finding insufficient time lately to finish anything, including the runfiles library.

@laszlocsomor
Copy link
Contributor

@jasharpe , the thanks --auth_enabled=false Bazel flag did indeed help.

@jasharpe
Copy link
Author

jasharpe commented Apr 5, 2018

I discussed with @meteorcloudy and I will try to make a pull request to (for now) change the launcher to attempt to fall back to a directory approach if the MANIFEST file is missing. I think this simplifies things since it'll just work in both environments without having to pass any state in. (For tests the launcher can check the setting of RUNFILES_MANIFEST_ONLY, but this doesn't work for other remotely run executables like tools for genrules. So the options are this falling back strategy, or adding some additional environment variables.)

Any objections to that?

The issue of not setting RUNFILES_MANIFEST_ONLY for remote tests I will handle separately.

As far as I know the cases I will need to handle for remote execution are:

  • Tests.
  • Data dependencies of tests.
  • Tools of genrules.

@jasharpe
Copy link
Author

jasharpe commented Apr 5, 2018

Since I promised to share my hacky change to make sh_tests work, here it is:

DISCLAIMER: Hacky!

jasharpe/bazel@master...jasharpe:hacky_runfiles

May be useful in understanding what issues I'm trying to deal with.

@jasharpe
Copy link
Author

jasharpe commented Apr 6, 2018

The repro case in the initial issue is for a test (sh_test in particular), but another interesting case for a fix to be tested on is genrules. I have created a repro case for this. Here's the BUILD file:

sh_binary(
    name = "tool",
    srcs = ["tool.sh"],
)

genrule(
    name = "genrule",
    srcs = [],
    outs = ["foo.txt"],
    cmd = "$(location :tool) > \"$@\"",
    tools = [":tool"],
)

tool.sh can just be an empty file.

If you build this genrule remotely you get:

LAUNCHER ERROR: Couldn't find MANIFEST file under C:/<redacted>/bazel-out/host/bin/tool.exe.runfiles\

@jasharpe
Copy link
Author

jasharpe commented Apr 6, 2018

One more thing that satisfies my own curiosity, I just discovered that the reason manifests aren't written for remote builds is this: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/analysis/SourceManifestAction.java#L141

@laszlocsomor
Copy link
Contributor

laszlocsomor commented Apr 9, 2018

One more thing that satisfies my own curiosity, I just discovered that the reason manifests aren't written for remote builds is this:

// There is little gain to remoting these, since they include absolute path names inline.
return false;

The comment's reasoning makes sense though.

jasharpe pushed a commit to jasharpe/bazel that referenced this issue Apr 9, 2018
… to validate this key use case for remote Windows builds.

This is a working test case, but I plan to add more in this style to demonstrate fixes to address bazelbuild#4962.
jasharpe pushed a commit to jasharpe/bazel that referenced this issue Apr 9, 2018
… to validate this key use case for remote Windows builds.

This is a working test case, but I plan to add more in this style to demonstrate fixes to address bazelbuild#4962.
jasharpe pushed a commit to jasharpe/bazel that referenced this issue Apr 10, 2018
… to validate this key use case for remote Windows builds.

This is a working test case, but I plan to add more in this style to demonstrate fixes to address bazelbuild#4962.
jasharpe pushed a commit to jasharpe/bazel that referenced this issue Apr 10, 2018
… to validate this key use case for remote Windows builds.

This is a working test case, but I plan to add more in this style to demonstrate fixes to address bazelbuild#4962.
jasharpe pushed a commit to jasharpe/bazel that referenced this issue Apr 10, 2018
… to validate this key use case for remote Windows builds.

This is a working test case, but I plan to add more in this style to demonstrate fixes to address bazelbuild#4962.
bazel-io pushed a commit that referenced this issue Apr 11, 2018
… to validate this key use case for remote Windows builds.

This is a working test case, but I plan to add more in this style to demonstrate fixes to address #4962.

Note that to run this test requires a build of Bazel that includes b4545ba since this changed the way runfile manifests are discovered. Bazel 0.11 doesn't include this.

RELNOTES: None.
PiperOrigin-RevId: 192444770
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants