Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: Flipping --incompatible_auto_configure_host_platform flag causes "failed to delete output files before executing action" error #9104

Closed
meteorcloudy opened this issue Aug 7, 2019 · 24 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug

Comments

@meteorcloudy
Copy link
Member

Related #7081

To reproduce:

git clone https://github.com/bazelbuild/rules_nodejs.git
cd rules_nodejs
bazel build //e2e:e2e_karma
...
bazel build --incompatible_auto_configure_host_platform //e2e:e2e_karma
ERROR: C:/users/pcloudy/_bazel_pcloudy/owxwy7oq/external/build_bazel_rules_typescript/internal/BUILD.bazel:80:1: failed to delete output files before executing action
Target //e2e:e2e_karma failed to build
ERROR: C:/tools/msys64/home/pcloudy/workspace/rules_nodejs/packages/karma/BUILD.bazel:38:1 failed to delete output files before executing action

This affects project like rules_nodejs where worker is used during the build. If I manually kill the worker after the first build, the second build with the flag won't fail to delete the outputs.

@katre

@meteorcloudy meteorcloudy added type: bug P1 I'll work on this now. (Assignee required) team-Configurability platforms, toolchains, cquery, select(), config transitions labels Aug 7, 2019
@meteorcloudy meteorcloudy changed the title Windows: Flipping --incompatible_auto_configure_host_platform cause flag causes "failed to delete output files before executing action" erro Windows: Flipping --incompatible_auto_configure_host_platform flag causes "failed to delete output files before executing action" erro Aug 7, 2019
@meteorcloudy meteorcloudy changed the title Windows: Flipping --incompatible_auto_configure_host_platform flag causes "failed to delete output files before executing action" erro Windows: Flipping --incompatible_auto_configure_host_platform flag causes "failed to delete output files before executing action" error Aug 7, 2019
@meteorcloudy
Copy link
Member Author

meteorcloudy commented Aug 7, 2019

I also noticed if you remove the flag again, the build will still fail to delete outputs. This is worth investigating.

@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Aug 7, 2019
@katre
Copy link
Member

katre commented Aug 7, 2019

Huh, that's a really unexpected error from changing the host platform. I'll try to investigate. Is there a windows VM configured that I can use to reproduce?

@laszlocsomor
Copy link
Contributor

@katre : https://docs.google.com/document/d/1sEa9cGcLwr0aYGNrhGy8rrNxtLjEzKhh0FGcDlm0ymU/edit

@katre
Copy link
Member

katre commented Aug 7, 2019

This is really weird: flipping the flag causes the failure, regardless of which way it is flipped.

bazel clean
bazel build  //e2e:e2e_karma --incompatible_auto_configure_host_platform
# Succeeds
bazel build  //e2e:e2e_karma --noincompatible_auto_configure_host_platform
# Fails

And...

bazel clean
bazel build  //e2e:e2e_karma --noincompatible_auto_configure_host_platform
# Succeeds
bazel build  //e2e:e2e_karma --incompatible_auto_configure_host_platform
# Fails

@katre katre self-assigned this Aug 8, 2019
@katre
Copy link
Member

katre commented Aug 8, 2019

Finally managed a clean debug, here is the error:

  exception: java.io.IOException: C:/users/jcater/_bazel_jcater/ayo6twso/execroot/build_bazel_rules_nodejs/bazel-out/host/bin/external/build_bazel_rules_typescript/internal/tsc_wrapped_bin.exe (Permission denied)
  action: action 'Writing script external/build_bazel_rules_typescript/internal/tsc_wrapped_bin.exe [for host]' (FileWrite[[File:[C:/users/jcater/_bazel_jcater/ayo6twso[source]]external/bazel_tools/tools/launcher/launcher.exe] -> [File:[[<execution_root>]bazel-out/host/bin]external/build_bazel_rules_typescript/internal/tsc_wrapped_bin.exe]])

So it appears that changing this flag is tickling a permissions error. Possibly the worker still being active means the file can't be deleted?

When we discard the analysis cache, should workers also be shut down?

@meteorcloudy
Copy link
Member Author

I guess that's exactly the problem, we don't shutdown the workers after discarding the analysis cache. @alexeagle @gregmagolan How does the worker get started and what could we do to shutdown the workers?

@katre
Copy link
Member

katre commented Aug 12, 2019

I would like to submit the flag flip for `--incompatible_auto_configure_host_platform`` (#7081). Should this issue block that?

@katre
Copy link
Member

katre commented Aug 12, 2019

I don't think the flag causes the error, I think it's just one way to cause re-running the same action, which is what triggers it.

@meteorcloudy
Copy link
Member Author

I'm if you can go ahead and flip the flag. But you should do a downstream test for all projects with this flag flipped.

@katre
Copy link
Member

katre commented Aug 13, 2019

This error is not happening in the latest downstream build: https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/1144

(I'm not sure what the error is with res_test, however).

@katre katre closed this as completed Aug 13, 2019
@meteorcloudy
Copy link
Member Author

I did a bisect, it turned out the failure of res_test is indeed caused by flipping --incompatible_auto_configure_host_platform. I supposed it affected how windows sdk toolchain was selected. Maybe we should fix from the test side. /cc @laszlocsomor can you take a look?

@katre
Copy link
Member

katre commented Aug 14, 2019

Thank you for debugging. I was confused because that didn't fail in the actual CI run, only the downstream.

@laszlocsomor
Copy link
Contributor

Yes, I'll take a look.

@meteorcloudy
Copy link
Member Author

Yes, because that's not a shell test, it's somehow affected by the Bazel binary that runs the test.

@meteorcloudy
Copy link
Member Author

@laszlocsomor Thanks!

@laszlocsomor
Copy link
Contributor

confusion intensifies

Without the flag:

c:\src\bazel-releases\0.29.0\rc6\bazel.exe --ignore_all_rc_files build //src/test/res:res --noincompatible_auto_configure_host_platform -s --toolchain_resolution_debug
(...)
INFO: Build options --incompatible_auto_configure_host_platform and --platforms have changed, discarding analysis cache.
INFO: ToolchainResolution: Looking for toolchain of type @io_bazel//src/main/res:toolchain_type...
INFO: ToolchainResolution:   Considering toolchain @local_config_winsdk//:local_arm64_tc...
INFO: ToolchainResolution:     Toolchain constraint @platforms//cpu:cpu has value @platforms//cpu:aarch64, which does not match value @platforms//cpu:x86_64 from the target platform @bazel_tools//platforms:target_platform
INFO: ToolchainResolution:   Rejected toolchain @local_config_winsdk//:local_arm64_tc, because of target platform mismatch
INFO: ToolchainResolution:   Considering toolchain @local_config_winsdk//:local_x64_tc...
INFO: ToolchainResolution:   Considering toolchain @local_config_winsdk//:local_x86_tc...
INFO: ToolchainResolution:     Toolchain constraint @platforms//cpu:cpu has value @platforms//cpu:x86_32, which does not match value @platforms//cpu:x86_64 from the target platform @bazel_tools//platforms:target_platform
INFO: ToolchainResolution:   Rejected toolchain @local_config_winsdk//:local_x86_tc, because of target platform mismatch
INFO: ToolchainResolution:   Considering toolchain //src/main/res:empty...
INFO: ToolchainResolution:   For toolchain type @io_bazel//src/main/res:toolchain_type, possible execution platforms and toolchains: {//:default_host_platform -> @local_config_winsdk//:local_x64_tc, @bazel_tools//platforms:host_platform -> @local_config_winsdk//:local_x64_tc}
INFO: ToolchainResolution: Selected execution platform //:default_host_platform, type @io_bazel//src/main/res:toolchain_type -> toolchain @local_config_winsdk//:local_x64_tc
INFO: ToolchainResolution: Selected execution platform //:default_host_platform,
INFO: ToolchainResolution: Selected execution platform //:default_host_platform,
INFO: Analyzed target //src/test/res:res (3 packages loaded, 77 targets configured).

With the flag:

c:\src\bazel-releases\0.29.0\rc6\bazel.exe --ignore_all_rc_files build //src/test/res:res --incompatible_auto_configure_host_platform -s --toolchain_resolution_debug
(...)
INFO: Build options --incompatible_auto_configure_host_platform and --platforms have changed, discarding analysis cache.
INFO: ToolchainResolution: Looking for toolchain of type @io_bazel//src/main/res:toolchain_type...
INFO: ToolchainResolution:   Considering toolchain @local_config_winsdk//:local_arm64_tc...
INFO: ToolchainResolution:     Toolchain constraint @platforms//cpu:cpu has value @platforms//cpu:aarch64, which does not match value @platforms//cpu:x86_64 from the target platform @local_config_platform//:host
INFO: ToolchainResolution:   Rejected toolchain @local_config_winsdk//:local_arm64_tc, because of target platform mismatch
INFO: ToolchainResolution:   Considering toolchain @local_config_winsdk//:local_x64_tc...
INFO: ToolchainResolution:     Toolchain constraint @platforms//os:os has value @platforms//os:windows, which does not match value <missing> from the execution platform //:default_host_platform
INFO: ToolchainResolution:     Toolchain constraint @platforms//cpu:cpu has value @platforms//cpu:x86_64, which does not match value <missing> from the execution platform //:default_host_platform
INFO: ToolchainResolution:   Considering toolchain @local_config_winsdk//:local_x86_tc...
INFO: ToolchainResolution:     Toolchain constraint @platforms//cpu:cpu has value @platforms//cpu:x86_32, which does not match value @platforms//cpu:x86_64 from the target platform @local_config_platform//:host
INFO: ToolchainResolution:   Rejected toolchain @local_config_winsdk//:local_x86_tc, because of target platform mismatch
INFO: ToolchainResolution:   Considering toolchain //src/main/res:empty...
INFO: ToolchainResolution:   For toolchain type @io_bazel//src/main/res:toolchain_type, possible execution platforms and toolchains: {@local_config_platform//:host -> @local_config_winsdk//:local_x64_tc, //:default_host_platform -> //src/main/res:empty}
INFO: ToolchainResolution: Selected execution platform //:default_host_platform, type @io_bazel//src/main/res:toolchain_type -> toolchain //src/main/res:empty
INFO: ToolchainResolution: Selected execution platform //:default_host_platform,
INFO: ToolchainResolution: Selected execution platform //:default_host_platform,
INFO: Analyzed target //src/test/res:res (0 packages loaded, 74 targets configured).

Confusing bit:

... which does not match value <missing> from the execution platform

@katre
Copy link
Member

katre commented Aug 14, 2019

I think this is because //:default_host_platform is inheriting from the old @bazel_tools//platforms:host_platform. Previously, this had the cpu and os constraints magically added based on the --hpst_cpu flag, but with the incompatible change they are not set, and so //:default_host_platform also has them unset.

I'll send a PR shortly to fix default_host_platform to inherit correctly. Will a normal CI run tell me if the issue is fixed?

@katre
Copy link
Member

katre commented Aug 14, 2019

Right, I can't update default_host_platform: see #9148 for details.

@meteorcloudy
Copy link
Member Author

OK, looks like we should do the same for res_test?

@katre
Copy link
Member

katre commented Aug 14, 2019 via email

@laszlocsomor
Copy link
Contributor

laszlocsomor commented Aug 14, 2019

Sorry, I don't follow.
#9148 says:

However, that currently fails because in Bazel 0.28.1, @local_config_platform//:host was missing public visibility. This was added in ff81b0c, which is included in Bazel 0.29.0.

So can't we fix the test in 0.29.0?

OK, looks like we should do the same for res_test?

Sorry again, do what?

@meteorcloudy
Copy link
Member Author

I suppose if we want to fix the tests (//src/tools/singlejar:combiners_test and //src/test/res:res_test) after enabling --incompatible_auto_configure_host_platform, we'll have to break them with 0.28.1? Therefore we should disable the tests first, wait for the release of 0.29.0, then enable them with the fix. @katre Did I understand correctly?

@laszlocsomor
Copy link
Contributor

Thanks @meteorcloudy ! I'm fine with that.

bazel-io pushed a commit that referenced this issue Aug 14, 2019
#9104 (comment)

RELNOTES: None
PiperOrigin-RevId: 263339618
@katre
Copy link
Member

katre commented Aug 14, 2019

Thanks! I'll take care of cleaning up this (and combiners_test) after the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug
Projects
None yet
Development

No branches or pull requests

3 participants