Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cross compiled Go (and possibly other languages), fully rebuilds when unrelated skylib flags change #13317

Closed
mishas opened this issue Apr 8, 2021 · 12 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) Starlark configuration Starlark transitions, build_settings team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug

Comments

@mishas
Copy link
Contributor

mishas commented Apr 8, 2021

Description of the problem / feature request:

When adding a flag (e.g. string_flag rule) from skylib, and passing that flag to a build for a Go binary that has goos and goarch set for it (even if not cross-compiling) - everything fully rebuilds (including host-run tools, like protoc).
This happens even if that flag is not used anywhere (see example below).

This happens without protoc as well, but I'm using it in the example, as something that should not even be cross-compiled that recompiles.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Minimal example can be found here: https://github.com/mishas/bazel-cross-compile-rebuild
First run: bazel build //pkg/demo --//pkg/demo:myflag=a
then run: bazel build //pkg/demo --//pkg/demo:myflag=b
You'll see that it recompiles everything (including protoc compiler) when it should really not compile anything.

main BUILD file for completness:

load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library")
load("@bazel_skylib//rules:common_settings.bzl", "string_flag")

# Example copied from https://docs.bazel.build/versions/master/skylark/config.html#predefined-settings
# Please note that this rule is not used anywhere.
string_flag(
    name = "myflag",
    build_setting_default = "a",
    values = ["a", "b", "c"],
)

go_library(
    ...  # removing for simplicity. Generated with Gazelle
    deps = ["//pkg/pb"],  # depending on some protobuf. The BUILD file in //pb:BUILD is gazelle generated.
)

go_binary(
    ...  # removing for simplicity. Generated with Gazelle
    goarch = "amd64",
    goos = "linux",  # The problem happens even if I set this to "darwin", which should NOT cross-compile.
)

What operating system are you running Bazel on?

I tried it both on Mac (Big Sur, 11.1) and Linux (Debian10).
It recompiles on both, even when not cross-compiling.

What's the output of bazel info release?

release 4.0.0

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

Not relevant

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

Not relevant

Have you found anything relevant by searching the web?

I also opened a bug for rules_go (as I thought it's a bug there), but got an answer from @jayconrod:

This sounds like a Bazel bug, especially since protoc is rebuilt. Have you asked in bazel-discuss or opened an issue in bazelbuild/bazel

Any other information, logs, or outputs that you want to share?

The same doesn't happen, if I use --define myflag=<whatever> - in the case of the (deprecated) --define flag, only analysis runs twice, not the build.

@jin jin added Starlark configuration Starlark transitions, build_settings team-Configurability platforms, toolchains, cquery, select(), config transitions untriaged labels Apr 14, 2021
@gregestren
Copy link
Contributor

gregestren commented Apr 14, 2021

Thanks for the clear reproduction, @mishas. I tried to minimize it at https://gist.github.com/gregestren/5c999b7cafeec3718a9afbf4fe17d2bf.

I think #12171 is relevant, but that's a lot to wade through.

The relevant logic is

* Compute the output directory name fragment corresponding to the new BuildOptions based on (1)
* the names and values of all native options previously transitioned anywhere in the build by
* starlark options, (2) names and values of all entries in the starlark options map.
*
* @param changedOptions the names of all options changed by this transition in label form e.g.
* "//command_line_option:cpu" for native options and "//myapp:foo" for starlark options.
* @param optionInfoMap a map of all native options (name -> OptionInfo) present in {@code
* toOptions}.
* @param toOptions the newly transitioned {@link BuildOptions} for which we need to updated
* {@code transitionDirectoryNameFragment} and {@code affectedByStarlarkTransition}.
*/
// TODO(bazel-team): This hashes different forms of equivalent values differently though they
// should be the same configuration. Starlark transitions are flexible about the values they
// take (e.g. bool-typed options can take 0/1, True/False, "0"/"1", or "True"/"False") which
// makes it so that two configurations that are the same in value may hash differently.
private static void updateOutputDirectoryNameFragment(
and
// hash all starlark options in map.
toOptions.getStarlarkOptions().forEach((opt, value) -> toHash.put(opt.toString(), value));

The latter determines the ST-<hash> hash from BuildOptions.getStarlarkOptions, which holds Starlark flags that aren't at their default values. This even applies to --//pkg/demo:myflag=a set at the command line via this call stack:

Screen Shot 2021-04-14 at 5 03 50 PM

Long-story short, all the logic setting ST-<hash> needs some refining. Julie, who led all this work, has shifted her focus off Bazel, so it's not getting enough attention now. She shared a doc outlining interesting possibilities for improving the logic. Someone else may pick up the general challenge over this quarter. This bug, IMO, would naturally fall out of that.

All said, I agree with your basic point. I believe one of Julie's proposals would address this: have ST-<hash> reflect values that are different from their top-level flag defaults. In the first build, that default would be --//pkg/demo:myflag=a. In the second build, it's --//pkg/demo:myflag=b. As long as the value stays consistent within a build, --//pkg/demo:myflag doesn't have to be reflected in the hash.

The implementation of that logic may be complicated within Bazel, since it requires every transition to know what the top-level default values are. One strong Bazel design principle is targets don't have to understand anything about whoever depends on them. So passing this info from the top level down to a target is a bit of a challenge. I believe there are decent solutions to that. But this isn't a 1-hour fix.

@mishas
Copy link
Contributor Author

mishas commented Apr 14, 2021

Thank you @gregestren for the analysis!

Some things that still don't make sense to me:
From what I understood from your comment, it seems like the hash (as part of ST-<hash> is different for different values of the flag. If that's correct than why doesn't it recompile everything when I don't cross-compile? I would expect the hashes to be different for regular compilation as well, wouldn't they?

Thanks again!

@gregestren
Copy link
Contributor

By cross-compile, do you mean do a build with a transition in it?

If a build has no transitions, the ST-<hash> prefix never gets added in the first place. So it doesn't matter what flags would be in that hash. Only a transition can add that prefix.

Also, I'd never expect host tools to re-compile. Can you clarify that observation?

@mishas
Copy link
Contributor Author

mishas commented Apr 16, 2021

@gregestren, Thanks again for the swift reply.

Regarding the cross compilation / transition:

To enable cross-compilation in Go, one needs to set the goos and goarch arguments. Based on the following code, it seems like indeed without setting those values the build has no transitions:

https://github.com/bazelbuild/rules_go/blob/f2b90108b5e87faa933fde9e481b02d48ae132d5/go/private/rules/transition.bzl#L59-L73

(It seems like it only applies transitions if one of the following keys are set: ("goos", "goarch", "pure", "static", "msan", "race", "gotags", "linkmode")).

But then I tried to set "pure" = "auto",, which by the code above should also apply a transition, but the re-compilation didn't happen.
With that said, I also tried "pure" = "on",, and in that case there was recompilation. (maybe during "pure" = "auto", there's no transition?)

I must admin that it was really hard for me to follow rules_go's code, and I might be missing something obvious there. (Sorry for that).

Regarding tools recompile.
I've added a Github Action to the demo repo. That action first compiles without flags, then with --//pkg/demo:myflag=a and then with --//pkg/demo:myflag=b (without cleaning between them).
You can see the output here: https://github.com/mishas/bazel-cross-compile-rebuild/runs/2357789879

You can see that we have the following lines repeating in 2 of the compilations (Build without flags and Build with myflag=b):

[24 / 228] Compiling src/google/protobuf/compiler/command_line_interface.cc; 2s linux-sandbox ... (2 actions, 1 running)
[27 / 228] Compiling src/google/protobuf/compiler/objectivec/objectivec_enum_field.cc; 1s linux-sandbox ... (2 actions, 1 running)
[36 / 228] Compiling src/google/protobuf/compiler/cpp/cpp_map_field.cc; 1s linux-sandbox ... (2 actions, 1 running)
[40 / 228] Compiling src/google/protobuf/compiler/js/js_generator.cc; 6s linux-sandbox ... (2 actions, 1 running)
[49 / 228] Compiling src/google/protobuf/compiler/csharp/csharp_wrapper_field.cc; 1s linux-sandbox ... (2 actions running)
[55 / 228] Compiling src/google/protobuf/compiler/java/java_message_builder_lite.cc; 0s linux-sandbox ... (2 actions running)
[63 / 228] Compiling src/google/protobuf/compiler/java/java_map_field_lite.cc; 1s linux-sandbox ... (2 actions running)
[72 / 228] Compiling src/google/protobuf/compiler/java/java_file.cc; 2s linux-sandbox ... (2 actions running)
[87 / 228] Compiling src/google/protobuf/compiler/java/java_enum_field_lite.cc; 1s linux-sandbox ... (2 actions running)

The compilation above (if I understand correctly) is of the protoc compiler, which is a tool.

I'm kind of confused why it's not recompiling for --//pkg/demo:myflag=a, I'm pretty sure it did last time I tried, but now it does't. So - at least it now seems like it "understands" that the default is myflag=a.

Thanks again for your help!

@gregestren
Copy link
Contributor

I'm not deeply familiar with the go rules logic either. But I think the basic principle stands that there are are reasonable scenarios where re-compilation happens and shouldn't, and we can focus this issue on that.

Re: the host deps, if you run:

$ bazel aquery  --//pkg/demo:myflag=bazel 'deps(//pkg/demo)'  | grep "Compiling src/google/protobuf" -A 3
action 'Compiling src/google/protobuf/stubs/status.cc'
  Mnemonic: CppCompile
  Target: @com_google_protobuf//:protobuf_lite
  Configuration: darwin-opt-exec-2B5CBBC6-ST-3efc85f88b34

this shows that those actions come from @com_google_protobuf//:protobuf_lite and are compiled in the exec, not host configuration. Those are subtly different, but the key two points are:

  1. both inherit the top-level Starlark options:
    return builder.addStarlarkOptions(starlarkOptionsMap).build();
  2. Unlike host, exec keeps the ST-<> hash in its path, so it has the same invalidation as target.

@mishas
Copy link
Contributor Author

mishas commented Apr 17, 2021

@gregestren, Thank you for the explanation. I wasn't aware of the difference between host and exec (every day I learn something new about Bazel :) ), and it makes perfect sense for protobuf to be exec rather than host.

In any case, as you noted above, there's still recompilation when one is not necessary, and I now better understand how the issue of ST-<hash> change with transition is the problem here.

I guess, the only thing left for me, is to wait for this to get prioritized and fixed.

Thanks again!

@gregestren
Copy link
Contributor

Apologies about the host / exec confusion. We need some gentler documentation in general. The info is there if you know where to look for it, but lots of people understandably don't know where to look for it!

@torgil
Copy link
Contributor

torgil commented Jun 18, 2021

I believe one of Julie's proposals would address this: have ST- reflect values that are different from their top-level flag defaults

This sounds promising. This in combination with the ability for a transition to reset a flag to it's top-level default value would also solve the scenario below where D is affected by build-settings from both B and C.
A -> B -> D
A -> C -> D

@gregestren
Copy link
Contributor

Just FYI @sdtwigg and I have been actively discussing this theme recently.

I think we're going to formally up-prioritize this in the new quarter (July+).

@gregestren
Copy link
Contributor

gregestren commented Sep 3, 2021

Update:

I have an idea I'm going to try prototyping to address this specific issue.

Bazel has a map of "changed" Starlark options that determines hash values. I think it adds options that are set at the command line to this map, which produces the effect you see. I also think that's unnecessary - if we exclude command-line settings from that map we'll still get correct builds without this issue. I'll link a PR if it looks like it works or an updated comment if there are complications.

Re: the larger issue of transitions and hashes, see #12171 (comment).

@gregestren
Copy link
Contributor

I think #13580 will address this. I'm prioritizing the review on it now.

@gregestren
Copy link
Contributor

Should be fixed by #13580.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) Starlark configuration Starlark transitions, build_settings team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants