Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build failure caused by no-op starlark transition (friction between Starlark None and java null) #12888

Closed
stefanbucur opened this issue Jan 23, 2021 · 9 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug

Comments

@stefanbucur
Copy link

Description of the problem / feature request:

Our fuzzing rules library implements a Starlark transition to provide compiler instrumentation options when building fuzz test executables. In order to reduce build size output, we would like to provide a "no-op" mode in which the transition is disabled through a bool config flag and the parent configuration is passed through unchanged.

However, when we attempt a build in this "no-op" transition configuration, we're hitting build failures of the form:

ERROR: file 'external/re2/_objs/re2/rune.pic.o' is generated by these conflicting actions:
Label: @re2//:re2
RuleClass: cc_library rule
Configuration: 3c05312acc4ad4afcdf2f011f312660ad214f21abaa06671cad80d482163789b, 2e241eb8731382b18fcb12c9a4b7eb174abce65a2e229b9754f25836277ec281
Mnemonic: CppCompile
Action key: e064b11b5c25436d8c57c05d4e04baa44c8de4fdc572c688519590053b2a3df1
Progress message: Compiling util/rune.cc
PrimaryInput: File:[/usr/local/google/home/sbucur/.cache/bazel/_bazel_sbucur/2527f57e21cfbf2e414b894f7610e705/external/re2[source]]util/rune.cc
PrimaryOutput: File:[[<execution_root>]bazel-out/k8-fastbuild/bin]external/re2/_objs/re2/rune.pic.o
Owner information: ConfiguredTargetKey{label=@re2//:re2, config=BuildConfigurationValue.Key[3c05312acc4ad4afcdf2f011f312660ad214f21abaa06671cad80d482163789b]}, ConfiguredTargetKey{label=@re2//:re2, config=BuildConfigurationValue.Key[2e241eb8731382b18fcb12c9a4b7eb174abce65a2e229b9754f25836277ec281]}
MandatoryInputs: are equal
Outputs: are equal
ERROR: com.google.devtools.build.lib.skyframe.ArtifactConflictFinder$ConflictException: com.google.devtools.build.lib.actions.MutableActionGraph$ActionConflictException: for external/re2/_objs/re2/rune.pic.o, previous action: action 'Compiling util/rune.cc', attempted action: action 'Compiling util/rune.cc'

Running bazel config on the two configuration yields the following:

$ bazel config 3c05312acc4ad4afcdf2f011f312660ad214f21abaa06671cad80d482163789b 2e241eb8731382b18fcb12c9a4b7eb174abce65a2e229b9754f25836277ec281
INFO: Displaying diff between configs 3c05312acc4ad4afcdf2f011f312660ad214f21abaa06671cad80d482163789b and 2e241eb8731382b18fcb12c9a4b7eb174abce65a2e229b9754f25836277ec281
Displaying diff between configs 3c05312acc4ad4afcdf2f011f312660ad214f21abaa06671cad80d482163789b and 2e241eb8731382b18fcb12c9a4b7eb174abce65a2e229b9754f25836277ec281
FragmentOptions user-defined {
  @rules_fuzzing//fuzzing:cc_engine_instrumentation: none, null
  @rules_fuzzing//fuzzing:cc_engine_sanitizer: none, null
  @rules_fuzzing//fuzzing:cc_fuzzing_build_mode: false, null
}

The only differences in config seem to be related to input configs in the transition that we only read and not list in the output.

The transition is defined as follows:

fuzzing_binary_transition = transition(
    implementation = _fuzzing_binary_transition_impl,
    inputs = [
        "@rules_fuzzing//fuzzing:cc_engine_instrumentation",
        "@rules_fuzzing//fuzzing:cc_engine_sanitizer",
        "@rules_fuzzing//fuzzing:cc_fuzzing_build_mode",
        "//command_line_option:copt",
        "//command_line_option:conlyopt",
        "//command_line_option:cxxopt",
        "//command_line_option:linkopt",
    ],
    outputs = [
        "//command_line_option:copt",
        "//command_line_option:conlyopt",
        "//command_line_option:cxxopt",
        "//command_line_option:linkopt",
    ],
)

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

$ git clone git@github.com:bazelbuild/rules_fuzzing.git -b no-op-transitions
$ cd rules_fuzzing
$ bazel test //examples/...

What operating system are you running Bazel on?

Debian-based Linux.

What's the output of bazel info release?

release 4.0.0

Have you found anything relevant by searching the web?

This issue seems to be loosely related to conversations in #11196.

@gregestren gregestren added team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug untriaged labels Jan 26, 2021
@juliexxia juliexxia changed the title Build failure caused by no-op transition Build failure caused by no-op starlark transition (friction between Starlark None and java null) Feb 4, 2021
@juliexxia
Copy link
Contributor

Hi Stefan,

Your issue isn't actually one I've seen before. But I know there's traditionally some friction between the java null value and the Starlark None value. I augmented your transition to try to manually bypass that as such:

def _fuzzing_binary_transition_impl(settings, attr):
    is_fuzzing_build_mode = settings["@rules_fuzzing//fuzzing:cc_fuzzing_build_mode"]
    if not is_fuzzing_build_mode:
        return {
            "@rules_fuzzing//fuzzing:cc_engine_instrumentation": settings["@rules_fuzzing//fuzzing:cc_engine_instrumentation"], 
            "@rules_fuzzing//fuzzing:cc_engine_sanitizer": settings["@rules_fuzzing//fuzzing:cc_engine_sanitizer"],
            "@rules_fuzzing//fuzzing:cc_fuzzing_build_mode": settings["@rules_fuzzing//fuzzing:cc_fuzzing_build_mode"],
            "//command_line_option:copt": settings["//command_line_option:copt"],
            "//command_line_option:linkopt": settings["//command_line_option:linkopt"],
            "//command_line_option:conlyopt": settings["//command_line_option:conlyopt"],
            "//command_line_option:cxxopt": settings["//command_line_option:cxxopt"],
        }

    opts = instrum_opts.make(
        copts = settings["//command_line_option:copt"],
        conlyopts = settings["//command_line_option:conlyopt"],
        cxxopts = settings["//command_line_option:cxxopt"],
        linkopts = settings["//command_line_option:linkopt"],
    )
    opts = instrum_opts.merge(opts, instrum_defaults.fuzzing_build)

    instrum_config = settings["@rules_fuzzing//fuzzing:cc_engine_instrumentation"]
    if instrum_config in instrum_configs:
        opts = instrum_opts.merge(opts, instrum_configs[instrum_config])
    else:
        fail("unsupported engine instrumentation '%s'" % instrum_config)

    sanitizer_config = settings["@rules_fuzzing//fuzzing:cc_engine_sanitizer"]
    if sanitizer_config in sanitizer_configs:
        opts = instrum_opts.merge(opts, sanitizer_configs[sanitizer_config])
    else:
        fail("unsupported sanitizer '%s'" % sanitizer_config)

    return {
        "@rules_fuzzing//fuzzing:cc_engine_instrumentation": settings["@rules_fuzzing//fuzzing:cc_engine_instrumentation"], 
        "@rules_fuzzing//fuzzing:cc_engine_sanitizer": settings["@rules_fuzzing//fuzzing:cc_engine_sanitizer"],
        "@rules_fuzzing//fuzzing:cc_fuzzing_build_mode": settings["@rules_fuzzing//fuzzing:cc_fuzzing_build_mode"], 
        "//command_line_option:copt": opts.copts,
        "//command_line_option:linkopt": opts.linkopts,
        "//command_line_option:conlyopt": opts.conlyopts,
        "//command_line_option:cxxopt": opts.cxxopts,
    }

fuzzing_binary_transition = transition(
    implementation = _fuzzing_binary_transition_impl,
    inputs = [
        "@rules_fuzzing//fuzzing:cc_engine_instrumentation",
        "@rules_fuzzing//fuzzing:cc_engine_sanitizer",
        "@rules_fuzzing//fuzzing:cc_fuzzing_build_mode",
        "//command_line_option:copt",
        "//command_line_option:conlyopt",
        "//command_line_option:cxxopt",
        "//command_line_option:linkopt",
    ],
    outputs = [
        "@rules_fuzzing//fuzzing:cc_engine_instrumentation", # Added your build settings to the outputs
        "@rules_fuzzing//fuzzing:cc_engine_sanitizer",
        "@rules_fuzzing//fuzzing:cc_fuzzing_build_mode",
        "//command_line_option:copt",
        "//command_line_option:conlyopt",
        "//command_line_option:cxxopt",
        "//command_line_option:linkopt",
    ],
)

And now with bazel test //examples/... am getting a different error that seems to have to do with c++ compilation (apologies, very much not my realm of expertise so not sure how to debug further) instead of the transition. I did do some print statement debugging and it looks like, with the command line above, the values of all the settings["//command_line_option:*opt"] inputs are empty lists.

juliexxia-macbookpro11% bazel test //examples/...
INFO: Analyzed 59 targets (2 packages loaded, 77 targets configured).
INFO: Found 52 targets and 7 test targets...
ERROR: /Users/juliexxia/bazel-projects/rules_fuzzing/fuzzing/replay/BUILD:48:11: C++ compilation of rule '//fuzzing/replay:status_util' failed (Exit 1): wrapped_clang failed: error executing command external/local_config_cc/wrapped_clang '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -O0 -DDEBUG '-std=c++11' -iquote . -iquote ... (remaining 23 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox wrapped_clang failed: error executing command external/local_config_cc/wrapped_clang '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -O0 -DDEBUG '-std=c++11' -iquote . -iquote ... (remaining 23 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
fuzzing/replay/status_util.cc:40:10: error: no viable conversion from returned value of type 'int' to function return type 'std::string' (aka 'basic_string<char, char_traits<char>, allocator<char> >')
  return strerror_r(errno_value, error_str_buf, sizeof(error_str_buf));
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/string:799:5: note: candidate constructor not viable: no known conversion from 'int' to 'const std::__1::basic_string<char> &' for 1st argument
    basic_string(const basic_string& __str);
    ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/string:804:5: note: candidate constructor not viable: no known conversion from 'int' to 'std::__1::basic_string<char> &&' for 1st argument
    basic_string(basic_string&& __str)
    ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/string:817:5: note: candidate constructor template not viable: no known conversion from 'int' to 'const char *' for 1st argument
    basic_string(const _CharT* __s) : __r_(__default_init_tag(), __default_init_tag()) {
    ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/string:867:5: note: candidate constructor not viable: no known conversion from 'int' to 'initializer_list<char>' for 1st argument
    basic_string(initializer_list<_CharT> __il);
    ^
1 error generated.
INFO: Elapsed time: 11.470s, Critical Path: 3.26s
INFO: 53 processes: 9 internal, 44 darwin-sandbox.
FAILED: Build did NOT complete successfully
//examples:empty_fuzz_test_with_corpus                                NO STATUS
//examples:empty_fuzz_test_with_dict                                  NO STATUS
//examples:input_buffer_overflow_fuzz_test                            NO STATUS
//examples:msan_fuzz_test                                             NO STATUS
//examples:new_buffer_overflow_fuzz_test                              NO STATUS
//examples:re2_fuzz_test                                              NO STATUS
//examples:empty_fuzz_test                                      FAILED TO BUILD

FAILED: Build did NOT complete successfully

I am wrapping up my last week with the Bazel team so I'm going to leave you with this potential workaround. If you're still finding it not to work, please re-ping this thread and someone else on the config team will re-triage. Overall, I'd like to keep the issue open since there continue to be issues with the None/null interoperability.

@gregestren gregestren added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Feb 19, 2021
@gregestren
Copy link
Contributor

Acknowledging @juliexxia 's last comment. There's a cluster of related issue that we need to ultimately address holistically. Hopefully the workaround is sufficient in the meantime.

@fmeum
Copy link
Collaborator

fmeum commented Sep 4, 2021

I think that I identified the interaction that causes this action conflict:
In ConfigurationResolver.applyStarlarkTransition, the BuildOptions that are later passed to the transition are obtained from StarlarkTransition.addDefaultStarlarkOptions, which adds the default values of all transitions inputs if necessary.
Later, when the transition turned out to be an effective no-op in FunctionTransitionUtil.applyTransition, these BuildOptions are returned unmodified, in particular without a transitionDirectoryNameFragment.
This causes an action conflict with the untransitioned build, which does not have any of the Starlark input settings set explicitly.

Keeping track of the added defaults and removing them after the transition would be one way to fix this short-term. Depending on how Starlark build settings set on the CLI are treated, it could also be possible to always trim Starlark settings that take on their default values post-transition.

@gregestren
Copy link
Contributor

@fmeum are you saying that caused this specific bug? Even accounting for the none / null differences?

My hope is that the setting of transitionDirectoryNameFragment and affectedByStarlarkTransition can be taken out of the transition logic and move directly into BuildConfiguration instantiation logic. That could offer more consistency: paths represent a pure diff from the top-level settings, without it mattering any more what particular sequence of transitions got you there.

I wrote some more about that to @sdtwigg - I could share in more detail if you like.

Also check out #13580, which is a great step forward. It doesn't solve all problems but it's nudging us in the right direction.

fmeum added a commit to CodeIntelligenceTesting/bazel that referenced this issue Sep 10, 2021
By convention, BuildOptions should not contain Starlark build settings
that are set to their default values. However, if a Starlark transition
has a build setting as an input, the value of that settings has to be
added to the input BuildOptions temporarily so that it can be read.

Fixes bazelbuild#12888.
fmeum added a commit to CodeIntelligenceTesting/bazel that referenced this issue Sep 10, 2021
By convention, BuildOptions should not contain Starlark build settings
that are set to their default values. However, if a Starlark transition
has a build setting as an input, the value of that settings has to be
added to the input BuildOptions temporarily so that it can be read.

Fixes bazelbuild#12888.
fmeum added a commit to CodeIntelligenceTesting/bazel that referenced this issue Sep 10, 2021
By convention, BuildOptions should not contain Starlark build settings
that are set to their default values. However, if a Starlark transition
has a build setting as an input, the value of that setting has to be
added to the input BuildOptions temporarily so that it can be read by
the transition.

Before this commit, these build settings remained in the transition and
could cause action conflicts. With this commit, all build settings that
were added with their default values are iterated through and removed if
they are still set to their defaults post-transition.

Fixes bazelbuild#12888.
@fmeum
Copy link
Collaborator

fmeum commented Sep 10, 2021

I actually think that the None / null friction is a red herring here: Note that the build settings in the original bug report are string flags with default value "none" - not the Starlark value None (note the missing double quotes). I also don't think that this problem has much to do with the transitionDirectoryNameFragment logic - but it can easily get hidden when a non-no-op transition receives a non-trivial fragment and thus evades an action conflict.

Rather, I think that ConfigurationResolver.applyStarlarkTransition violates what seems to be an invariant of BuildOptions: they never hold default values of build settings. By adding the default values to the fromOptions via StarlarkTransition.addDefaultStarlarkOptions, this invariant no longer holds.

I have prepared a PR, #13971, that trims all Starlark build settings that are set to their defaults from the output of a transition - it is the natural counterpart to StarlarkTransition.addDefaultStarlarkOptions. There may be a more direct fix for this issue, but in this way it is clear that ConfigurationResolver.applyStarlarkTransition preserves the invariant regardless of what the particular transition implementation does.

@gregestren
Copy link
Contributor

FYI I know I owe you more responses on your recent input...

@gregestren
Copy link
Contributor

Got it. #12888 (comment) makes a lot of sense. As you stated it it sounds like a no-brainer. I'll continue discussion on #13971.

Re: no-op transitions specifically, would 11f2540 help? The intent was to make no-op transitions more user-friendly and performant: basically provide a simple UI early-return bypass.

I don't know offhand if that code kicks in before the default setting changes you mention. I'd like to think so but I'm not sure. Nothing a quick review through the code couldn't clarify. :)

@gregestren
Copy link
Contributor

@stefanbucur I'm especially interested in your feedback of my suggested workaround. If it works I'd close this issue and consider the issues @fmeum mentions as distinctly focused issues / PRs.

fmeum added a commit to fmeum/bazel that referenced this issue Sep 25, 2021
By convention, BuildOptions should not contain Starlark build settings
that are set to their default values. However, if a Starlark transition
has a build setting as an input, the value of that setting has to be
added to the input BuildOptions temporarily so that it can be read by
the transition.

Before this commit, if a build setting was an input but not an output of
a transition, its default value remained in the BuildOptions and could
cause action conflicts. With this commit, all build settings rather than
only output build settings are trimmed post-transition if they are set
to their default value.

Fixes bazelbuild#12888.
fmeum added a commit to fmeum/bazel that referenced this issue Sep 25, 2021
By convention, BuildOptions should not contain Starlark build settings
that are set to their default values. However, if a Starlark transition
has a build setting as an input, the value of that setting has to be
added to the input BuildOptions temporarily so that it can be read by
the transition.

Before this commit, if a build setting was an input but not an output of
a transition, its default value remained in the BuildOptions and could
cause action conflicts. With this commit, all build settings rather than
only output build settings are trimmed post-transition if they are set
to their default value.

Fixes bazelbuild#12888.
@fmeum
Copy link
Collaborator

fmeum commented Sep 25, 2021

Got it. #12888 (comment) makes a lot of sense. As you stated it it sounds like a no-brainer. I'll continue discussion on #13971.

Re: no-op transitions specifically, would 11f2540 help? The intent was to make no-op transitions more user-friendly and performant: basically provide a simple UI early-return bypass.

I don't know offhand if that code kicks in before the default setting changes you mention. I'd like to think so but I'm not sure. Nothing a quick review through the code couldn't clarify. :)

Unfortunately that shortcut doesn't help: It applies at a point where defaults have already been added to fromOptions since this is already done in ConfigurationResolver. I verified that I also get action conflicts. The problem seems to have little to do with what the transition does, only with its declared input and output build settings.

fmeum added a commit to fmeum/bazel that referenced this issue Sep 25, 2021
By convention, BuildOptions should not contain Starlark build settings
that are set to their default values. However, if a Starlark transition
has a build setting as an input, the value of that setting has to be
added to the input BuildOptions temporarily so that it can be read by
the transition.

Before this commit, if a build setting was an input but not an output of
a transition, its default value remained in the BuildOptions and could
cause action conflicts. With this commit, all build settings rather than
only output build settings are trimmed post-transition if they are set
to their default value.

Fixes bazelbuild#12888.
fmeum added a commit to fmeum/bazel that referenced this issue Sep 28, 2021
By convention, BuildOptions should not contain Starlark build settings
that are set to their default values. However, if a Starlark transition
has a build setting as an input, the value of that setting has to be
added to the input BuildOptions temporarily so that it can be read by
the transition.

Before this commit, if a build setting was an input but not an output of
a transition, its default value remained in the BuildOptions and could
cause action conflicts. With this commit, all build settings rather than
only output build settings are trimmed post-transition if they are set
to their default value.

Fixes bazelbuild#12888.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Configurability platforms, toolchains, cquery, select(), config transitions type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants