Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make transition directory name fragment reflect output directory name #13587

Closed

Conversation

torgil
Copy link
Contributor

@torgil torgil commented Jun 18, 2021

If output directory name is overriden, the output path does no longer
consists of the transition directory name fragment.

The calculation of transition directory name fragment does not reflect
this situation and instead we can get conflicts due to this name itself.

This patch makes sure that a the transition directory name fragment
gets a unique hash for a given output directory name.

If output directory name is overriden, the output path does no longer
consists of the transition directory name fragment.

The calculation of transition directory name fragment does not reflect
this situation and instead we can get conflicts due to this name itself.

This patch makes sure that a the transition directory name fragment
gets a unique hash for a given output directory name.
@gregestren
Copy link
Contributor

I read #12171 (comment) but can you clarify more precisely what problem you're addressing?

Broadly speaking we'd like to keep the choice of output directory names outside of the public API as much as possible, to maximize the ability to optimize it (and do so methodically and correctly) from within Bazel's core algorithms.

@aiuto
Copy link
Contributor

aiuto commented Jun 24, 2021

@torgil
PRs like this should have a corresponding issue first. In that issue we can discuss proposed solutions and agree upon a path.

@torgil
Copy link
Contributor Author

torgil commented Sep 23, 2021

I read #12171 (comment) but can you clarify more precisely what problem you're addressing?

Broadly speaking we'd like to keep the choice of output directory names outside of the public API as much as possible, to maximize the ability to optimize it (and do so methodically and correctly) from within Bazel's core algorithms.

@gregestren We have used "output directory name" transition from the start, besides current issues and the different set of problems we faced then it gives us flexibility with naming conventions to avoid duplicated actions (like generated platform independent files ending up as platform dependent actions due to its path). It also allows for more predictable output-paths. Since you have removed this feature we'll review the situation again and create a separate issue for this.

@torgil
PRs like this should have a corresponding issue first. In that issue we can discuss proposed solutions and agree upon a path.

@aiuto Agree. Now we might have to take a step back to reintroduce the possibility to set output directory name (see above). The idea is that we don't use the ST-suffix and don't want it to affect the configuration hash.

I tried to revert this patch and got hit by #14023

@gregestren gregestren added the team-Configurability platforms, toolchains, cquery, select(), config transitions label Sep 24, 2021
@gregestren
Copy link
Contributor

I'm sorry, I didn't realize output directory name was being used (or honestly was even usable) outside of setting "host", where these issues don't matter. So af0e20f complicates your life.

The combined work @sdtwigg is doing (to make transition ordering sequences irrelevant) and me (to support platform-independent output paths) I think covers your uses cases. We're trying to tackle those in more deeply principled ways. But that comes at the cost of much more involved work.

What kind of platform-independent files do you have?

Of course I strongly prefer the longer term, more principled solutions. But I recognize they have unknown ETAs and the issues you experience happen now. Making sense of the right balance there is IMO the crucial question.

@eric-skydio
Copy link

@torgil We were also using output directory name in the same way (probably because I found a comment of yours somewhere), and came across this thread in trying to find a workaround for af0e20f that will let us upgrade to prerelease Bazel 5.0.0

The best solution I have found (which seems to work ok, but isn't ideal) is to define an default_config_alias rule that passes through all providers from the aliased rule, and has both and incoming and outgoing edge transition attached to it. For every config option that is configured anywhere in our build by native or Starlark transitions, the incoming edge transition sets it to some garbage value and the outgoing edge transition sets it back to a sane "global" default value. This ensures that one of those transitions will definitely add it to the affected by starlark transition list.

This is ugly for a few reasons:

  1. Because it requires two transitions, we can't attach this to the actual rule we want to protect, which means we need to mark that rule as "manual" so it doesn't build directly.
  2. (like with output directory name) dependencies of the protected rule are "marred" by the transition, and can't share configuration with any other instances of themselves. In this case, it's because the affected by starlark transition list has grown and there's no way to shrink it again. This means some tools (and their dependencies) get built twice.

@gregestren / @sdtwigg I think the ideal solution here is to define affected by starlark transition as only containing config settings whose values do not match their original value as configured on the command line, so that a starlark transition can "reset" the effects of a previous transition, and restore the configuration to a canonical state in the process. I don't know how hard that is to implement.

As a (maybe) simpler fallback, just allowing a transition to directly specify affected by starlark transition and not overruling it immediately would let us "promise" we've undone all previous transitions, although it requires trusting the transition author more. This looks like it should be only a couple lines of change in Bazel.

@torgil
Copy link
Contributor Author

torgil commented Oct 4, 2021

The combined work @sdtwigg is doing (to make transition ordering sequences irrelevant) and me (to support platform-independent output paths) I think covers your uses cases.

@gregestren This sounds analog to "output directory name". Are there any documentation / discussions on this available?
Does this give stable output directories to use in IDE:s for instance (ie no hashes)?
Will these be more fine-grained, eg per action instead of per target?

What kind of platform-independent files do you have?

Mostly generated source files (test mocks, proto files, ...), code metrics, documentation and other metadata.

The best solution I have found (which seems to work ok, but isn't ideal)

@eric-skydio What scenario is that workaround solving?
For now, i guess the easiest solution is just to revert af0e20f from the prerelease commit. If that is not an option you can try to do transition on "//command_line_option:platform_suffix" and set it to "dummy/../<output directory name>" (dummy may not be needed). This at least worked somewhere < 3.0.

@eric-skydio
Copy link

@eric-skydio What scenario is that workaround solving?

This workaround solves the problem where we have generated source code that is depended on from targets in several different configurations, and we want the code generation to run only once, rather than separately several times with different output roots. The workaround I provided accomplishes that by always transitioning to a "standard" configuration, making a diamond in the configuration graph and also saving analysis time. That standard configuration ends up with an output root of nocpu-fastbuild-STxxxxx/ (our "standard" configuration defines "nocpu" architecture), but since it's always exactly the same configuration the hash is always identical, and it doesn't cause rebuilds.

This is preferable to allowing different configurations and just forcing the output root to match, because it avoids analyzing the targets twice and means we don't need to worry about accidentally producing different generated code in some circumstances and breaking build hermeticity.

@eric-skydio
Copy link

eric-skydio commented Oct 4, 2021

image

This is the resulting bazel cquery graph, showing that all users of the :codegen target depend on the same def4e2c configuration of that target, even though some are python2 binaries, others are python3 binaries, and others are build under exec configuration.

@gregestren
Copy link
Contributor

@gregestren / @sdtwigg I think the ideal solution here is to define affected by starlark transition as only containing config settings whose values do not match their original value as configured on the command line, so that a starlark transition can "reset" the effects of a previous transition, and restore the configuration to a canonical state in the process. I don't know how hard that is to implement.

This is exactly what I'd like to do. I don't think the configuration should depend on the sequence of transitions that got you there.

I've been thinking default values vs. command-line set values (i.e. with bazel build --compilation_mode=opt the command-line value, opt, is different from the flag's default value). But I'm open to debating that point.

@gregestren
Copy link
Contributor

The combined work @sdtwigg is doing (to make transition ordering sequences irrelevant) and me (to support platform-independent output paths) I think covers your uses cases.

@gregestren This sounds analog to "output directory name". Are there any documentation / discussions on this available? Does this give stable output directories to use in IDE:s for instance (ie no hashes)? Will these be more fine-grained, eg per action instead of per target?

No. But we can formalize our ideas more. Let me talk with @sdtwigg to get that started.

I keep on seeing themes of you, @eric-skydio and others basically reinventing the output path algorithm with horrible hacks. This makes me further think we should settle on a principled core algorithm and do that.

The change as I see it would not offer per-action config paths. We can explore that, but that may be easy or complicated depending on exactly what you want out of it.

For IDEs, it would at least be more stable in terms of the top-level configuration - and any configuration that comes back to the top-level configuration after transitions - would be hash-free. What else would you want from the IDE perspective?

@eric-skydio
Copy link

I've been thinking default values vs. command-line set values (i.e. with bazel build --compilation_mode=opt the command-line value, opt, is different from the flag's default value). But I'm open to debating that point.

It sounds like we're on the same page, and I'm excited to get away from the hack that was directly specifying the output directory and towards a transition that actually modifies the configuration in the desired ways.

Ideally, I should be able to use a Starlark transition to return to any hashfree directory name produced by a native configuration change, for example a Starlark transition that modifies only the CPU value should result in a hash-free directory, since CPU is already encoded in the output directory name (assuming the configuration exactly matches what would be produced by just specifying the --cpu flag), and similar for compilation mode and Python version which are also directly encoded in the name. The trickier case is when is_exec_configuration is true, in which case the hash should only include elements that differ from the default exec configuration.

The reason I suggested comparing against the command line configuration is to ensure that k8-fastbuild always refers to exactly one configuration in any given Bazel invocation. It does unfortunately mean that the directory hash associated with a given specific configuration can change as a result of command line flags, since some starlark-induced configuration that was previously the default no longer is, which is potentially confusing, but I don't see an easy way around that.

@moroten
Copy link
Contributor

moroten commented Oct 6, 2021

For reference, --platform_suffix=/.. and output directory name were both mentioned at #7477.

One use case for output directory name is when setting up your IDE to index generated files etc. Finding the right ST- directory is user unfriendly, especially when it might easily change. It also makes troubleshooting more tricky as you don't know which directory in bazel-out/ to look in.

Transitioning from the command line parameters to something else and then back again does not take you back to k8-opt/. To avoid building multiple times, ST- has to be stripped.

If we know that certain flags do not affect the commands, then output directory name is the only option we have currently to avoid buying more CPU power. I think it is a really creative solution to reduce energy consumption and save money.

@torgil
Copy link
Contributor Author

torgil commented Oct 6, 2021

That standard configuration ends up with an output root of nocpu-fastbuild-STxxxxx/ (our "standard" configuration defines "nocpu" architecture), but since it's always exactly the same configuration the hash is always identical, and it doesn't cause rebuilds.

OK. Does this mean you transition //command_line_option:compilation_mode to fastbuild also?
We have a separate patch to be able to do transitions on that, I'm unsure if you can do this on master.

I keep on seeing themes of you, @eric-skydio and others basically reinventing the output path algorithm with horrible hacks. This makes me further think we should settle on a principled core algorithm and do that.

Sounds good. Be aware that these "horrible hacks" can include domain/implementation specific components. If we have for instance a build setting affecting only D below, we might want to include that in the output path in D only to avoid duplicated actions in A, B, C, E.

A -> B -> E
A -> C -> D -> E

The change as I see it would not offer per-action config paths. We can explore that, but that may be easy or complicated depending on exactly what you want out of it.

The issue I'm thinking of is rules that generate several actions with different configuration dependencies (like generate + compile). There is a cost associated with splitting these rules to several rules and cluttering the target graph.

What else would you want from the IDE perspective?

Without having the whole picture, I guess one common use case is to be able to configure "find coverage files in this directory" or "find generated sources in this directory" in a way that hopefully survives a few rebases. A separate build setting for this use-case may be acceptable.

@eric-skydio
Copy link

OK. Does this mean you transition //command_line_option:compilation_mode to fastbuild also?
We have a separate patch to be able to do transitions on that, I'm unsure if you can do this on master.

@torgil Yes, we are successfully transitioning that on master, as well as is_exec_configuration, platform_suffix, python_version, and a few other tricky ones that are automatically set by cfg="exec" in rule definitions. The most difficult one for us is copt/cxxopt, since we need to "guess" what the canonical state of those was, and we'd rather not duplicate them in .bazelrc and the transition definition, but it hasn't been much of an issue in practice since we control the whole build.

Transitioning from the command line parameters to something else and then back again does not take you back to k8-opt/. To avoid building multiple times, ST- has to be stripped.

@moroten It is currently the case that a starlark transition cannot return to k8-opt/, but the proposed rule that ST hash includes only non-default config values would change that. In particular, if there are zero non-default config settings (because compilation_mode = "opt" and all other settings match the defaults for that compilation mode), there would be no elements in the starlark transition hash, so it would be dropped from the output directory name entirely.

... If we have for instance a build setting affecting only D below, we might want to include that in the output path in D only to avoid duplicated actions in A, B, C, E.

A -> B -> E
A -> C -> D -> E

The invariant you must maintain to do this safely is actually quite a bit more strict than that. In particular, because D is affected by the build setting, its outputs must go into separate directories for each configuration, which almost certainly means C will need to run twice no matter what (once with each set of inputs from D). The only (safe) efficiency wins come if you can promise that the output of C will not depend on which input files it was provided with from D, but in that case you probably should either not depend on D at all, or (once the changes proposed here land) use a transition to reset any configurations differences that don't matter, so C can always be built in a single configuration with a single output root.

The edge cases where this doesn't work relate to cases where there are important analysis-time behavior differences between rules (probably runfiles-related) which don't cause any execution-time behavior differences in one or more of the actions they create, which is important to solve and does come up for us occasionally. Having a way to unsafely force the output root for those cases (particularly if we could do it per-file or per-action) would still be nice, but is a separate issue.

In practice, we've found the invariant above to be sufficiently subtle that developers get it wrong more often than they get it right, which is why I'm excited to switch to a safe option that actually normalizes the configuration.

@gregestren gregestren added the P1 I'll work on this now. (Assignee required) label Oct 18, 2021
@torgil
Copy link
Contributor Author

torgil commented Oct 20, 2021

It is currently the case that a starlark transition cannot return to k8-opt/, but the proposed rule that ST hash includes only non-default config values would change that.

Will this apply for exec hashes as well?

Given that build settings set on the command-line doesn't add a ST-hash this would require the ability for a transition to set the top-level value rather than the default value (as suggested in another thread).

... If we have for instance a build setting affecting only D below, we might want to include that in the output path in D only to avoid duplicated actions in A, B, C, E.
A -> B -> E
A -> C -> D -> E

The invariant you must maintain to do this safely is actually quite a bit more strict than that. In particular, because D is affected by the build setting, its outputs must go into separate directories for each configuration, which almost certainly means C will need to run twice no matter what

True, with the exception that it's sufficient with separate output paths for all produced outputs including filenames. For some use-cases, for instance "debug compile subsystem D", the build setting value will (for that subsystem) not add additional output directory naming complexity compared to "-c dbg".

C includes D and has to be duplicated if the build setting is changed above (not controlled by) C

Edit: Today both A and C needs to take care of these build settings and reset them on other dependencies to avoid conflicts or duplicated actions (in B, E). The root cause for this seems to be that the action conflicts due to conflicting configurations. A solution should be to not require configuration hash to be equal which also makes this PR redundant.

@sdtwigg
Copy link
Contributor

sdtwigg commented Nov 2, 2021

I would like de-duplicate discussion here to #14023 (comment) (I'll leave you to close this if you agree.)

Note that outputDirectoryNameFragment was removed (primarily because it was never meant to be part of the user API and users do not have sufficient Starlark access to tools and data necessary to generate an output directory name that would be appropriately unique across the build).

@gregestren
Copy link
Contributor

@eric-skydio I'm following @sdtwigg 's advice to consolidate at #14023 (comment). I'm especially curious for your and @fmeum 's buy-in since you've both been closely involved with the discussion.

If anything at the other bug misses a concern please comment!

@torgil
Copy link
Contributor Author

torgil commented Nov 5, 2021

I would like de-duplicate discussion here to #14023 (comment) (I'll leave you to close this if you agree.)

Note that outputDirectoryNameFragment was removed (primarily because it was never meant to be part of the user API and users do not have sufficient Starlark access to tools and data necessary to generate an output directory name that would be appropriately unique across the build).

@sdtwigg Good job. There are two other issues remaining in the discussion here:

  1. output directory name transition: Solution in Avoid different transition output directories when build settings have the same values #14023 will make this PR obsolete but it doesn't solve the naming issue discussed in this thread.

  2. This PR should also be obsolete if different configurations didn't resulted in action conflicts (regarding my above edit): This would also solve another issue leading to restrictions in the dependency graph (see Avoid action conflicts due to different configuration hashes #14236). To solve these issues we have traded build-setting transitions for rule complexity by creating multiple actions (for relevant values) instead. The reduced number of transitions needed has performance benefits as well as reducing complexity in higher level rules that needs all versions. There are however cases (build-settings) where the benefit is too small to motivate more added complexity (dealing with cross products of different properties).

I wrote #14236 to clarify and possibly solve these issues (would be a nice fit for 2). Together with #14023 (both making this PR obsolete), I agree to close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes P1 I'll work on this now. (Assignee required) team-Configurability platforms, toolchains, cquery, select(), config transitions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants