Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve transition scalability by considering visibility of user-defined build setting #12568

Closed
ulrfa opened this issue Nov 26, 2020 · 8 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Configurability platforms, toolchains, cquery, select(), config transitions type: feature request

Comments

@ulrfa
Copy link
Contributor

ulrfa commented Nov 26, 2020

[I'm creating a separate ticket too keep #6526 focused]

An optimal solution for efficient action caching and small graph size, in combination with transitions, seems far away. At least as a generic solution for all use cases.

But could transition scalability be improved for some use cases, by filtering out user-defined build settings having a visibility that assure they can not affect parts of the build graph?

Preferably by automatic trimming the build graph, by removing build options derived from user-defined build settings that are not visible to the current package. And also not visible in transitive dependency!

Or alternatively avoid hashing such non-visible build options when calculating output directories.

Would it help that rule visibility is nonconfigurable and does not require resolving select()? Or is the problem still of the same complexity, as detecting which options actually have an effect, due to the build options used via transitive dependencies?

@gregestren, do you think a feature like this would make sense? (if not, feel free to close this ticket)

@aiuto aiuto added bad error messaging Issues where users get stuck because they don't understand what they did wrong team-Configurability platforms, toolchains, cquery, select(), config transitions untriaged and removed bad error messaging Issues where users get stuck because they don't understand what they did wrong labels Nov 30, 2020
@gregestren
Copy link
Contributor

That's an interesting idea. But yes, transitive dependencies are always going to complicate things unless we have specific information to tell us they don't matter.

Generally speaking, if //a:a depends on //b:b depends on //c:c and a build setting --foo is only used by //c:c, it's still the case that --foo can affect //a:a. This is because //a:a's output partially depends on whatever //c:c does (more specifically: whatever information //c:c sends up its providers, which can propagate up transitively).

If we restrict --foo's visibility to package //c, that tells us that //a:a can't directly read //foo. But it doesn't tell us anything about //c:c's indirect effects on //a:a.

A concrete analogy is simply building a C++ binary with source code stored in libraries. If //:my_binary depends on //:lib1 which depends on //:lib2 which has lib2.cc, then changing code in lib2.cc changes the final binary produced by //:my_binary. Even though the target for //:my_binary doesn't know anything directly about lib2.cc, and may not even have visibility access to it.

One way around this could be to tag targets. If you the project owner happen to know that some build flag or whatever really doesn't change the output, you could declare that as an attribute on the target, which is information Bazel could use to make more intelligent decisions. That could work just fine, but it's a maintenance burden and could be error prone if targets are tagged wrong or transitive dependencies change in away that makes their assumptions no longer true.

I can't remember if I mentioned in the other bug that I wrote a tool that could conceivably automate this. Basically do the tagging above but the tool would automatically apply the tags correctly. Kind of like an automatic code formatter. That saves a lot of the maintenance burden but would still have to be run regularly to ensure it stays up to date.

If you see anything I'm not in your idea, please share. I'm very open to novel creative approaches.

@gregestren gregestren added type: feature request P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Dec 4, 2020
@ulrfa
Copy link
Contributor Author

ulrfa commented Dec 7, 2020

Thanks @gregestren!

Could the output directory calculation start from the leafs in the build graph? (that don’t have any dependencies)

And let the output directory hash of each action, be calculated based on the output directory hashes of its direct dependencies?

Like hash ( all_options_except_from_non_visible_user_defined_build_settings + output_directory_hashes_of_all_direct_dependencies )

I’m afraid tagging the targets would be too error prone and cumbersome in our code base, even if automated with a tool.

@gregestren
Copy link
Contributor

gregestren commented Dec 7, 2020

Like hash ( all_options_except_from_non_visible_user_defined_build_settings + output_directory_hashes_of_all_direct_dependencies )

That's an interesting idea. Maybe?

It reads to me like basically doing "manual tagging" in a more automated way. In other words,

build_setting(
    name = "my_setting",
    visibility = ["//foo:__pkg__"]
)

is similar to manually tagging every target not in //foo with does_not_use = ["my_setting"]. This wouldn't work for graph size (i.e. stripping my_setting from the configuration) but could possibly work for output path stripping.

Caveats:

  • We'd have to make sure any output path a rule computes is in fact that rule's outputs. I think it's possible for rules to request output paths without actually creating an action out of them.
  • If a target applies a transition that reads or writes "my_setting", does that mean the target depends on "my_setting"? I don't think visibility would restrict this.

You said your use case involves a bunch of cc_library targets that would read the flag. Given that, how much of the subgraph can you imagine practically visibility-restricting?

@ulrfa
Copy link
Contributor Author

ulrfa commented Jan 11, 2021

Now I'm back from vacation.

Caveats:

  • We'd have to make sure any output path a rule computes is in fact that rule's outputs. I think it's possible for rules to request output paths without actually creating an action out of them.

I don’t have enough insight to comment on that.

  • If a target applies a transition that reads or writes "my_setting", does that mean the target depends on "my_setting"? I don't think visibility would restrict this.

It could be problematic for graph size trimming, but I believe it would be OK if we aim for only output path stripping. Transitions writing private options might be compared with setting private flags on the command line, which should be OK.

But in this context, I'm worried about #12669. What do you think about that?

You said your use case involves a bunch of cc_library targets that would read the flag. Given that, how much of the subgraph can you imagine practically visibility-restricting?

The majority of our settings are visible only in the package it is defined in, and used by test cases testing code in its own package. Those should not have to affect all the cc_libraries deeper in the dependency graph.

We also have public settings, that might be set by transitions from the top and affect cc_libraries deep down in the dependency tree, but I expect those to be fewer and hopefully resulting in a limited set of combinations.

@gregestren
Copy link
Contributor

But in this context, I'm worried about #12669. What do you think about that?

Looks like we're making some progress on that. :) Can you remind me specifically where config_setting / select() fits into the larger request?

You said your use case involves a bunch of cc_library targets that would read the flag. Given that, how much of the subgraph can you imagine practically visibility-restricting?

The majority of our settings are visible only in the package it is defined in, and used by test cases testing code in its own package. Those should not have to affect all the cc_libraries deeper in the dependency graph.

Okay, this is really helpful input. This is precisely where I think there's the best practical possibility for making improvements. I think this is orthogonal to the visibility-restricted approach.

The limitation of the visibility-restricted approach is that you have to know the entire graph's transitive dependencies before knowing the full visibility scope. This requires instantiating the entire graph with original configurations (including the flag you want to remove), which already eliminates graph size trimming. As we've already discussed I believe we could conceptually get output path trimming with this approach.

But if we have some confidence that only, say, the top target in the build reads the flag, we could auto-remove the flag for all of it dependents. But that knowledge would have to come from a different place than visibility.

So... we do have options. Either of these approaches could yield benefits. If you want to continue exploring this I'd want to really lock down the use cases we'd want to optimize and understand the expected benefit. I wrote a tool ctexplain a while ago that could help. Any approach we take is some effort so we need to be as clear as possible how generally valuable the benefit would be on practical builds.

@ulrfa
Copy link
Contributor Author

ulrfa commented Jan 28, 2021

But in this context, I'm worried about #12669. What do you think about that?

Looks like we're making some progress on that. :) Can you remind me specifically where config_setting / select() fits into the larger request?

Yes good progress on #12669! Thanks @gregestren!

My idea is that user defined build settings could be ignored when calculating output path for an action, if that action (and its transitive dependencies) can not see the user defined build setting, due to the visibility. The config_setting could be seen as part of a transitive dependency chain (between the select() and the user defined build setting). But the dependency chain would not be reliable if the visibility of the config_setting could be circumvented. Right?

If you want to continue exploring this I'd want to really lock down the use cases we'd want to optimize and understand the expected benefit. I wrote a tool ctexplain a while ago that could help. Any approach we take is some effort so we need to be as clear as possible how generally valuable the benefit would be on practical builds.

I hope to find time to investigate that next week. I got feedback from my colleagues that we might have build options that I was not aware, that affects cc_libraries deeper down in the dependency graph.

@github-actions
Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Apr 28, 2023
@github-actions
Copy link

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team (@bazelbuild/triage). Thanks!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) stale Issues or PRs that are stale (no activity for 30 days) team-Configurability platforms, toolchains, cquery, select(), config transitions type: feature request
Projects
None yet
Development

No branches or pull requests

3 participants