Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rule finalizers #23160

Closed
tetromino opened this issue Jul 30, 2024 · 10 comments
Closed

Rule finalizers #23160

tetromino opened this issue Jul 30, 2024 · 10 comments
Assignees
Labels
P1 I'll work on this now. (Assignee required) team-Loading-API BUILD file and macro processing: labels, package(), visibility, glob type: feature request

Comments

@tetromino
Copy link
Contributor

We wish to add rule finalizers - special symbolic macros (see #19922) which - regardless of lexical position in a BUILD file - are evaluated in the final stage of loading a package, after all non-finalizer targets have been defined. Unlike ordinary symbolic macros, a finalizer can call native.existing_rules(), where it behaves slightly differently than in legacy macros: it only returns the set of non-finalizer rule targets. The finalizer may assert on the state of that set or define new targets.

Evaluating a rule finalizer necessarily requires loading the entire package (except for any other finalizers), preventing lazy symbolic macro evaluation. To maximize the benefit of lazy evaluation, we will need a flag to control whether to force finalizers to be evaluated.

Motivation

Many teams at Google have project- or framework-specific "epilogue macros", which must be invoked at the end of a BUILD file, and which use native.existing_rules() to do some of the following:

  • assert that certain rule targets exist or have expected attribute values
  • declare test targets per each target matching a particular filter (selecting based on name, kind, and attribute values, especially tags)
  • declare a main test target which depends on all targets matching a particular filter
  • declare test suites for tests matching particular tags or wrapping an explicit list of all manual tests matching a particular filter

I suspect that the same holds in other sufficiently large Bazel (mono)repos. Even rule set repos sometimes declare epilogue macros - for example, flags.FLAGS() in rules_android.

Use of epilogue macros has the following drawbacks:

  • Unlike any normal rule declaration or macro call in a BUILD file, the behavior of an epilogue macro depends on where in a BUILD file they are invoked: they must come at the end. This is confusing, requires special handling in tooling, must be enforced by code review or special integration checks, and raises a question: if a BUILD file needs several epilogues, which order should they be in? (There are examples of BUILD files calling 2 or even 3 epilogue macros. By luck, inside Google's monorepo, there don't seem to be any cases of two epilogues in a single package depending on each other - but nothing prevents it.)
  • Because they call native.existing_rules() with its current semantics, they make it fundamentally impossible to lazy-evaluate the package, even if a user is not interested in running any of the epilogue's tests.

Concrete proposal

Consider a theoretical finalizer:

# .bzl file

def _my_finalizer_impl(*, name, attr1, **kwargs):
    manual_tests = []
    for rule in native.existing_rules().values():
        if rule["kind"] == "foo_library" and not rule.get("testonly", False):
            # All new targets must be namespaced under the finalizer's name prefix
            test_name = "%s_%s_manual_conformance_test" % (name, rule["name"])
            manual_tests.append(test_name)
            _conformance_test(
                name = test_name,
                deps = [":" + rule["name"]],
                tags = ["manual"],
                attr1 = attr1,
                ...
            )

    # Work around inability to select manual tests into a test suite
    test_suite(
        name = name + "_manual_conformance_tests",
        tests = manual_tests,
    )

my_finalizer = macro(
    rule_finalizer = True,
    doc = "...",
    implementation = _my_finalizer_impl,
    attrs = {
        "attr1": ...,
    },
)
# BUILD file

load("//x/y:my_finalizer.bzl", "my_finalizer")

# Can be in any position in the BUILD file
my_finalizer(
    name = "my_finalizer",
    attr1 = ...,
}

The native.existing_rules() call in the context of a finalizer will always return a deeply immutable map which contains all non-finalizer-defined rule targets, and does not contain any targets declared in any finalizer (including the currently running one!), ensuring that (a) we can cache the existing_rules map and reuse it for multiple finalizers and (b) finalizer order does not matter.

Implementation in Bazel

A finalizer is a special kind of macro, which can be implemented as a subclass of MacroFunction or as a MacroFunction with a special flag. In particular, lazy/optional evaluation of finalizers (when allowed by a Bazel flag) is taken care of by lazy evaluation of macros.

Package.Builder already has some support for multi-stage building (see buildPartial()); this proposal would add a finalizer stage, the first step of which (if any finalizers are found in the package) would be to save a snapshot of the existing_rules snapshottable bimap.

Interaction with symbolic macros

Finalizers may not be called in a symbolic macro; but finalizers may call symbolic macros and other finalizers (and, of course, legacy macros).

The existing_rules map accessible to a finalizer contains targets regardless of their visibility or which macro they were declared in (note that symbolic macros may declare a target which is not visible to the package where it lives) - a finalizer might assert on a rule target's attributes rather than use it as a dependency, and hiding targets from the finalizer when they are visible to a genquery would be an odd choice.

Lazy evaluation

To evaluate a finalizer, Bazel must load the complete set of the package's non-finalizer targets - we don't want the finalizer's behavior to vary unpredictably depending on which part of the package was evaluated. In this, a finalizer is no different from an epilogue macro. But the advantage of the finalizer is that it's self-contained, intentional, isolated from other finalizers, and there is a way to tell that we don't need to evaluate it:

*if we don't need need any targets declared under the finalizer's namespace
*if a user doesn't care about running finalizer checks (as expressed by a Bazel flag)

In situations where a repo's policies need to be enforced - for example, in CI/CD - Bazel would need to run with a flag that forces evaluation of all macros and finalizers in the package.

By contrast, in situations where users need low latency of loading - for example, in an IDE when opening a BUILD file - one would certainly want to make evaluation lazy (and therefore not evaluate finalizers) unless a target under the finalizer's namespace was requested.

@tetromino tetromino added type: feature request P1 I'll work on this now. (Assignee required) team-Loading-API BUILD file and macro processing: labels, package(), visibility, glob labels Jul 30, 2024
@tetromino tetromino self-assigned this Jul 30, 2024
@matts1
Copy link
Contributor

matts1 commented Aug 1, 2024

This seems like a great idea, but more commonly than not, these "finalizers" don't actually run at package level granularity. They run on whole trees of files.

For example, I've previously worked on a project where we required every build file to have a conformance_checks() rule at the bottom of a file. The problem was, this was kind of a pain to put at the bottom of every build file - It was very easily forgotten, and very frequently would result in having to re-upload several minutes later after your presubmits failed.

I'd propose that finalizers may optionally be transitive. That is, if //foo:my_finalizer is a transitive finalizer, native.existing_rules(transitive=True) would return //foo/bar/baz:* in addition to //foo:*. This means that you could have a single finalizer for your whole project, instead of having to specify it in every single build file.

The only concern I have with such an approach would be the potential longer loading phase time increase when loading top-level build files, but I think that in practice, most of the time when you load top-level build files, they depend on the whole tree under them anyway.

Also, side question: How willl finalizers interact with export_files and non-exported files?

@tetromino
Copy link
Contributor Author

@matts1 - the idea is to run finalizers in the loading phase. Loading phase means each package is in a separate Starlark (and Java) thread, and the packages cannot see each others' content. See https://bazel.build/extending/concepts#evaluation-model

But what we could do instead is register default finalizers.

  • They would need to be registered in a file which is evaluated before any BUILD files. The most logical location would be REPO.bazel (or WORKSPACE for legacy repos). (We wouldn't want to use MODULE.bazel for this because the existence of default finalizers in a repo is logically an implementation detail, not a part of a module's public-facing api.)
  • The registration would apply only to packages which satisfy a package_group-like specification (for example, you may only want the android team's finalizers under the directory tree they own, you wouldn't want any default finalizers at all in //experimental, etc.)

This default finalizer registration a part of my proposal internally, but we decided to shelve it for now because of worries about performance.

@matts1
Copy link
Contributor

matts1 commented Aug 6, 2024

That sounds like an interesting idea.

What were the specific performance concerns? Was it hundreds of matchers which you have to match every package against being a problem in a large monorepo?

I would have imagined performance concerns could be mitigated with sub-matchers, but maybe I'm misunderstanding the problem

@tetromino
Copy link
Contributor Author

tetromino commented Aug 6, 2024

@matts1 - the performance problem is that in a monorepo, a change to REPO.bazel or WORKSPACE invalidates all loading/analysis caches company-wide, including the CI/CD systems :)

For the non-monorepo case, default registration could probably be useful - but were weren't sure how useful, how many people would be interested.

@matts1
Copy link
Contributor

matts1 commented Aug 7, 2024

Could that potentially be solved by moving the finalizers outside of REPO.bazel, with REPO.bazel only referencing them? This seems like a nice solution, but I don't know how feasible interleaving the loading and analysis phase would be, which might be required for this. That being said, my design doc regular rules in module extensions was approved, and does require solving that problem.

# REPO.bazel
register_default_finalizers("//:default_finalizers")
# //BUILD
default_finalizer_group(
    name = "default_finalizers",
    finalizers = [
        "//foo/bar:default_finalizer",
    ]
)
# //foo/bar/BUILD
finalizer_rule(
    name = "default_finalizer",
    default_finalizer = True,
    ...
)

@tetromino
Copy link
Contributor Author

tetromino commented Aug 7, 2024

@matts1 - if package //foo can theoretically get a default finalizer registered in by a specification defined in <bar>, then a change to bar must invalidate the loading and analysis cache for //foo. It does not matter how <bar> is structured - whether it's a single file or a group of files, whether it's REPO.bazel or some .bzl file referenced by REPO.bazel, or whatever other mechanism. The cache will be invalidated. And if all packages in a repo can theoretically get a default finalizer registered in them by a specification defined in <bar>, then any change to bar will invalidate all caches for that repo.

The solution for big monorepos might be along the lines of sharding the repo into components or projects which have independent config files. It will take some designing.

@matts1
Copy link
Contributor

matts1 commented Aug 7, 2024

Ah, didn't realize that. Adding to your sharding idea, we could do something like the following, where we only allow a finalizer to come from parent packages:

# REPO.bazel
register_default_finalizers("//:default_finalizers")
# //BUILD
default_finalizer_group(
    name = "default_finalizers",
    finalizers = [
        "//foo:default_finalizers",
        # We *may* want to disallow this, to ensure that only changes to the direct parent c
    ]
)
# //foo/BUILD
default_finalizer_group(
    name = "default_finalizers",
    finalizers = [
        "//foo/bar:default_finalizers",
       # This would be disallowed because //baz is not a subpackage of //foo.
        "//baz:default_finalizers",
    ]
)
# //foo/bar/BUILD
finalizer_rule(
    name = "default_finalizer",
    default_finalizer = True,
    ...
)

Though thinking about it now, this seems very similar to my original proposal, with the only real distinction being that if you don't configure finalizers, the parent BUILD files don't invalidate the child ones.

@matts1
Copy link
Contributor

matts1 commented Aug 7, 2024

I might be misunderstanding how things work, but if, for example, you had a global configuration file:

register_default_finalizer("//foo/bar/...", "foobar_finalizer")
register_default_finalizer("//foo/...", "foo_finalizer")

Then it could evaluate to:

{
   SkyKey("all_finalizers"): SkyValue(["//foo", "//foobar"])
   SkyKey("default_finalizer://foo/bar"): SkyValue("foobar_finalizer"),
   SkyKey("default_finalizer://foo"): SkyValue("foo_finalizer")
}

Then the SkyFunction for the package "//foo/bar/baz" would take in the SkyValues for the default finalizers for ["//", "//foo", "//foo/bar", //foo/bar/baz"]. This seems like it'd prevent a cache invalidation without needing to shard, since the SkyValues for those things wouldn't change unless they'd actually been changed in the config file, instead of if they could have theoretically been changed.

That being said, this might need a bit of a rework to skyframe itself to allow you to map SkyKey(default_finalizer://unknown_key) to SkyValue(None) by checking if it's contained within "all_finalizers". I don't have enough experience with skyframe to know whether this proposal is viable or not.

copybara-service bot pushed a commit that referenced this issue Aug 17, 2024
…yard.

It has been enabled by default since Bazel 6.0 (#14717).

Removing the flag simplifies native.existing_rules() behavior considerably,
making it easier to add new finalizer macro semantics. Working towards
#23160

PiperOrigin-RevId: 663963764
Change-Id: I82308093cbf2f4befb53cd4ea19dfc75e14e19af
copybara-service bot pushed a commit that referenced this issue Aug 27, 2024
We wish to add rule finalizers: special symbolic macros, which - regardless of
lexical position in a BUILD file - are evaluated in the final stage of loading
a package, after all non-finalizer targets have been defined. Unlike ordinary
symbolic macros, a finalizer can call `native.existing_rules()`, where it
behaves slightly differently than in legacy macros: it only returns the set of
non-finalizer rule targets. The finalizer may assert on the state of that set
or define new targets.

Working towards #23160

PiperOrigin-RevId: 668108498
Change-Id: Idba9f0f446edc8ceb27de3d84b57eb92353d66c0
copybara-service bot pushed a commit that referenced this issue Sep 25, 2024
Working towards #23160

PiperOrigin-RevId: 678773063
Change-Id: Iea8f319e8c6dae9a735420707f0d0a799df6fabd
@brandjon
Copy link
Member

brandjon commented Oct 2, 2024

@tetromino, should this bug be closed now that finalizers (as we scoped them for Symbolic Macros) is implemented and set for release with 8.0? Or is there ongoing design work or extensions you want to use this bug for?

@tetromino
Copy link
Contributor Author

@brandjon - thanks for the reminder!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) team-Loading-API BUILD file and macro processing: labels, package(), visibility, glob type: feature request
Projects
None yet
Development

No branches or pull requests

3 participants