Add the buildtagger tool #187

Merged
merged 9 commits into from
Mar 7, 2024

ulucinar
Contributor

@ulucinar ulucinar commented Mar 4, 2024

Description of your changes

buildtagger can be used to generate the desired build tags (constraints) for the official provider families. Each resource provider's source modules can share a unique build tag so that the tag-aware Go tools (including golangci-lint linter runner) can load only those source modules belonging to that resource provider, which results in reduced memory consumption.

This tool can add the desired build tags to the specified Go source files. Sample invocations of the tool are as follows:

  • buildtagger --parent-dir ./apis --regex "(.+)/.+/.+\.go" --tag-format "(%s || all) && !ignore_autogenerated" --mode dir
    This will run the tagger on the apis folder and its sub-folders recursively, matching relative paths against the regular expression supplied with the --regex argument. The directory structure under the apis folder for the upjet-based providers is as follows:
apis
├── accessanalyzer
│   └── v1beta1
│       ├── zz_analyzer_terraformed.go
│       ├── zz_analyzer_types.go
│       ├── zz_archiverule_terraformed.go
│       ├── zz_archiverule_types.go
│       ├── zz_generated.conversion_hubs.go
│       ├── zz_generated.deepcopy.go
│       ├── zz_generated.managed.go
│       ├── zz_generated.managedlist.go
│       └── zz_groupversion_info.go
├── account
│   └── v1beta1
│       ├── zz_alternatecontact_terraformed.go
│       ├── zz_alternatecontact_types.go
│       ├── zz_generated.conversion_hubs.go
│       ├── zz_generated.deepcopy.go
│       ├── zz_generated.managed.go
│       ├── zz_generated.managedlist.go
│       └── zz_groupversion_info.go
...

So this command matches the relative path (--mode set to dir) of every source file under an API group, taken with respect to the parent path ./apis, and captures the API group (such as accessanalyzer or account) with a regular expression group. The captured value is then substituted into the format string specified with the --tag-format argument, e.g., (%s || all) && !ignore_autogenerated. So the generated build constraint for the source file account/v1beta1/zz_alternatecontact_terraformed.go is //go:build (account || all) && !ignore_autogenerated.

This allows the linter runner to analyze only the source files with the account build constraint when it lints the account resource provider (i.e., the provider upbound/provider-aws-account) and is run with the account build tag. Similarly, if the linter runner's analysis cache is already populated, and the analysis phase will thus be cheap in terms of compute resources, one can just run the linter runner with the all build tag to lint all the source files (by convention we tag all the source files with the all tag). The !ignore_autogenerated constraint is inherited from the controller-gen tool.
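To make the effect of the generated constraint concrete, here is a minimal sketch that evaluates the constraint line from the example above against a few tag sets, using the standard library's go/build/constraint package. The helper name selected is hypothetical and is not part of buildtagger, and the sketch ignores the implicit tags (such as GOOS and GOARCH) that the real Go toolchain also sets.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// selected is a hypothetical helper (not part of buildtagger). It reports
// whether a file carrying the given //go:build line would be loaded when
// exactly the given build tags are enabled.
func selected(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	enabled := make(map[string]bool, len(tags))
	for _, t := range tags {
		enabled[t] = true
	}
	// Eval calls the callback once per tag mentioned in the expression.
	return expr.Eval(func(tag string) bool { return enabled[tag] }), nil
}

func main() {
	const line = "//go:build (account || all) && !ignore_autogenerated"
	tagSets := [][]string{
		{"account"},
		{"all"},
		{"eks"},
		{"account", "ignore_autogenerated"},
	}
	for _, tags := range tagSets {
		ok, err := selected(line, tags...)
		if err != nil {
			panic(err)
		}
		fmt.Printf("tags=%v selected=%t\n", tags, ok)
	}
}
```

With only account or only all enabled, the file is selected. With only eks enabled, or with ignore_autogenerated enabled alongside account, it is not.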

  • buildtagger --parent-dir ./internal/controller --regex "zz_(.+)_setup\.go" --tag-format "(%s || all) && !ignore_autogenerated" --mode file
    This invocation is very similar to the one above. It just uses the file mode instead of the default dir mode. In the file mode, the regular expression specified with the --regex parameter is matched against the base filename of the discovered file instead of its relative path with respect to the specified parent directory. The rest of the semantics are exactly the same as in the dir mode example above. This mode can be used to capture the API group name (the resource provider name) from the names of the generated source files instead of the directory names. For example, upjet generates a setup file for each API group to set up the controller manager with all the reconcilers under that API group:
internal/controller
├── zz_accessanalyzer_setup.go
├── zz_account_setup.go
...

For these files, we need to capture the resource provider name (the API group) from the file name. This is where the file mode can be used.

  • buildtagger --parent-dir ./internal/controller/eks/clusterauth/controller.go --tag-format "eks || all" --mode file
    This is another usage example, where the regex (left at its default .* here) does not define any capturing groups and the specified tag format does not have any format specifiers. This can be used to tag manually added (non-generated) files. internal/controller/eks/clusterauth/controller.go is a manually maintained file for the ClusterAuth.eks resource and we tag it with the eks constraint so that it's linted as part of the resource provider upbound/provider-aws-eks.

Here's a formal description of the command-line arguments for the tool:

❯ buildtagger --help
usage: buildtagger [<flags>]

A tool for generating build tags (constraints) for the source modules of the official provider families.

Flags:
  --help             Show context-sensitive help (also try --help-long and --help-man).
  --parent-dir="./"  Parent directory which will be recursively walked to find the Go source files whose relative path to this parent matches the specified regular expression. The files found will be tagged using the specified build tag.
  --regex=".*"       The regular expression against which a discovered Go source file's relative path or name will be matched. This expression must contain one and only one group whose value will be substituted in the given tag format string. An example is "(.+)/.+/.+\.go"
  --tag-format="!ignore_autogenerated"
                     A Printf format string to construct the build tag. An example is "(%s || all) && !ignore_autogenerated", where the "%s" format specifier can be replaced by a family resource provider group name.There should be a string format specifier for each of the capturing groups specified in
                     the "regex".
  --mode=dir         If "file", the file name of the discovered Go source is matched against the given regular expression. If "dir", the relative path of the source file is matched.
  --delete           If set, the build tags are removed from the discovered Go sources, instead of being added.

As explained above, the number of format specifiers in the --tag-format option must match the number of capturing groups in the --regex option, and they must all be %s string specifiers.
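The rule in the preceding paragraph can be checked up front. The following is a minimal sketch of such a check, assuming a hypothetical function named validate. How buildtagger itself enforces the rule is not shown in this description.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validate is a hypothetical check, not buildtagger's real code. It verifies
// that tagFormat contains only %s specifiers and exactly one per capturing
// group in the regular expression.
func validate(regex, tagFormat string) error {
	re, err := regexp.Compile(regex)
	if err != nil {
		return fmt.Errorf("invalid regex: %w", err)
	}
	// Drop escaped percent signs so that "%%" is not counted as a specifier.
	stripped := strings.ReplaceAll(tagFormat, "%%", "")
	all := strings.Count(stripped, "%")
	str := strings.Count(stripped, "%s")
	if all != str {
		return fmt.Errorf("tag format %q may only contain %%s specifiers", tagFormat)
	}
	// NumSubexp counts capturing groups only; (?:...) groups are excluded.
	if groups := re.NumSubexp(); str != groups {
		return fmt.Errorf("tag format has %d specifier(s) but regex has %d capturing group(s)", str, groups)
	}
	return nil
}

func main() {
	fmt.Println(validate(`(.+)/.+/.+\.go`, "(%s || all) && !ignore_autogenerated"))
	fmt.Println(validate(`.*`, "eks || all"))
	fmt.Println(validate(`.*`, "(%s || all)"))
	fmt.Println(validate(`(.+)`, "%d || all"))
}
```

The first two calls return nil, which covers both the single-group invocations and the fixed-tag invocation with no groups. The last two return errors, for a count mismatch and for a non-string specifier respectively.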

Go version bump

This PR also bumps the Go module version and the Go version used in the repo's build pipelines to v1.21.

Alternatives Considered

We've also considered generating these build constraints for the family providers via upjet, which is one of the underlying code generation frameworks that we use to generate the official providers including upbound/provider-aws. We did not choose this approach because of the following reasons:

  • Not all upjet-based providers will need this. Currently, the linter runner is only an issue with our official AWS provider and we don't need this for the other providers. This means that if we implement tagging in upjet, we will also want to implement a configuration switch to enable/disable the generation of build constraints. This is additional complexity for upjet's code generation pipelines.
    • Upjet is not only for family providers, i.e., most of the providers generated with upjet are not families (even the official crossplane-contrib/provider-upjet-azuread is a monolith, only 3 families exist as of now). But the idea we are implementing for upbound/provider-aws is to reduce the compute resources required by the linter runner by running the linters on each API group (a resource provider like ec2 or acm) in isolation. For non-family providers, this idea is inherently more complex to implement across different API groups because they have no logical boundaries on which we can build this solution. Adding to upjet a capability that we will use only for a family provider seems like introducing unnecessary complexity there. Having this capability implemented as a separate tool is thus better for managing complexity.
  • Even if we implement this as part of upjet, upjet is not the only code generation tool we are using in the upjet-based provider repositories. We also rely on angryjet and controller-gen in those repos. We would also need these tools to be able to generate the build constraints for the resource providers and handle them. A standalone tool like buildtagger can implement this feature regardless of the underlying code generators we are using, i.e., this is a cross-cutting concern for us across different code generation frameworks.
  • Currently we are having memory issues only with the linter runner's analysis phase and not the Go build itself. So currently, we don't need the build constraints for building the source code, we need them only for linting it. Implementing this in upjet means that we would also need to change our build pipelines to handle these build constraints (unless we make the generation of the build tags configurable in upjet and do two passes of generation, one for the Go build without the build tags and the other for the linter with the build tags). When we implement this in a separate tool, we can just add an extra step to the linter pipeline that invokes this tool to generate the build tags. The rest (including the other CI pipelines like the build pipeline) stays the same, with no modifications needed just because the linter pipeline now uses build constraints.

I have:

  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

This tool has been tested with the official AWS provider in this test PR.

- buildtagger can be used to generate the desired build tags (constraints)
  for the official provider families.
- Each resource provider's source modules can share a unique build
  tag so that tag-aware Go tools (including golangci-lint) can
  load only those source modules.

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
… exit non-zero

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
…w to

conditionally skip the "lint" job.

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
…exity

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Member

@sergenyalcin sergenyalcin left a comment

Thanks @ulucinar LGTM!

@ulucinar ulucinar merged commit 0826d2a into upbound:main Mar 7, 2024
6 checks passed
@ulucinar ulucinar deleted the tagger branch March 7, 2024 14:54
ulucinar added a commit that referenced this pull request Mar 12, 2024
[Backport standard-runners]: Backports PRs #187 #188 #189 #190