-
-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: report unused inputs for the tar rule #951
Open
plobsing
wants to merge
11
commits into
bazel-contrib:main
Choose a base branch
from
plobsing:tar_compute_unused_inputs
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+292
−10
Commits on Oct 5, 2024
-
perf: report unused inputs for the tar rule
The `mtree` spec passed to the `tar` rule very often selects a subset of the inputs made available through the `srcs` attribute. In many cases, these subsets do not break down cleanly along dependency-tree lines and there is no simple way just pass less content to the `tar` rule. One prominent example where this occurs is when constructing the tars for OCI image layers. For instance when [building a Python-based container image](https://github.com/bazel-contrib/rules_oci/blob/main/docs/python.md), we might want to split the Python interpreter, third-party dependencies, and application code into their own layers. This is done by [filtering the `mtree_spec`](https://github.com/aspect-build/bazel-examples/blob/85cb2aaf8c6e51d5e9e086cc94b94ab896903fb0/oci_python_image/py_layer.bzl#L39). However, in the operation to construct a `tar` from a subsetted mtree, it is usually still an unsubsetted tree of `srcs` that gets passed. As a result, the subset tarball is considered dependent upon a larger set of sources than is strictly necessary. This over-scoping runs counter to a very common objective associated with breaking up an image into layers - isolating churn to a smaller slice of the application. Because of the spurious relationships established in Bazel's dependency graph, all tars get rebuilt anytime any content in the application gets changed. Tar rebuilds can even be triggered by changes to files that are completely filtered-out from all layers of the container. Redundent creation of archive content is usually not too computationally intensive, but the archives can be quite large in some cases, and avoiding a rebuild might free up gigabytes of disk and/or network bandwidth for better use. In addition, eliminating the spurious dependency edges removes erroneous constraints applied to the build action schedule; these tend to push all Tar-building operations towards the end of a build, even when some archive construction could be scheduled much earlier. ## Risk assessment and mitigation The `unused_inputs_list` mechanism used to report spurious dependency relationships is a bit difficult to use. Reporting an actually-used input as unused can create difficult to diagnose problems down the line. However, the behaviour of the `mtree`-based `tar` rule is sufficiently simple and self-contained that I am fairly confident that this rule's used/unused set can be determined accurately in a maintainable fashion. Out of an abundance of caution I have gated this feature behind a default-off flag. The `tar` rule will continue to operate as it had before - typically over-reporting dependencies - unless the `--@aspect_bazel_lib//lib:tar_compute_unused_inputs` flag is passed. ### Filter accuracy The `vis` encoding used by the `mtree` format to resiliently handle path names has a small amount of "play" to it - it is reversable but the encoded representation of a string is not unique. Two unequal encoded strings might decode to the same value; this can happen when at least one of the encoded strings contains unnecessary escapes that are nevertheless honoured by the decoder. The unused-inputs set is determined using a filter that compares `vis`-encoded strings. In the presence of non-canonically-encoded paths, false-mismatches can lead to falsely reporting that an input is unused. The only `vis`-encoded path content that is under the control of callers is the `mtree` content itself; all other `vis`-encoded strings are constructed internally to this package, not exposed publicly, and are all derived using the `lib/private/tar.bzl%_vis_encode` function; all of these paths are expected to compare exactly. Additionally, it is expected that many/most users will use this package's helpers (e.g. `mtree_spec`) when crafting their mtree content; such content is also safe. It is only when the user crafts their own mtree, or modifies an mtree spec's `content=` fields' encoding in some way, that a risk of inaccurate reporting arises. The chances for this are expected to be minor since this seems like an inconvenient and not-particularly-useful thing for a user to go out of their way to do.
Configuration menu - View commit details
-
Copy full SHA for 799e1e7 - Browse repository at this point
Copy the full SHA 799e1e7View commit details -
Configuration menu - View commit details
-
Copy full SHA for 359512a - Browse repository at this point
Copy the full SHA 359512aView commit details -
Add tri-state attribute to control unused-inputs behaviour
This control surface provides for granular control of the feature. The interface is selected to mirror the common behaviour of `stamp` attributes.
Configuration menu - View commit details
-
Copy full SHA for f1492cf - Browse repository at this point
Copy the full SHA f1492cfView commit details -
Configuration menu - View commit details
-
Copy full SHA for a2d0a85 - Browse repository at this point
Copy the full SHA a2d0a85View commit details -
Configuration menu - View commit details
-
Copy full SHA for 5916145 - Browse repository at this point
Copy the full SHA 5916145View commit details -
Configuration menu - View commit details
-
Copy full SHA for 916ee9f - Browse repository at this point
Copy the full SHA 916ee9fView commit details -
Configuration menu - View commit details
-
Copy full SHA for e8f81be - Browse repository at this point
Copy the full SHA e8f81beView commit details -
Configuration menu - View commit details
-
Copy full SHA for d60b03c - Browse repository at this point
Copy the full SHA d60b03cView commit details
Commits on Oct 6, 2024
-
Configuration menu - View commit details
-
Copy full SHA for 6d72717 - Browse repository at this point
Copy the full SHA 6d72717View commit details -
Support alternate contents= form
This is accepted by bsdtar/libarchive. In fact `contents=` is the only of the pair documented in `mtree(5)`; `content=` is an undocumented alternate form supported by libarchive.
Configuration menu - View commit details
-
Copy full SHA for c859573 - Browse repository at this point
Copy the full SHA c859573View commit details -
Don't try to prune the unprunable
Bazel's interpretation of unused_inputs_list cannot accomodate certain things in filenames. These are also likely to mess up our own line-oriented protocol in the shellscript that produces this file.
Configuration menu - View commit details
-
Copy full SHA for 515377c - Browse repository at this point
Copy the full SHA 515377cView commit details
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.