feat(coverage): Register coverage.py to hermetic toolchains #977
Conversation
This is mostly building on the support added in bazelbuild/bazel@9d01630.
python/private/coverage_deps.bzl
Outdated
http_archive,
    name = name,
    build_file_content = """
py_library(
I'm not sure this can be a py_library. Doesn't that lead to a circular dependency when you add it as a dependency for the python_runtime? Or is the python toolchain not actually a dependency of py_library targets (as opposed to py_binary)?
coverage_tool needs to be a single file from the toolchain's point of view, and that is why there is only a single src in there. I was following the official docs from https://bazel.build/configure/coverage#python, and since the manual tests with bazel coverage worked in some cases, I thought that was fine?
I guess so! I must have been thinking about the prohibition on using py_binary there (because py_binary does depend on the python toolchain to assemble the wrapper script, and that of course depends on the coverage tool).
TIL, thanks for explaining. Should we have an extra line of docs somewhere?
I would suggest not using py_library for now -- it implies a set of functionality that simply isn't respected (e.g. imports attribute won't be respected, anything in its PyInfo is ignored, etc).
It's currently possible to put a py_library there, but is a bit risky. This is because py_library doesn't, today, depend on the toolchain. However, I don't see how it can avoid that forever (in particular, I'm thinking of for when pyc precompilation is added).
I do think a py_library-like-equivalent dependency is the correct conceptual type of dependency that should be there, though. Running with code coverage is, essentially, adding an extra dependency to the binary and wrapping the regular entry point in another entry point. Unfortunately, it's not immediately clear to me how to make that work cleanly via (or with support from) the toolchain (I have a few ideas, but haven't thought them through).
To use the coverage settings of the toolchain, you need to:
- Point coverage_tool to a single file. Only this file is consumed. Any additional files, runfiles, dependencies, etc aren't used.
- Put any files coverage_tool needs in coverage_files, which will be added to the binary's runfiles.
I mean filegroup is clearly documented such that srcs is in the files output for the rule (which we need) and data is in the runfiles. I don't think we need to make it more complicated than that (it is also true that bazel will treat such a filegroup as an executable target, however we don't think we actually depend on that behavior).
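A minimal sketch of what that filegroup could look like — the target name, paths, and visibility label below are illustrative placeholders, not the actual targets in this PR:

```starlark
# Hypothetical BUILD sketch: srcs land in the rule's default files output
# (the single entry point the toolchain consumes), while data lands in
# the runfiles that travel alongside it.
filegroup(
    name = "coverage",
    srcs = ["coverage/__main__.py"],
    data = glob(["coverage/**/*.py", "coverage/**/*.so"]),
    # Restrict visibility so only the toolchain can reference this
    # bag of platform-specific files.
    visibility = ["@rules_python//python:__subpackages__"],
)
```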
Er, sorry, I mixed up the py_runtime and PyRuntimeInfo apis. Disregard the "you have to specify two attributes" part. The splitting of the target into (entry point, files) is handled by py_runtime.
Yes, +1 to filegroup.
I don't think we need to make it more complicated than that
I think you're referring to my "it should be a py library-like-equivalent dependency" comment?
There's a big distinction between "bundle of pre-built, platform-specific files" and a "dependency" (for lack of a better term). The former carries no information other than "here's a bag of files". A dependency can carry additional information, such as modifications to make to the import path, additional linker flags, propagating a proper runfiles object (we currently omit symlinks, for example), or other necessary metadata.
But, let's not derail this PR with an extended discussion of that? Happy to discuss in more detail on Slack or in an issue.
it is also true that bazel will treat such a filegroup as an executable target, however we don't think we actually depend on that behavior
"depend" is a strong word, but we do allow the "executable" of the target to be an indicator as to what the entry point is. The basic logic for converting the Target to the (entry point, files) is:
- If there is one output, use that.
- Otherwise, use whatever the executable file is.
- Otherwise, error.
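The fallback order above can be sketched in plain Python. This is a simplified stand-in, not the actual implementation: the real logic operates on Bazel Target objects, and the function and parameter names here are hypothetical.

```python
def pick_entry_point(outputs, executable=None):
    """Resolve a target to its coverage entry point file.

    Mirrors the fallback order described above:
    1. a single output wins,
    2. otherwise fall back to the target's executable file,
    3. otherwise it is an error.
    """
    if len(outputs) == 1:
        return outputs[0]
    if executable is not None:
        return executable
    raise ValueError("coverage_tool must have a single file or an executable")
```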
I don't think we need to make it more complicated than that
I think you're referring to my "it should be a py library-like-equivalent dependency" comment?
There's a big distinction between "bundle of pre-built, platform-specific files" and a "dependency" (for lack of a better term). The former carries no information other than "here's a bag of files". A dependency can carry additional information, such as modifications to make to the import path, additional linker flags, propagating a proper runfiles object (we currently omit symlinks, for example), or other necessary metadata.
Yes, and in this particular case, so long as the target has its visibility restricted to the toolchain, a bag of pre-built platform-specific files is exactly what this is: the py_runtime rule won't pay any attention to additional providers from the target, and if it did honor e.g. imports from this input, I can't see a situation where that wouldn't do more harm than good. But this is a very good reason to make sure the visibility is appropriately restricted.
"depend" is a strong word, but we do allow the "executable" of the target to be an indicator as to what the entry point is. The basic logic for converting the Target to the (entry point, files) is:
- If there is one output, use that
- otherwise, use whatever the executable file is
- Otherwise, error
Yes, I do remember how I implemented that. But in this case I was referring to the fact that a filegroup with a single file in srcs will have a FilesToRun provider supplying that file as an executable, which would make the second option you list above work; but we don't depend on that behavior (which is, admittedly, a little weird) because it'll stop at the first option in this case.

At any rate, if a filegroup is what we want to use here, then the upstream docs in the bazel repo should maybe be updated? Except that there might be use cases for having it as a py_library in the general case; just in this particular case we don't need it to be, and by using restricted visibility and a filegroup we can afford to not think about all of the ways that could go wrong.
I changed it to use filegroup, so it should hopefully be an improvement over py_library, and we can figure out something better in a separate PR.
This is awesome! I was going to work on it this week. I did a pass on your code so far and will do a more thorough review today.
python/private/coverage_deps.bzl
Outdated
py_library(
    name = "coverage",
    srcs = ["coverage/__main__.py"],
    data = glob(["coverage/*", "coverage/**/*.py", "coverage/*.so"]),
This is moot after removing py_library, but:
- .py files don't belong in data. Instead, a separate py_library should be factored out with proper srcs.
- The "coverage/*" glob is probably too general and overlaps with "*.so". In general, a "*" or "**" glob should only be done when you need every file in the source tree (e.g. some sort of source distribution zip), or if you have a data directory and you know it only contains files that should be included.
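A hedged sketch of the suggested factoring — the target name and glob patterns below are illustrative, not what the PR ended up using:

```starlark
# Hypothetical BUILD sketch: Python sources go into the py_library's
# srcs, and only the non-Python extension modules remain in data,
# avoiding overlapping globs.
py_library(
    name = "coverage_lib",
    srcs = glob(["coverage/**/*.py"]),
    data = glob(["coverage/**/*.so"]),
)
```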
Done, please take a look again.
python/private/coverage_deps.bzl
Outdated
srcs = ["coverage/__main__.py"],
    data = glob(["coverage/*", "coverage/**/*.py", "coverage/*.so"]),
    exec_compatible_with = {compatible_with},
    visibility = ["//visibility:public"],
Should this be public visibility? It's really only the toolchain that should be referencing it. It'd be rather hard to use it correctly because it's just a group of platform-specific files.
python/repositories.bzl
Outdated
config_setting(
    name = "coverage_enabled",
    values = {{"collect_code_coverage": "true"}},
)
oh awesome, thank you for adding this!
Can you restrict visibility of this? It's not the py rule's responsibility to tell if coverage is enabled or not, so let's not expose a way for people to use it and think it is.
Done.
python/repositories.bzl
Outdated
@@ -255,13 +267,15 @@ py_runtime_pair(
    python_path = python_bin,
    python_version = python_short_version,
    python_version_nodot = python_short_version.replace(".", ""),
    coverage_tool = rctx.attr.coverage_tool if rctx.attr.coverage_tool == None and "windows" not in rctx.os.name else "\"{}\"".format(rctx.attr.coverage_tool),
I think this logic has a bug? If tool=None and os=windows, then it'll do format(None), resulting in "None", which isn't a valid value. Is this supposed to be "or" instead of "and"?
Thanks, there was a bug; fixed it now.
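A plain-Python sketch of the corrected condition — the function and argument names are hypothetical simplifications of the repository rule context, not the actual code. The tool is dropped entirely on Windows or when unset, and quoted otherwise, so format(None) can never be emitted:

```python
def render_coverage_tool(coverage_tool, os_name):
    """Render the coverage_tool attribute for the generated BUILD content.

    Returns None (no coverage tool) on Windows or when no tool was
    configured; otherwise returns the label wrapped in quotes.
    """
    if coverage_tool is None or "windows" in os_name:
        return None
    return '"{}"'.format(coverage_tool)
```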
python/private/coverage_deps.bzl
Outdated
http_archive, | ||
name = name, | ||
build_file_content = """ | ||
py_library( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er, sorry, I mixed up the py_runtime and PyRuntimeInfo apis. Disregard the "you have to specify two attributes" part. The splitting of the target into (entry point, files) is handled by py_runtime.
@@ -1 +1,3 @@
common --experimental_enable_bzlmod

coverage --java_runtime_version=remotejdk_11
Why is this needed?
My guess would be because they don't have java installed on their dev machine, or at least not in a way that bazel is able to find it, and including it in the PR was accidental.
Should I remove it? I installed bazel via bazelisk and it does not pull in any java dependencies by default. What is the official way to do this?
So according to this: https://bazel.build/docs/bazel-and-java#hermetic-testing
What this basically means is instead of using a locally installed JDK, it uses one it downloads (it sounds similar to how we're doing things).
So this is probably a net positive for getting started at the cost of having to download a JDK. I think that's worth it, so go ahead and leave it in.
# Update with './tools/update_coverage_deps.py <version>'
#START: managed by update_coverage_deps.py script
_coverage_deps = {
Ah hm. Something I just realized is we run the risk of conflicting with a user-installed coverage library. e.g.:
- the toolchain uses coverage X
- the user, in their requirements, uses coverage Y

At runtime, we'll have 2 different coverage libraries available, and it's not well defined which will be used.
I'm not sure what we can do about this, though. I guess add an arg to toolchain creation and dependency setup that allows the user to specify the coverage target to use, and skip having one automatically installed? Or...IDK. The key attribute we're looking for is that the toolchain and user binary reference the same target (or equiv; that we only end up with 1 coverage library in runfiles).
That would be an argument in favor of using py_library as the target type in the remote for coverage_dep, so things other than the toolchain could use it.

I don't think there's a way around having the coverage tool used by the toolchain be potentially different from what's declared in requirements, because you can't actually do a pip install until the toolchain is configured. I don't think it's a dealbreaker, though; it might be a little surprising to find out that coverage is running with a version of coveragepy distinct from what was in your requirements.txt, but at the same time I don't think it'll actually interfere with anything.
For now, can you default register_coverage_deps=False? Otherwise we'll potentially cause issues with people who have coverage in their deps already, and we don't really have any solution or workaround available to them.

Yeah, we were kicking around ideas on how to best address this. I'm not sure there is much to be done for this in this PR. The only thing I can think of would be to change the register_coverage_tool arg from a bool to something that allows the caller to pass in their own target.
you can't actually do a pip install until the toolchain is configured.
Hmmm, well, what if you could? If we had a separate "pip tools" or "requirements resolution" toolchain, then maybe we'd be able to do that. Just thinking out loud.
The two more appealing ideas we came up with were:
- Making sure the toolchain's coverage was last in the import order (probably by modifying the bootstrap, when the bootstrap_template arg becomes available). That would at least give a well-defined order for which version gets used and lets the user have more control over the version used.
- Have the toolchain only ship the coverage tool entry point, not the coverage library itself; each test is responsible for bringing the necessary deps for the coverage tool (and try to figure out how to make that as easy/transparent to test writers as possible).
@unittest.skipIf(%is_windows%, "not supported on Windows")
def test_coverage_toolchain(self):
    subprocess.run("bazel coverage //...", shell=True, check=True)
Is there another way to test this? I was actually looking to remove this run_acceptance_test thing because it prevents upstream bazel from running our tests with upcoming bazel versions.
I am not sure about that. I was using what was available, but if you have a suggestion on how to configure bazelci in order to run coverage, I would be happy to remove the changes in this file.
I think we just have to add coverage_targets: <targets> to the presubmit.yml? But eh, I'm ok with doing that in a separate PR.
Reverted modifications to this file and relying on the coverage_targets in the examples for now.
@@ -53,6 +53,15 @@ class TestPythonVersion(unittest.TestCase):
    subprocess.run("bazel test //...", shell=True, check=True)

    stream = os.popen("bazel info execution_root")
Please use the subprocess module instead. Right now, this is leaking an opened file descriptor, which spams the logs with a warning.
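A sketch of the suggested replacement, assuming the test only needs the command's stdout (the helper name is hypothetical). subprocess.run manages the pipe and closes it for us, so no file descriptor is leaked:

```python
import subprocess

def run_and_capture(cmd):
    """Run cmd and return its stripped stdout; pipes are closed automatically."""
    result = subprocess.run(
        cmd,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return result.stdout.strip()

# e.g. execution_root = run_and_capture(["bazel", "info", "execution_root"])
```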
Done.
- Only set the tooling if it is actually needed.
- By default it is set to None.

The user can toggle the coverage_tool setting by passing `register_coverage_tool=(True|False)` to the `python_register_toolchains` or `python_register_multi_toolchains` call, or by specifying the `coverage_tool` label as described in the `versions.bzl` file.

Update tests to:
- ensure that we can still use the toolchain as previously.
- ensure that we are not downloading extra deps if they are not needed.
This hopefully makes everything more watertight and will help avoid bugs with misconfiguration.
It seems that the support is missing upstream [3]. [3]: bazelbuild/bazel#15835
The `__init__.py` files in the root of the WORKSPACE in `bzlmod` break when running under `bazel coverage //:test`. However, it started working when I renamed `__init__.py` to `lib.py`. I suspect this has to do with the layer of indirection that `coverage` introduces.
The `multi_python_versions` example cannot show coverage for the more complex tests that are using `subprocess`. I am wondering if this is related to the fact that we are including `coverage.py` via the toolchain and not through other mechanisms [2]. [2]: https://bazel.build/configure/coverage
I am not sure how I can translate the config constraint to a build environment here.
The logic should be that we set the `coverage_tool` to None if it is Windows or the tool has not been set.
This changes how we install the coverage dependencies and allows us to set the visibility for the coverage tool correctly by passing the name of the toolchain repo during the repository function execution. bzlmod does not have this due to the fact that the dependency registering is happening in a different way/order.
Now we store everything in a nested dict of tuples, which makes the lookup of supported versions slightly cleaner.
We rely now on the bazelci config instead
btw, re: coverage of subprocesses not working: in the bootstrap, there's some code that unsets some coverage variables. I think keeping them set is necessary to propagate coverage things to subprocesses.
…ld#977)

This allows including the coverage package as part of the toolchain dependencies, which is mixed into a test's dependencies when `bazel coverage` is run (if coverage is not enabled, no extra dependency is added).

For now, it's disabled by default because enabling it poses the risk of having two versions of coverage installed (one from the toolchain, one from the user's dependencies). The user can toggle the coverage_tool setting by passing `register_coverage_tool=(True|False)` to the `python_register_toolchains` or `python_register_multi_toolchains` call, or by specifying the `coverage_tool` label as described in the `versions.bzl` file.

Use coverage.py v6.5.0 because the latest has `types.py` in the package directory, which imports from Python's stdlib `types` [1]. Somehow the Python interpreter thinks that `from types import FrameType` refers to the currently interpreted file and everything breaks. I would have expected the package to use absolute imports and only attempt to import from `coverage.types` if we use `coverage.types` and not just a plain `types` import.

NOTE: Coverage is only for non-windows platforms.

Update tests to:
- ensure that we can still use the toolchain as previously.
- ensure that we are not downloading extra deps if they are not needed.

* Also changes the project's bazelrc to use a remotejdk, which makes it easier for contributors because they don't have to locally install a jdk to get going.

[1]: https://github.com/nedbat/coveragepy/blob/master/coverage/types.py
[3]: bazelbuild/bazel#15835
PR Checklist
Please check if your PR fulfills the following requirements:
`bazel coverage`.

PR Type
What kind of change does this PR introduce?
What is the current behavior?
The `coverage_tool` cannot be included via the toolchain.

Issue Number: #43
What is the new behavior?
Summary:
- `coverage_tool` attribute in `python_repository`.
- `coverage.py` when running the `pip_install_dependencies` function.
- `use_repo` includes the coverage tool, so that `bzlmod` users can use it too.
- `coverage_tool` in the toolchain code.

Does not work/bad/ugly:
- `types.py` in the package directory, which imports from Python's stdlib `types`. Somehow the Python interpreter is thinking that the `from types import FrameType` is referring to the currently interpreted file and everything breaks. I would have expected the package to use absolute imports and only attempt to import from `coverage.types` if we use `coverage.types` and not just a plain `types` import.
- The `multi_python_versions` example cannot show coverage for the more complex tests that are using `subprocess`. I am wondering if this is related to the fact that we are including `coverage.py` via the toolchain and not through other mechanisms.
- The `__init__.py` files in the root of the WORKSPACE in `bzlmod` break when running under `bazel coverage //:test`. However, it started working when I renamed `__init__.py` to `lib.py`. I suspect this has to do with the layer of indirection that `coverage` introduces.

Work towards #43.
Does this PR introduce a breaking change?
We may potentially break setups where coverage tooling is already included via the `COVERAGE_PYTHON` variable setting. In that case the users should use the flag to disable coverage_tool inclusion in the toolchain. Setting `register_coverage_tool = False` when registering toolchains would be sufficient.

Other information