Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow customized coveragerc for Python coverage #1434

Open
tingilee opened this issue Sep 21, 2023 · 8 comments · May be fixed by #2246
Open

Allow customized coveragerc for Python coverage #1434

tingilee opened this issue Sep 21, 2023 · 8 comments · May be fixed by #2246

Comments

@tingilee
Copy link

Description of the feature request:

For python coverage collection, .coveragerc is hardcoded in https://github.com/bazelbuild/bazel/blame/d435c6dd6e977a5c3ea1bc726557a9321948a317/tools/python/python_bootstrap_template.txt#L393-L397. To provide configurability, we'd like to introduce custom .coveragerc to set configurations including https://coverage.readthedocs.io/en/stable/excluding.html#advanced-exclusion.

Which category does this issue belong to?

Configurability, Python Rules

What underlying problem are you trying to solve with this feature?

Allow coverage configurations. In particular, we're interested in excluding import statements being collected as executable lines in coverage reports.

Which operating system are you running Bazel on?

macOS, Linux

What is the output of bazel info release?

release 6.3.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@rickeylev
Copy link
Contributor

It seems like a reasonable feature request. Several questions, though:

  • How does the binary get the path to the config file? The PR uses an environment variable, but who is setting that and how? I'm not sure an environment variable is the right approach.
  • How do other languages plumb through coverage settings?
  • Do coverage settings need to be per-test, or does a single build-wide config suffice? The difference is a build flag vs an attribute on the target.

All this said, this will probably work out better if we add this in rules_python instead of Bazel itself. I'll transfer this issue over to the rules_python repo.

@rickeylev rickeylev transferred this issue from bazelbuild/bazel Sep 27, 2023
@tingilee
Copy link
Author

How does the binary get the path to the config file?

In general, I'd like to propose that we set environment variable in rules_python to allow overrides:

coverage_env["PYTHON_COVERAGERC"] = ":coveragerc_file"

this matches how the coverage binary is defined, for example:

coverage_env["PYTHON_COVERAGE"] = "bazel_python_internal_pytest_pip_deps_coverage/pypi__coverage/coverage"

How do other languages plumb through coverage settings?

I have not seen coverage setting in other languages, ie. cc or Go. This seems to be pretty specific to Python or implementation of coveragepy.

Do coverage settings need to be per-test, or does a single build-wide config suffice?

.coveragerc configuration should be customized for every py_test target. This would allow us to choose/customize coverage collection per test invocation

Copy link

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

@github-actions github-actions bot added the Can Close? Will close in 30 days if there is no new activity label Mar 26, 2024
@raylu
Copy link
Contributor

raylu commented Mar 26, 2024

Do coverage settings need to be per-test, or does a single build-wide config suffice?

I would definitely settle for a single, build-wide config

@github-actions github-actions bot removed the Can Close? Will close in 30 days if there is no new activity label Mar 27, 2024
@ewianda
Copy link
Contributor

ewianda commented Apr 12, 2024

This will be very useful. I am trying to migrate from pytest-cov and the coverage runs are almost 4X slower. From the look of it, it seems the time is spent on generating coverage report for third party code. Currently I am using the old PYTHON_COVERAGE tool to configure coverage e.g

"""Coverage entry point."""
import argparse
import os
import re
import sys

from coverage.cmdline import main

if __name__ == "__main__":
    sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
    sys.path.append(sys.path.pop(0))
    parser = argparse.ArgumentParser()
    for idx, arg in enumerate(sys.argv.copy()):
        if arg.endswith("pytest_helper.py"):
            parser.add_argument("--package", help="Package containing test script")
            args, unknown = parser.parse_known_args()
            sys.argv.insert(
                idx, "--omit='*/external/*,*/py_deps*/*''"
            )
            sys.argv.insert(idx, f"--source={args.package.split(os.path.sep)[0]}")
    sys.exit(main())

@alexeagle
Copy link
Collaborator

I need this for a client who runs into an error with some third-party library, which creates a file in /tmp that ends up in the coverage database file, but the lcov reporter then can't locate it and errors.

https://stackoverflow.com/questions/76807257/python-coverage-report-error-no-source-for-code-path-remote-module-non
the fix is to add a line to the coveragerc file:
https://coverage.readthedocs.io/en/7.5.3/config.html#report-ignore-errors
so this feature would unblock that.

@rickeylev
Copy link
Contributor

error from /tmp file

Yeah, I ran into similar, too, when rewriting the bootstrap code (coverage doesn't like non-seekable files like /dev/fd/N).

I think having it ignores errors makes sense. That way you get some coverage results, even if it's incomplete. I don't think there's much a user can do about a transient source file that got instrumented (writing temp files of code-under-test and running them in-process seems pretty reasonable).

The alternative is to add paths to the omit setting. That worked for /dev/fd/* paths. We could add /tmp, too, I suppose, but that feels problematic.

github-merge-queue bot pushed a commit that referenced this issue Jun 2, 2024
This is a pretty major, but surprisingly not that invasive, overhaul of
how binaries
are started. It fixes several issues and lays ground work for future
improvements.

In brief:

* A system Python is no longer needed to perform bootstrapping.
* Errors due to `PYTHONPATH` exceeding environment variable size limits
is no
  longer an issue.
* Coverage integration is now cleaner and more direct.
* The zipapp `__main__.py` entry point generation is separate from the
Bazel
  binary bootstrap generation.
* Self-executable zips now have actual bootstrap logic.

The way all of this is accomplished is using a two stage bootstrap
process. The first
stage is responsible for locating the interpreter, and the second stage
is responsible
for configuring the runtime environment (e.g. import paths). This allows
the first
stage to be relatively simple (basically find a file in runfiles), so
implementing it
in cross-platform shell is feasible. The second stage, because it's
running under the
desired interpreter, can then do things like setting up import paths,
and use the
`runpy` module to call the program's real main.

This also fixes the issue of long `PYTHONPATH` environment variables
causing an error.
Instead of passing the import paths using an environment variable, they
are embedded
into the second stage bootstrap, which can then add them to sys.path.

This also switches from running coverage as a subprocess to using its
APIs directly.
This is possible because of the second stage bootstrap, which can rely
on
`import coverage` occurring in the correct environment.

This new bootstrap method is disabled by default. It can be enabled by
setting
`--@rules_python//python/config_settings:bootstrap_impl=two_stage`. Once
the new APIs
are released, a subsequent release will make it the default. This is to
allow easier
upgrades for people defining their own toolchains.

The two-stage bootstrap ignores errors during lcov report generation,
which
partially addresses
#1434

Fixes #691

* Also fixes some doc cross references.
* Also fixes the autodetecting toolchain and directs our alias to it
github-merge-queue bot pushed a commit that referenced this issue Aug 22, 2024
This PR will reduce the time it take to run `bazel coverage` 

*Before this fix,*

 ```
 bazel coverage  --cache_test_results=no  //...
 Elapsed time: 7.054s, Critical Path: 6.87s
 
 ```
 
[lcov --list
bazel-out/_coverage/_coverage_report.dat](https://github.com/user-attachments/files/16681828/lcov.log)


*After*
```
 bazel coverage  --cache_test_results=no  //...
Elapsed time: 2.474s, Critical Path: 1.18s


$ lcov --list bazel-out/_coverage/_coverage_report.dat 
Reading tracefile bazel-out/_coverage/_coverage_report.dat
                                          |Lines       |Functions  |Branches    
Filename                                  |Rate     Num|Rate    Num|Rate     Num
================================================================================
[/home/ewianda/.cache/bazel/_bazel_ewianda/da4b4cc49e0e621570c9e24d6f1eab95/execroot/_main/bazel-out/k8-fastbuild/bin/benchsci/ml/nlp/]
test_tokenizer_stage2_bootstrap.py        | 6.0%    250|    -     0|    -      0

[benchsci/]
devtools/python/pytest_helper.py          |90.5%     21|    -     0|    -      0
ml/nlp/test_tokenizer.py                  | 100%     13|    -     0|    -      0
ml/nlp/tokenizer.py                       |61.8%     76|    -     0|    -      0
================================================================================
                                    Total:|26.1%    360|    -     0|    -      0

```

Related to #1434

---------

Co-authored-by: aignas <240938+aignas@users.noreply.github.com>
@ewianda
Copy link
Contributor

ewianda commented Aug 26, 2024

Even after #2136 , there is a need to better configure coverage-py, for example to collect coverage for a multithreaded program uses something other than the build in threading, one needs to configure

concurrency =
    greenlet
    thread

I think the idea of having config per test target doesn't match the expectation of coverage-py, as such I think there should be an option to override the rc file using the standard coverage-py COVERAGE_RCFILE environment variable e.g

    py_test(
        name = name,
         ...
        env =  {
            "COVERAGE_RCFILE": "$(location //.coveragerc)",
        },
  )      

Another option is to specify rc file as part of the toolchain

python.toolchain( 
    configure_coverage_tool = True,
    coverage_rc = "//.coveragerc", 
    python_version = "3.11.6",
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants