Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] [docs] instructions for generating code coverage produce incorrect results #4919

Closed
jameslamb opened this issue Dec 29, 2021 · 5 comments · Fixed by #4922
Closed
Assignees

Comments

@jameslamb
Copy link
Collaborator

Description

The R package's README contains a section describing how to generate a code coverage report using {covr}.

https://github.com/microsoft/LightGBM/blob/ce486e5b45a6f5e67743e14765ed139ff8d532e5/R-package/README.md#code-coverage

Those instructions run without error and do produce some code coverage, but the coverage results are incorrect.

For example, look at how many files are reported as having 0 coverage.

R/lgb.convert_with_rules.R: 0.00%
R/lgb.cv.R: 0.00%
R/lgb.drop_serialized.R: 0.00%
R/lgb.interprete.R: 0.00%
R/lgb.make_serializable.R: 0.00%
R/lgb.plot.importance.R: 0.00%
R/lgb.plot.interpretation.R: 0.00%
R/lgb.restore_handle.R: 0.00%
R/readRDS.lgb.Booster.R: 0.00%
R/saveRDS.lgb.Booster.R: 0.00

Seeing 0.00% coverage for lgb.cv(), for example, is very surprising, given that there are around 30 calls to that function in the R package's tests.

It isn't like all coverage results are 0.0, as can sometimes happen with pytest-cov for example. Other files do show some test coverage.

R/lgb.importance.R: 96.00%
R/lightgbm.R: 96.43%
R/aliases.R: 100.00%
R/lgb.model.dt.tree.R: 100.00%
R/metrics.R: 100.00%
full coverage results (click me)
lightgbm Coverage: 52.00%
R/lgb.convert_with_rules.R: 0.00%
R/lgb.cv.R: 0.00%
R/lgb.drop_serialized.R: 0.00%
R/lgb.interprete.R: 0.00%
R/lgb.make_serializable.R: 0.00%
R/lgb.plot.importance.R: 0.00%
R/lgb.plot.interpretation.R: 0.00%
R/lgb.restore_handle.R: 0.00%
R/readRDS.lgb.Booster.R: 0.00%
R/saveRDS.lgb.Booster.R: 0.00%
src/boosting/goss.hpp: 0.00%
src/boosting/rf.hpp: 0.00%
src/include/Eigen/src/Core/Map.h: 0.00%
src/include/Eigen/src/Core/products/GeneralMatrixVector.h: 0.00%
src/include/Eigen/src/Core/products/Parallelizer.h: 0.00%
src/include/Eigen/src/Core/Stride.h: 0.00%
src/include/Eigen/src/Core/Transpose.h: 0.00%
src/include/Eigen/src/plugins/MatrixCwiseBinaryOps.h: 0.00%
src/metric/map_metric.hpp: 0.00%
src/metric/xentropy_metric.hpp: 0.00%
src/network/linkers_socket.cpp: 0.00%
src/network/linkers.h: 0.00%
src/network/socket_wrapper.hpp: 0.00%
src/objective/xentropy_objective.hpp: 0.00%
src/treelearner/cuda_tree_learner.h: 0.00%
src/treelearner/data_parallel_tree_learner.cpp: 0.00%
src/treelearner/feature_parallel_tree_learner.cpp: 0.00%
src/treelearner/gpu_tree_learner.h: 0.00%
src/treelearner/parallel_tree_learner.h: 0.00%
src/treelearner/voting_parallel_tree_learner.cpp: 0.00%
src/network/network.cpp: 3.17%
src/include/LightGBM/network.h: 3.39%
src/io/json11.cpp: 4.33%
src/treelearner/cost_effective_gradient_boosting.hpp: 4.81%
src/network/linker_topo.cpp: 4.96%
R/lgb.unloader.R: 5.26%
src/include/Eigen/src/Core/GeneralProduct.h: 6.25%
src/include/Eigen/src/Core/products/GeneralBlockPanelKernel.h: 10.50%
src/boosting/prediction_early_stop.cpp: 14.29%
src/include/LightGBM/utils/array_args.h: 16.53%
src/treelearner/tree_learner.cpp: 18.75%
src/include/Eigen/src/Core/products/GeneralMatrixMatrix.h: 18.90%
src/treelearner/split_info.hpp: 18.99%
src/objective/regression_objective.hpp: 24.93%
src/metric/multiclass_metric.hpp: 25.00%
src/io/parser.hpp: 30.14%
src/include/LightGBM/fmt/format.h: 36.22%
src/include/LightGBM/utils/text_reader.h: 36.65%
src/io/metadata.cpp: 36.73%
src/boosting/gbdt_prediction.cpp: 37.50%
src/objective/multiclass_objective.hpp: 39.29%
src/include/LightGBM/utils/json11.h: 40.00%
src/io/dataset_loader.cpp: 40.87%
src/include/Eigen/src/Core/util/Memory.h: 40.93%
src/application/predictor.hpp: 40.98%
src/io/parser.cpp: 41.32%
src/metric/regression_metric.hpp: 42.03%
R/lgb.Booster.R: 42.12%
src/metric/binary_metric.hpp: 43.72%
src/c_api.cpp: 44.37%
src/boosting/dart.hpp: 45.54%
src/treelearner/col_sampler.hpp: 45.86%
src/objective/objective_function.cpp: 47.37%
src/include/Eigen/src/Core/arch/SSE/PacketMath.h: 47.83%
R/lgb.Dataset.R: 48.18%
src/include/LightGBM/fmt/format-inl.h: 48.62%
src/io/multi_val_dense_bin.hpp: 51.30%
src/io/tree.cpp: 51.98%
src/include/Eigen/src/Core/Inverse.h: 53.33%
src/include/LightGBM/utils/random.h: 54.35%
src/include/Eigen/src/Core/util/BlasUtil.h: 54.55%
src/boosting/gbdt.h: 55.10%
src/io/config.cpp: 56.68%
src/include/LightGBM/objective_function.h: 57.14%
src/boosting/gbdt_model_text.cpp: 57.42%
src/io/multi_val_sparse_bin.hpp: 57.73%
src/io/sparse_bin.hpp: 58.24%
src/treelearner/feature_histogram.hpp: 58.81%
src/include/Eigen/src/Core/GenericPacketMath.h: 60.00%
src/include/Eigen/src/Core/MathFunctions.h: 60.00%
src/io/train_share_states.cpp: 60.42%
src/boosting/gbdt.cpp: 60.48%
src/include/LightGBM/fmt/core.h: 61.80%
src/treelearner/serial_tree_learner.cpp: 63.27%
src/include/LightGBM/feature_group.h: 63.72%
src/include/Eigen/src/Core/Redux.h: 64.58%
src/include/Eigen/src/Core/MapBase.h: 65.38%
src/include/Eigen/src/Core/ProductEvaluators.h: 65.85%
R/lgb.Predictor.R: 66.39%
src/include/LightGBM/tree.h: 66.42%
src/include/Eigen/src/Core/util/Meta.h: 66.67%
src/include/LightGBM/tree_learner.h: 66.67%
src/objective/rank_objective.hpp: 67.68%
src/include/LightGBM/utils/openmp_wrapper.h: 68.00%
src/objective/binary_objective.hpp: 69.23%
src/io/bin.cpp: 69.40%
src/boosting/boosting.cpp: 69.77%
src/include/Eigen/src/plugins/BlockMethods.h: 70.00%
src/include/LightGBM/dataset.h: 70.30%
src/metric/rank_metric.hpp: 70.65%
src/include/Eigen/src/Core/CwiseNullaryOp.h: 71.43%
src/include/Eigen/src/Core/util/IntegralConstant.h: 75.00%
src/include/LightGBM/boosting.h: 75.00%
src/metric/metric.cpp: 75.00%
src/io/dataset.cpp: 76.47%
src/include/LightGBM/bin.h: 76.62%
src/include/Eigen/src/Core/DenseBase.h: 76.92%
src/include/Eigen/src/Core/CoreEvaluators.h: 77.68%
src/io/dense_bin.hpp: 77.73%
src/include/Eigen/src/Core/functors/BinaryFunctors.h: 77.78%
src/include/Eigen/src/Core/functors/UnaryFunctors.h: 77.78%
src/include/LightGBM/utils/common.h: 78.09%
R/utils.R: 78.79%
src/treelearner/serial_tree_learner.h: 78.95%
src/include/Eigen/src/Core/AssignEvaluator.h: 79.25%
src/lightgbm_R.cpp: 79.32%
src/treelearner/data_partition.hpp: 80.30%
src/include/Eigen/src/Core/products/TriangularSolverMatrix.h: 80.70%
R/lgb.train.R: 80.92%
src/include/Eigen/src/Core/DenseCoeffsBase.h: 83.33%
src/include/Eigen/src/Core/functors/AssignmentFunctors.h: 83.33%
src/boosting/score_updater.hpp: 83.67%
src/include/LightGBM/fast_double_parser.h: 83.83%
src/include/LightGBM/train_share_states.h: 84.27%
src/metric/dcg_calculator.cpp: 86.36%
R/callback.R: 86.39%
src/treelearner/linear_tree_learner.cpp: 86.43%
src/include/LightGBM/config.h: 86.83%
src/include/Eigen/src/LU/FullPivLU.h: 90.10%
src/include/Eigen/src/Core/DenseStorage.h: 91.53%
src/include/LightGBM/utils/yamc/alternate_shared_mutex.hpp: 91.67%
src/io/file_io.cpp: 91.67%
src/include/Eigen/src/Core/CwiseBinaryOp.h: 92.31%
src/include/LightGBM/utils/threading.h: 92.63%
src/include/Eigen/src/Core/SolveTriangular.h: 92.86%
src/include/LightGBM/utils/log.h: 93.33%
src/treelearner/leaf_splits.hpp: 93.65%
src/include/LightGBM/utils/file_io.h: 93.75%
src/include/Eigen/src/Core/Block.h: 93.88%
src/include/Eigen/src/Core/PlainObjectBase.h: 94.34%
R/lgb.importance.R: 96.00%
R/lightgbm.R: 96.43%
src/include/Eigen/src/Core/Visitor.h: 97.06%
src/treelearner/monotone_constraints.hpp: 97.96%
src/io/config_auto.cpp: 98.41%
R/aliases.R: 100.00%
R/lgb.model.dt.tree.R: 100.00%
R/metrics.R: 100.00%
src/include/Eigen/src/Core/Assign.h: 100.00%
src/include/Eigen/src/Core/CwiseUnaryOp.h: 100.00%
src/include/Eigen/src/Core/EigenBase.h: 100.00%
src/include/Eigen/src/Core/functors/NullaryFunctors.h: 100.00%
src/include/Eigen/src/Core/Matrix.h: 100.00%
src/include/Eigen/src/Core/MatrixBase.h: 100.00%
src/include/Eigen/src/Core/NoAlias.h: 100.00%
src/include/Eigen/src/Core/NumTraits.h: 100.00%
src/include/Eigen/src/Core/PartialReduxEvaluator.h: 100.00%
src/include/Eigen/src/Core/PermutationMatrix.h: 100.00%
src/include/Eigen/src/Core/Product.h: 100.00%
src/include/Eigen/src/Core/SelfCwiseBinaryOp.h: 100.00%
src/include/Eigen/src/Core/Solve.h: 100.00%
src/include/Eigen/src/Core/SolverBase.h: 100.00%
src/include/Eigen/src/Core/Swap.h: 100.00%
src/include/Eigen/src/Core/TriangularMatrix.h: 100.00%
src/include/Eigen/src/Core/util/ForwardDeclarations.h: 100.00%
src/include/Eigen/src/Core/util/IndexedViewHelper.h: 100.00%
src/include/Eigen/src/Core/util/Macros.h: 100.00%
src/include/Eigen/src/Core/util/SymbolicIndex.h: 100.00%
src/include/Eigen/src/Core/util/XprHelper.h: 100.00%
src/include/Eigen/src/Core/VectorBlock.h: 100.00%
src/include/Eigen/src/Core/VectorwiseOp.h: 100.00%
src/include/Eigen/src/plugins/CommonCwiseBinaryOps.h: 100.00%
src/include/Eigen/src/plugins/CommonCwiseUnaryOps.h: 100.00%
src/include/Eigen/src/plugins/MatrixCwiseUnaryOps.h: 100.00%
src/include/LightGBM/c_api.h: 100.00%
src/include/LightGBM/dataset_loader.h: 100.00%
src/include/LightGBM/metric.h: 100.00%
src/include/LightGBM/prediction_early_stop.h: 100.00%
src/include/LightGBM/utils/pipeline_reader.h: 100.00%
src/include/LightGBM/utils/yamc/yamc_rwlock_sched.hpp: 100.00%
src/include/LightGBM/utils/yamc/yamc_shared_lock.hpp: 100.00%
src/treelearner/linear_tree_learner.h: 100.00%

Reproducible example

sh build-cran-package.sh

Rscript -e " \
    coverage  <- covr::package_coverage('./lightgbm_r', type = 'tests', quiet = FALSE);
    print(coverage);
    covr::report(coverage, file = file.path(getwd(), 'coverage.html'), browse = TRUE);
    "

Environment info

LightGBM version or commit hash: latest master (ce486e5)

Impact of this issue

Test coverage is a valuable tool in guiding where to invest development time in writing tests, as it answers questions like "which paths through the code are not currently tested?".

As long as the test coverage setup with {covr} reports incorrect results for {lightgbm}, maintenance work devoted to increasing test coverage requires manually identifying uncovered code paths.

Additional Comments

While trying to debug, I used the following code which saves the covr::coverage object for further inspection.

sh build-cran-package.sh

Rscript -e " \
    coverage  <- covr::package_coverage('./lightgbm_r', type = 'tests', quiet = FALSE);
    saveRDS('coverage.rds');
    print(coverage);
    covr::report(coverage, file = file.path(getwd(), 'coverage.html'), browse = TRUE);
    "

I suspected that this issue might be related to the used of the now-deprecated testthat::context(), but I don't think so...I get the same incorrect results on the branch from #4915.

@jameslamb
Copy link
Collaborator Author

jameslamb commented Dec 29, 2021

Other things to try while working on this (that I haven't tried yet and won't have time to try for a while):

  • rolling back to older versions of {covr}
  • adding a library(covr) to the beginning of the instructions
    • I've noticed, for example, that print(some_coverage_object) produces different results based on whether or not you've library'd {covr}. So maybe that package does some stuff at the time it's loaded / attached that needs to be done for the reporting to work correctly.
    • see the output of running covr:::.onLoad
  • trying to add a stop() into one of the lgb.cv() tests, to confirm that {covr} is actually running them
  • trying to create a small test package that reproduces the issue, so it can be either fixed here or reported as a bug at https://github.com/r-lib/covr

@jameslamb
Copy link
Collaborator Author

trying to add a stop() into one of the lgb.cv() tests, to confirm that {covr} is actually running them

I tried this today, and saw that covr::package_coverage() failed. So that rules out a theory like "{covr} is not executing all the tests".

Running specific tests for package ‘lightgbm’
  Running ‘testthat.R’
Error: Failure in `/private/var/folders/xq/wktq4zdx4jd3qdpk34d28m940000gn/T/Rtmp3lvEwr/R_LIBS7a361c4735c/lightgbm/lightgbm-tests/testthat.Rout.fail`
658e-06"
[1] "[50]:  train's binary_logloss:1.20212e-06  test's binary_logloss:1.33316e-06"
....
utils: ......................
weighted_loss: ...

══ Failed ══════════════════════════════════════════════════════════════════════
── 1. Error (test_basic.R:359:3): cv works ─────────────────────────────────────
Error in `eval(code, test_env)`: I am a failed test

══ DONE ════════════════════════════════════════════════════════════════════════
Error: Test failures
Execution halted

@jameslamb
Copy link
Collaborator Author

adding a library(covr) to the beginning of the instructions

This did not help either. Still got 0.0% coverage for lgb.cv.

@jameslamb
Copy link
Collaborator Author

Ok, getting closer! The lgb.unloader() tests are problematic for code coverage.

Running the following prior to the testing instructions in this issue's description results in coverage being estimated correctly.

git rm R-package/tests/testthat/test_lgb.unloader.R

That makes sense to me. That function detaches the {lightgbm} namespace from the R session, and I think that could interfere with the mechanisms that {covr} uses to track calls.

try(detach("package:lightgbm", unload = TRUE), silent = TRUE)

Notes from the {covr} "How it Works" vignette (link)

As modifying external environments and correctly restoring them can be tricky
to get correct, we use the C function
reassign_function,

and a bit more (link)

...calls are made to the lazy loading code which installs a user hook to modify the code when it is loaded. We also register a finalizer which prints the coverage counts when the namespace is unloaded or the R process exits. These output files are then aggregated together to determine the coverage.

I'm going to open a PR proposing just skipping those tests when computing coverage.

@jameslamb jameslamb self-assigned this Dec 31, 2021
jameslamb added a commit that referenced this issue Jan 5, 2022
…4922)

* [R-package] [docs] fix calculation of R test coverage (fixes #4919)

* use quotes

* use in_covr() instead of env variable

* Update R-package/README.md

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* check that covr exists

* add covr to suggests

* use R_COVR environment variable

* Update R-package/tests/testthat/test_lgb.unloader.R

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
1 participant