Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] move all examples to dontrun() to fix R CMD CHECK notes #3270

Merged
merged 9 commits into from
Aug 6, 2020

Conversation

jameslamb
Copy link
Collaborator

In the R Hub checks I ran in #629 (comment), builds on several linux systems had this R CMD CHECK note:

* checking examples ... NOTE
Examples with CPU (user + system) or elapsed time > 5s
                      user system elapsed
lgb.cv              23.612  0.098  66.864
lgb.model.dt.tree   12.887  0.039  35.043
lgb.importance       7.528  0.051  19.835
lgb.plot.importance  7.190  0.028  20.008
lgb.interprete       5.828  0.031  16.085
lgb.get.eval.result  4.639  0.012  13.082
predict.lgb.Booster  4.547  0.038  12.532
lgb.unloader         4.558  0.017  12.948
lgb.train            4.512  0.049  12.857

My first idea to fix these issues was to move all examples to vignettes (#629 (comment)), but tonight I had another idea that I think is a lot easier.

  1. Wrap all examples in \dontrun{} so by default they aren't run
  2. In CI, use R CMD CHECK --run-dontrun so that in CI, we still test that all the examples work. We can ignore the example-timing NOTE if it comes up, because the \dontrun{} means it will never come up on CRAN

This PR also replaces uses of \donttest{} with \dontrun{}. As of R 4.02, \donttest{} examples will be run by R CMD CHECK (release notes)

R CMD check --as-cran now runs \donttest examples (which are run by example())
instead of instructing the tester to do so. This can be temporarily circumvented during
development by setting environment variable R_CHECK_DONTTEST_EXAMPLES to a
false value.

@StrikerRUS I made this a LightGBM branch so we could test readthedocs. Could you enable readthedocs build temporarily for this branch?

@jameslamb
Copy link
Collaborator Author

jameslamb commented Aug 4, 2020

I've re-submitted the R Hub checks for all checks that had the note about example timings. These can sometimes take up to a day to run, so I'll update this PR with the results whenever they run.

result <- rhub::check(
    path = "lightgbm_2.3.2.tar.gz"
    , email = "jaylamb20@gmail.com"
    , check_args = "--as-cran"
    , platform = c(
        "ubuntu-gcc-release",
        "ubuntu-gcc-devel",
        "debian-gcc-devel-nold",
        "debian-clang-devel",
        "linux-x86_64-centos6-epel-rdt"
    )
    , env_vars = c(
        "R_COMPILE_AND_INSTALL_PACKAGES" = "always"
    )
)

@StrikerRUS
Copy link
Collaborator

Could you enable readthedocs build temporarily for this branch?

Sure! Just enabled this branch at RTD. Let's wait about 8mins for results.

@StrikerRUS
Copy link
Collaborator

RTD build was successful.

image

I'm wonder how critical is that WARNING for CRAN?

Marking all examples as dontrun we are hiding all outputs from RTD readers. I don't think it is good. Or maybe we can add --run-dontrun at RTD as well?

cf.

master

image

this PR

image

@StrikerRUS
Copy link
Collaborator

BTW, seems that something is wrong with the plot at master.

@jameslamb
Copy link
Collaborator Author

I'm wonder how critical is that WARNING for CRAN?

I don't know if this is one of the things CRAN is open to negotiation on or not.

My thinking is that getting to CRAN is the highest priority and that losing these plots in examples is a very small price to pay for it because (I believe) the main way people use these examples is to copy the code and run it themselves anyway.

There are no running-time restrictions on vignettes, as far as I know. so when we pick up #1944 we can have any beautiful logs and plots we want.

So my longer-term goal is to still do this suggestion (#629 (comment)), where we completely remove these examples and all the function-level documentation points users to the appropriate vignette(s). But that is the type of thing we can do totally independent of CRAN.

@StrikerRUS
Copy link
Collaborator

I totally agree with you in that CRAN is much more important than examples. But what do you think about

Or maybe we can add --run-dontrun at RTD as well?

?

I just checked pkgdown site and seems we can pass run_dont_run = TRUE: https://pkgdown.r-lib.org/reference/build_site.html#arguments. As we run examples at GitHub Actions, we are sure they are working.

@jameslamb
Copy link
Collaborator Author

oooooo I forgot about run_dontrun in pkgdown::build_site()! Yes I like that very much. I'll add that change and then also look into the weird plot

@StrikerRUS
Copy link
Collaborator

That's awesome! Thank you!

Please don't forget that previous dontrun directives were added because unloader causes RTD builds to fail.

@jameslamb
Copy link
Collaborator Author

That's awesome! Thank you!

Please don't forget that previous dontrun directives were added because unloader causes RTD builds to fail.

😬 😬 I did forget! I thought we added them in an attempt to fix that "examples take too long to run" issue.

If they cause it to fail, I'll just literally put an if (FALSE) in that example myself, with a comment on why it's there. lgb.unlooader() is for a very special and somewhat rare case, so it is't worth sacrificing too much for it.

@StrikerRUS
Copy link
Collaborator

OK, sure!

To be honest I don't know whether unloader is still causing crashes. It was back to May 2019: #2176 (comment) we added initial dontrun.

@jameslamb
Copy link
Collaborator Author

The R Hub checks finally returned, but I missed one example so they all have that NOTE about timing 🙃

fixed (along with this change: #3270 (comment)) in 249f010

I'll check the RTD build in a few minutes to see if it is ok, and look into the issue with the plot

@jameslamb
Copy link
Collaborator Author

looks like the build succeeded! https://readthedocs.org/projects/lightgbm/builds/11597488/

And example outputs are back! (https://lightgbm.readthedocs.io/en/docs-fix-sample-runtime/R/reference/lgb.train.html):

image

It's a little annoying that the # \dontrun comment is shown but honestly it doesn't bother me. Anyone using the docs will ignore it, and with this PR we're gaining more than we're losing.

The plot example on lgb.plot.interpretation is still broken, which is confusing because it looks ok on my laptop

image

I'm looking into that right now.

@jameslamb
Copy link
Collaborator Author

I just built the pkgdown site locally and this plot looks fine 😬

image

I used the same command as we use in docs/conf.py

pkgdown::build_site(
    lazy = FALSE
    , install = FALSE
    , devel = FALSE
    , examples = TRUE
    , run_dont_run = TRUE
    , seed = 42L
    , preview = FALSE
    , new_process = TRUE
)

@jameslamb jameslamb mentioned this pull request Aug 5, 2020
@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Aug 5, 2020

I just built the pkgdown site locally and this plot looks fine 😬

I believe the version of R can cause the difference in outputs. However, this is not releated to this PR, so it should not block us from its merging.

r-base=3.5.1=h1e0a451_2 \

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that unloader doesn't cause RTD fails anymore. This is really cool!

Thank you for one more -1 CRAN NOTE!

@jameslamb
Copy link
Collaborator Author

I just built the pkgdown site locally and this plot looks fine grimacing

I believe the version of R can cause the difference in outputs. However, this is not releated to this PR, so it should not block us from its merging.

r-base=3.5.1=h1e0a451_2 \

I agree, I'll open a new issue. I was trying to upgrade the conda environment to 3.6.1 (the latest r-base on anaconda main channels, unfortunately), and struggling with it.

@jameslamb
Copy link
Collaborator Author

R Hub results came back!

The example timing note has been eliminated from all of them!

Debian Linux, R-devel, clang, ISO-8859-15 locale

1 WARNING, 2 NOTEs

logs

* checking installed package size ... NOTE
  installed size is 11.0Mb
  sub-directories of 1Mb or more:
    libs  10.4Mb
* checking top-level files ... WARNING
A complete check needs the 'checkbashisms' script.
See section 'Configure and cleanup' in the 'Writing R Extensions'
manual.

Debian Linux, R-devel, GCC, no long double

1 WARNING, 2 NOTEs

logs

* checking installed package size ... NOTE
  installed size is 11.0Mb
  sub-directories of 1Mb or more:
    libs  10.5Mb
* checking top-level files ... WARNING
A complete check needs the 'checkbashisms' script.
See section 'Configure and cleanup' in the 'Writing R Extensions'
manual.

Ubuntu Linux 16.04 LTS, R-devel, GCC

1 WARNING, 2 NOTEs

logs

* checking installed package size ... NOTE
  installed size is 11.6Mb
  sub-directories of 1Mb or more:
    libs  11.0Mb
* checking top-level files ... WARNING
A complete check needs the 'checkbashisms' script.
See section 'Configure and cleanup' in the 'Writing R Extensions'
manual.

CentOS 6 with Redhat Developer Toolset, R from EPEL

1 ERROR, 1 WARNING, 2 NOTEs

logs

* checking installed package size ... NOTE
  installed size is 36.2Mb
  sub-directories of 1Mb or more:
    libs  35.6Mb
* checking compilation flags used ... WARNING
Compilation used the following non-portable flag(s):
  ‘-Wp,-D_FORTIFY_SOURCE=2’
* checking tests ...
  Running ‘testthat.R’ [7m/15m]
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  You can set `force_row_wise=true` to remove the overhead.
  And if memory is not enough, you can set `force_col_wise=true`.
  [LightGBM] [Info] Total Bins 232
  [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 116
  [LightGBM] [Info] Start training from score 0.482113
  [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
  ══ testthat results  ═══════════════════════════════════════════════════════════
  [ OK: 565 | SKIPPED: 2 | WARNINGS: 0 | FAILED: 4 ]
  1. Failure: lgb.train() works as expected with sparse features (@test_basic.R#477) 
  2. Failure: learning-to-rank with lgb.cv() works as expected (@test_learning_to_rank.R#119) 
  3. Failure: learning-to-rank with lgb.cv() works as expected (@test_learning_to_rank.R#125) 
  4. Failure: learning-to-rank with lgb.cv() works as expected (@test_learning_to_rank.R#131) 
  
  Error: testthat unit tests failed
  Execution halted

Ubuntu Linux 16.04 LTS, R-release, GCC

2 NOTEs

logs

* checking installed package size ... NOTE
  installed size is 35.7Mb
  sub-directories of 1Mb or more:
    libs  35.2Mb

@jameslamb jameslamb merged commit c454d5f into master Aug 6, 2020
@StrikerRUS StrikerRUS deleted the docs/fix-sample-runtime branch August 6, 2020 02:02
@jameslamb
Copy link
Collaborator Author

I added #3276 to track fixing those plots

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants