
[R-package] enable use of trees with linear models at leaves (fixes #3319) #3699

Merged
17 commits merged into microsoft:master from jameslamb:feat/r-linear-learner
Jan 18, 2021

Conversation

jameslamb
Collaborator

This PR closes #3319, adding support for the new "leaves with linear models" feature @btrotta added in #3299.

Thankfully, the concerns from #3319 (comment) about the order in which the Dataset is constructed weren't an issue! lgb.train() in the R package calls update_params() on the Dataset object and then constructs it:

data$update_params(params = params)
data$construct()

so it will respect linear_trees = true. If you've already called $construct() on the Dataset before passing it to lgb.train(), you'll get an informative error telling you that a Dataset constructed with linear_trees = false can't be used for training with linear_trees = true.
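
For reference, a minimal sketch of what training with this feature from R can look like (shown with the linear_tree spelling of the parameter; the toy data and other parameter values below are purely illustrative, not code from this PR):

library(lightgbm)

# toy regression data (illustrative only)
X <- as.matrix(mtcars[, c("cyl", "disp", "hp", "wt")])
y <- mtcars$mpg

# lgb.train() calls update_params() and then construct() on this Dataset,
# so the linear-tree setting is applied before the Dataset is constructed
dtrain <- lgb.Dataset(data = X, label = y)

bst <- lgb.train(
    params = list(
        objective = "regression"
        , linear_tree = TRUE
        , min_data_in_leaf = 5L
        , num_leaves = 4L
    )
    , data = dtrain
    , nrounds = 10L
)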

The unit tests I added for lgb.train() were based on those added in https://github.com/microsoft/LightGBM/pull/3299/files#diff-263bfdd4e44ba860ef61b458adfab0d5f391766a27a3fa7e9f221afc5f1e5bad, except the ones for refit(), because the R package does not currently support refit() (#2369).

@@ -1699,6 +1699,12 @@ CXX=`"${R_HOME}/bin/R" CMD config CXX11`
# LightGBM-specific flags
LGB_CPPFLAGS=""

#########
Collaborator Author

/gha run r-valgrind

Collaborator Author

This build (https://github.com/microsoft/LightGBM/runs/1629232486?check_suite_focus=true) failed with an error I didn't see when testing the CRAN package locally:

In file included from ./include/Eigen/Dense:1,
from treelearner/linear_tree_learner.cpp:7:
./include/Eigen/Core:15:10: fatal error: src/Core/util/DisableStupidWarnings.h: No such file or directory
15 | #include "src/Core/util/DisableStupidWarnings.h"

Will look into it tomorrow. Probably just a mistake I made with the include paths.

Collaborator Author

/gha run r-valgrind

Collaborator Author

/gha run r-valgrind

Collaborator

Maybe the problem is that the fatal error causes lightgbm to exit without freeing some allocated memory? Similar to this: https://stackoverflow.com/questions/1901322/valgrind-reports-memory-leak-when-assigning-a-value-to-a-string

Collaborator Author

Good idea! I know we ran into similar issues in the past. I'm going to try pushing a commit right now that skips those tests, to check that theory.
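
For context, skipping a test in the R package's testthat suite looks roughly like this (the test name, skip message, and comment wording here are illustrative, not the actual code from the commit):

library(testthat)

test_that("lgb.train() raises an informative error for a linear_tree mismatch", {
    # Skipped for now: under valgrind, the error raised by this test leaves
    # memory allocated at exit and shows up as a false-positive "leak"
    skip("Skipped to avoid a false-positive valgrind report")

    # ... test body ...
})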

Collaborator Author

/gha run r-valgrind


Collaborator Author

Nice work @btrotta! After skipping the one test introduced in this PR that checks for errors, valgrind is happy 😀

==2177== 
==2177== HEAP SUMMARY:
==2177==     in use at exit: 312,459,948 bytes in 55,720 blocks
==2177==   total heap usage: 2,837,681 allocs, 2,781,961 frees, 6,180,339,569 bytes allocated
==2177== 
==2177== 336 bytes in 1 blocks are possibly lost in loss record 152 of 2,721
==2177==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2177==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==2177==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==2177==    by 0x5706322: allocate_stack (allocatestack.c:622)
==2177==    by 0x5706322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==2177==    by 0x56D4DDA: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==2177==    by 0x56CC8E0: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==2177==    by 0x15576383: LGBM_DatasetCreateFromCSC (c_api.cpp:1305)
==2177==    by 0x155A5512: LGBM_DatasetCreateFromCSC_R (lightgbm_R.cpp:91)
==2177==    by 0x4942146: R_doDotCall (dotcode.c:634)
==2177==    by 0x494CFDD: do_dotcall (dotcode.c:1281)
==2177==    by 0x49A06DF: bcEval (eval.c:7072)
==2177==    by 0x498C2C0: Rf_eval (eval.c:727)
==2177==    by 0x498F02F: R_execClosure (eval.c:1897)
==2177== 
==2177== LEAK SUMMARY:
==2177==    definitely lost: 0 bytes in 0 blocks
==2177==    indirectly lost: 0 bytes in 0 blocks
==2177==      possibly lost: 336 bytes in 1 blocks
==2177==    still reachable: 312,459,612 bytes in 55,719 blocks
==2177==                       of which reachable via heuristic:
==2177==                         newarray           : 4,264 bytes in 1 blocks
==2177==         suppressed: 0 bytes in 0 blocks
==2177== Reachable blocks (those to which a pointer was found) are not shown.
==2177== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2177== 
==2177== For lists of detected and suppressed errors, rerun with: -s
==2177== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0)
writing valgrind output to valgrind-logs.log
valgrind found 0 bytes definitely lost
valgrind found 0 bytes indirectly lost
valgrind found 336 bytes possibly lost

I think we should leave this skip in to get this feature merged, and I can work on removing it separately as a follow-up task.

@jameslamb
Collaborator Author

I'm going to mark this work in progress, sorry. Some R CMD check things I need to fix. Will add reviewers back when it's ready.

@jameslamb jameslamb changed the title [R-package] enable use of trees with linear models at leaves (fixes #3319) WIP: [R-package] enable use of trees with linear models at leaves (fixes #3319) Dec 31, 2020
@jameslamb jameslamb marked this pull request as draft January 1, 2021 05:58
@jameslamb jameslamb marked this pull request as ready for review January 1, 2021 22:47
@jameslamb jameslamb changed the title WIP: [R-package] enable use of trees with linear models at leaves (fixes #3319) [R-package] enable use of trees with linear models at leaves (fixes #3319) Jan 1, 2021
@@ -155,7 +155,6 @@ struct Config {
// descl2 = missing values must be encoded as ``np.nan`` (Python) or ``NA`` (CLI), not ``0``
Collaborator

@jameslamb Please add info about missing values representation in the R package.

Collaborator Author

Sorry for the delay! Added in 302a71c.
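
For context, the note added for R parallels the Python/CLI guidance quoted above: missing values should not be encoded as 0. A tiny illustration of that convention, assuming NA is used for missing entries (toy data; this is not the wording or code from 302a71c):

library(lightgbm)

# missing values represented as NA (not 0) in the feature matrix
X <- matrix(
    c(1.0, 2.0, NA_real_, 4.0, 5.0, NA_real_, 7.0, 8.0)
    , nrow = 4L
    , ncol = 2L
)
y <- c(0.0, 1.0, 0.0, 1.0)

dtrain <- lgb.Dataset(data = X, label = y)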

@jameslamb
Collaborator Author

jameslamb commented Jan 15, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/488906902

Status: success ✔️.

@jameslamb
Collaborator Author

I've updated this branch with the latest master, and the branch is in a state where it shouldn't cause valgrind issues on CRAN (#3699 (comment)). I think this is ready for another review.

I'll also test on Solaris soon and report the results.

@StrikerRUS (Collaborator) left a comment

LGTM! Thanks for updating the docs! But I'm not an R reviewer, so I'd better leave my approval in the form of a comment.

Just one question: maybe add a test for saving and loading a linear booster in R?

@jameslamb
Collaborator Author

LGTM! Thanks for updating the docs! But I'm not an R reviewer, so I'd better leave my approval in the form of a comment.

Makes sense. @guolinke can you take a look when you have time?

Just one question: maybe add a test for saving and loading a linear booster in R?

That's a good suggestion, thanks. I'll do that right now.

@jameslamb
Collaborator Author

I just added two tests that check that the model can be saved and reloaded (one tests a text file, the other an .rds file): 61d7b11

I also tested on Solaris with R Hub, and did not see any issues 😁
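
For reference, a rough sketch of what such round-trip checks can look like, using the package's save/load helpers lgb.save()/lgb.load() and saveRDS.lgb.Booster()/readRDS.lgb.Booster() (the actual tests added in 61d7b11 may differ in data and structure):

library(lightgbm)
library(testthat)

test_that("a linear-tree Booster survives a save/load round trip", {
    X <- as.matrix(mtcars[, c("cyl", "disp", "hp", "wt")])
    y <- mtcars$mpg
    dtrain <- lgb.Dataset(data = X, label = y)
    bst <- lgb.train(
        params = list(objective = "regression", linear_tree = TRUE, min_data_in_leaf = 5L)
        , data = dtrain
        , nrounds = 5L
    )
    preds_before <- predict(bst, X)

    # text-file round trip
    model_file <- tempfile(fileext = ".txt")
    lgb.save(bst, model_file)
    bst_from_file <- lgb.load(model_file)
    expect_equal(predict(bst_from_file, X), preds_before)

    # .rds round trip
    rds_file <- tempfile(fileext = ".rds")
    saveRDS.lgb.Booster(bst, rds_file)
    bst_from_rds <- readRDS.lgb.Booster(rds_file)
    expect_equal(predict(bst_from_rds, X), preds_before)
})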

@guolinke (Collaborator) left a comment

LGTM

@StrikerRUS StrikerRUS merged commit ed651e8 into microsoft:master Jan 18, 2021
@jameslamb jameslamb deleted the feat/r-linear-learner branch January 18, 2021 15:46
@StrikerRUS
Collaborator

@jameslamb Can this branch be removed?


@jameslamb
Collaborator Author

Oh, I thought I had already deleted it, because I see "jameslamb deleted ..." on this PR.


Maybe I accidentally created a LightGBM branch with that same name. Yep, I'll delete it.

@StrikerRUS
Collaborator

Yeah, this PR was from your fork, judging by the jameslamb:feat/... branch name.

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Successfully merging this pull request may close these issues: [R-package] Support trees with linear models at leaves