Add Cost Effective Gradient Boosting #2014

Merged
merged 24 commits into from
Apr 4, 2019

remcob-gr
Contributor

Fixes #1119.
The implementation is a tweak to the serial tree learner, so it should work with every derived tree learner, though I've only tested it with the serial one.

Remco Bras added 16 commits February 14, 2019 13:48
Like the original CEGB version, this inherits from SerialTreeLearner.
Currently, it changes nothing from the original.
This is heavily based on the serial version, but just adds using the coupled penalties.
…rhead of CEGB, and add sanity checks for the lengths of the penalty vectors.
The tree learner did not update the gains of previously computed leaf splits when splitting a leaf elsewhere in the tree.
This caused it to prefer new features due to incorrectly penalising splitting on previously used features.
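
The stale-gain problem described in the last two commit messages can be sketched as follows. This is an illustrative sketch only, not the code from this PR: `CachedSplit`, `CoupledPenaltyTracker`, and their members are hypothetical names, and it assumes a one-off (coupled) penalty per feature that no longer applies once the feature has been split on anywhere in the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cached best-split candidate computed for some earlier leaf.
struct CachedSplit {
  int feature;
  double gain;  // gain with any coupled penalty already subtracted
};

// Hypothetical bookkeeping; not LightGBM's actual data structures.
struct CoupledPenaltyTracker {
  std::vector<double> penalty;      // one-off penalty per feature
  std::vector<bool> feature_used;   // has the feature been split on yet?
  std::vector<CachedSplit> cached;  // candidates cached for other leaves

  explicit CoupledPenaltyTracker(std::vector<double> p)
      : penalty(std::move(p)), feature_used(penalty.size(), false) {}

  // Only features that have not been used yet pay the penalty.
  double Penalize(int feature, double raw_gain) const {
    return feature_used[feature] ? raw_gain : raw_gain - penalty[feature];
  }

  // Store a candidate with the penalty that applies right now.
  void Cache(int feature, double raw_gain) {
    cached.push_back({feature, Penalize(feature, raw_gain)});
  }

  // When a feature is used for the first time, refund its penalty to every
  // cached candidate on that feature. Skipping this step leaves stale gains
  // that still charge the penalty and so unfairly favour unused features.
  void MarkUsed(int feature) {
    assert(feature >= 0 && static_cast<std::size_t>(feature) < penalty.size());
    if (feature_used[feature]) return;
    feature_used[feature] = true;
    for (CachedSplit& split : cached) {
      if (split.feature == feature) split.gain += penalty[feature];
    }
  }
};
```

In this sketch, calling `MarkUsed` after each split keeps the cached candidates for other leaves comparable with freshly computed ones.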
@StrikerRUS StrikerRUS requested review from guolinke and chivee April 1, 2019 12:06
@guolinke
Collaborator

guolinke commented Apr 2, 2019

Is this ready to merge?

@remcob-gr
Contributor Author

@StrikerRUS: It's ready from my perspective.
@guolinke : Could you review?

Collaborator

@StrikerRUS StrikerRUS left a comment

Some minor notes.

src/treelearner/serial_tree_learner.cpp (outdated, resolved)
include/LightGBM/config.h (outdated, resolved)
include/LightGBM/config.h (outdated, resolved)
@StrikerRUS
Collaborator

I think we should cite CEGB somehow...
Maybe in params description, like `cost-effective gradient-boosting` <https://papers.nips.cc/paper/6753-cost-efficient-gradient-boosting.pdf>__ penalty for ... ?

@@ -496,6 +530,14 @@ void SerialTreeLearner::FindBestSplitsFromHistograms(const std::vector<int8_t>&
smaller_leaf_splits_->max_constraint(),
&smaller_split);
smaller_split.feature = real_fidx;
smaller_split.gain -= config_->cegb_tradeoff * config_->cegb_penalty_split * smaller_leaf_splits_->num_data_in_leaf();
Collaborator

config_->cegb_tradeoff * config_->cegb_penalty_split is zero by default, right?

Contributor Author

Yes. By default, this line doesn't change the gain, and so doesn't change the behaviour.
It only makes a difference if the user specifies a cegb_penalty_split.
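
A minimal sketch of that arithmetic, outside of LightGBM: `CegbConfig` and `PenalizedGain` are hypothetical stand-ins for the config fields and the diff line above, and the default of `1.0` for `cegb_tradeoff` is an assumption made for illustration. Only the zero default of the product is confirmed in this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for the two config fields used in the diff line.
// cegb_penalty_split defaulting to zero follows the discussion above;
// the 1.0 default for cegb_tradeoff is an assumption for this sketch.
struct CegbConfig {
  double cegb_tradeoff = 1.0;
  double cegb_penalty_split = 0.0;
};

// Mirrors the diff line: subtract a per-split penalty that scales with the
// number of rows in the leaf being split.
double PenalizedGain(double raw_gain, const CegbConfig& config,
                     int num_data_in_leaf) {
  assert(num_data_in_leaf >= 0);
  return raw_gain -
         config.cegb_tradeoff * config.cegb_penalty_split * num_data_in_leaf;
}
```

With a default-constructed `CegbConfig` the function returns `raw_gain` unchanged; a non-zero `cegb_penalty_split` lowers the gain in proportion to the leaf size, so splits on large leaves are penalised more.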

@guolinke
Collaborator

guolinke commented Apr 3, 2019

I think it would be better to add a section to Advanced Topics (https://github.com/Microsoft/LightGBM/blob/master/docs/Advanced-Topics.rst) on how to use CEGB.

@remcob-gr
Contributor Author

I've added a section to the docs on using CEGB, including a link to the paper.
@StrikerRUS , @guolinke : Could you take a look?

@guolinke
Collaborator

guolinke commented Apr 3, 2019

Thanks @remcob-gr, it looks good to me.

Collaborator

@StrikerRUS StrikerRUS left a comment

LGTM! Thanks a lot @remcob-gr !

Also, it'll be great if you can add some tests.

@remcob-gr
Contributor Author

@StrikerRUS : I've added some tests. Could you take a look and see if there are other tests you'd like?

@StrikerRUS
Collaborator

@remcob-gr Perfect! Many thanks!

@guolinke Can we merge?

@guolinke
Collaborator

guolinke commented Apr 4, 2019

@StrikerRUS sure

@guolinke guolinke merged commit 7610228 into microsoft:master Apr 4, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020
Successfully merging this pull request may close these issues.

Adding Cost Effective Gradient Boosting to LightGBM
3 participants