Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal Varianсe Sampling booster #4266

Merged
merged 32 commits into from
Mar 12, 2022
Merged

Minimal Varianсe Sampling booster #4266

merged 32 commits into from
Mar 12, 2022

Conversation

kruda
Copy link
Contributor

@kruda kruda commented May 9, 2021

Implementation of minimal variance sampling for stochastic gradient boosting. Contributes to #2644.
Note for reviers

I've tested it on examples for binary and multiclass classification from lightgbm repository, tried to train on Higgs dataset using slightly modified config from https://github.com/guolinke/boosting_tree_benchmarks.git. Trained several times on https://www.kaggle.com/c/DontGetKicked .
On tested datasets this MVS implementation shows better score and overfitts, than simple bagging strategy with same sampling rate.
Update:
Mean relative error change table:

Algorithm\sampling fraction 0.05 0.1625 0.275 0.3875 0.5 0.6125 0.725 0.8375 0.95
mvs 8.314 3.032 0.775 0.061 -0.391 -0.403 -0.26 -0.185 0.084
sgb 13.037 4.516 3.169 2.569 2.105 1.521 1.08 0.343 -0.155
goss 17.263 4.672 3.285 2.223 1.196 0.488 0.007 -0.05 -0.079
mvs_adaptive 8.314 2.919 0.904 -0.195 -0.451 -0.466 -0.274 -0.079 0.206

Example on adult dataset:
изображение

Code to reproduce:
https://github.com/kruda/lightgbm_mvs_experiments

@ghost
Copy link

ghost commented May 9, 2021

CLA assistant check
All CLA requirements met.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for the contribution! I'll do a more complete review in the next day or two.

For now, I left two very minor suggestions for the R side.

Could you also please add some tests on this feature? As a start, you could try updating the Dask tests to cover this code:

boosting_types = ['gbdt', 'dart', 'goss', 'rf']

If you get stuck, please let me know and I'd be happy to help.

R-package/src/Makevars.in Outdated Show resolved Hide resolved
R-package/src/Makevars.win.in Outdated Show resolved Hide resolved
@jameslamb
Copy link
Collaborator

@kruda are you still interested in pursuing this pull request?

@jameslamb
Copy link
Collaborator

@shiyu1994 what does it mean that this PR is listed in #4677 with your name next to it?

Does it mean you're planning a separate implementation of MVS from your own branch? And if that's it, should this PR be closed? It has been almost 4 months since it received a new commit.

@shiyu1994
Copy link
Collaborator

Hi @jameslamb, sorry if that confuses you. No. We are just trying to merge this PR. If @kruda is not responding, can we merge this first (there are only minor issues to be fixed). And we can open some subsequent PRs to complete it. What do you think about this?

I'll remove my name from the list to avoid ambiguity. Thanks you.

@StrikerRUS
Copy link
Collaborator

I suppose it'll be better to continue the work (apply fixes) directly in this PR. If @kruda didn't unchecked enabled by default possibility to push into this branch by maintainers, then it'll possible.
I'd better prefer not having the repository in a undesirable state if it can be avoided (my position hasn't changed since #2815 (comment)).
https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork

@shiyu1994
Copy link
Collaborator

shiyu1994 commented Oct 30, 2021

Oh sure. It would be the best if we can directly push to this branch. Thanks for your suggestion.

BTW, @StrikerRUS @jameslamb, I think both of you are more experienced then me in managing open-source projects. So if any of my behaviors makes you confused or seems no good to the project, please feel free to correct me. Thanks!

@StrikerRUS
Copy link
Collaborator

@shiyu1994 Thank you for your kindness and openness! I believe everything is fine, please don't worry.

@jameslamb
Copy link
Collaborator

I suppose it'll be better to continue the work (apply fixes) directly in this PR

I agree with @StrikerRUS . If someone opens a pull request into this project from their fork (even master), unless they've explicitly opted out, I think it's ok for maintainers to push directly to that branch to ensure the PR gets finished in an acceptable amount of time.

It's ok for you to push changes directly here @shiyu1994 . Then @StrikerRUS , I, and others can review them when you're ready.

@guolinke
Copy link
Collaborator

guolinke commented Mar 9, 2022

I think it is better to merge this PR into a branch of the LightGBM repo first, then we continue to develop on that branch, and merge it to the master branch when it is ready.

@StrikerRUS StrikerRUS changed the base branch from master to mvs_dev March 10, 2022 00:40
@StrikerRUS
Copy link
Collaborator

@guolinke

I think it is better to merge this PR into a branch of the LightGBM repo first, then we continue to develop on that branch, and merge it to the master branch when it is ready.

Just created a new mvs_dev branch and re-targeted this PR into it.

@guolinke
Copy link
Collaborator

the merge is blocked by conflicts 😅 , it seems most of them are document related. @shiyu1994 @jameslamb can you help to reslove them?

…t#5068)

* Update test_windows.ps1

* Update .appveyor.yml

* Update test_windows.ps1

* Update .appveyor.yml
@shiyu1994
Copy link
Collaborator

I'm picking this up. This PR is almost ready.

@guolinke
Copy link
Collaborator

@shiyu1994 then we still merge it to mvs_dev branch first, then you can develop based on it?

@jameslamb
Copy link
Collaborator

then we still merge it to mvs_dev branch first, then you can develop based on it?

We should definitely do that. @kruda could delete their fork or this branch at any time. Let's move merge this code to mvs_dev as soon as possible so that doesn't happen.

@shiyu1994 shiyu1994 merged commit 86822d6 into microsoft:mvs_dev Mar 12, 2022
@shiyu1994
Copy link
Collaborator

Let's move merge this code to mvs_dev as soon as possible

Done with that. Will work on microsoft:mvs_dev instead.

@StrikerRUS
Copy link
Collaborator

@kruda Thank you so much for your work!

Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants