
[RFC] Inconsistent behavior when a single-leaf tree is encountered #5051

Open
shiyu1994 opened this issue Mar 3, 2022 · 4 comments

@shiyu1994
Collaborator

shiyu1994 commented Mar 3, 2022

Description

When a single-leaf tree is encountered, the CLI version stops training at once, but the Python API continues to train.

Reproducible example

Example by @arnocandel in #4708.

Additional Comments

We have two choices:

  1. Stop training at once when a single-leaf tree is encountered, in all APIs.
  2. Continue training, adding the same prediction value to all training samples. The prediction value should be `-sum_of_gradients/sum_of_hessians` in the root node, which is currently not calculated.
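For clarity, option 2's constant output is just a Newton step over the root's gradient statistics. A minimal pure-Python sketch (function name and `lambda_l2` handling are illustrative, not LightGBM's internal code):

```python
# Hypothetical sketch of option 2: when no informative split exists, the
# tree degenerates to a single leaf whose constant output is the Newton
# step over all samples (standard GBDT leaf-value convention; optional
# L2 regularization assumed).
def single_leaf_output(gradients, hessians, lambda_l2=0.0):
    G = sum(gradients)
    H = sum(hessians)
    return -G / (H + lambda_l2)

# With a squared-error objective, grad = pred - label and hess = 1.0,
# so the single-leaf output is the mean residual.
labels = [1.0, 2.0, 3.0, 4.0]
preds = [0.0, 0.0, 0.0, 0.0]
grads = [p - y for p, y in zip(preds, labels)]
hess = [1.0] * len(labels)
print(single_leaf_output(grads, hess))  # 2.5, the mean residual
```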

Gently ping @guolinke @jameslamb @StrikerRUS @hzy46 @btrotta for your opinion.

@guolinke
Collaborator

guolinke commented Mar 7, 2022

I think option 2 is better.

@shiyu1994
Collaborator Author

> I think option 2 is better.

Strongly agree.

@jameslamb
Collaborator

jameslamb commented Mar 9, 2022

Thanks for writing this up and for the description in #4708 (comment)!

I think option 2 is preferable.

Encountering a single-leaf tree doesn't necessarily mean that training should stop. I'd expect future boosting rounds could still find informative splits in some situations, like:

  • using bagging_fraction to re-sample rows
  • using feature_fraction to randomly choose features
  • using a custom objective that has some sort of randomness in it or some behavior dependent on the number of iterations
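The first two points correspond to real LightGBM parameters; a config fragment showing them together (values illustrative only):

```python
# Illustrative LightGBM parameter dict: with these settings each boosting
# round draws a different row/feature sample, so one round degenerating to
# a single leaf does not imply that later rounds will too.
params = {
    "objective": "regression",
    "bagging_fraction": 0.8,  # re-sample 80% of rows ...
    "bagging_freq": 1,        # ... every iteration
    "feature_fraction": 0.8,  # each tree sees a random 80% of features
}
```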

@shiyu1994
Collaborator Author

shiyu1994 commented Mar 24, 2022

@jameslamb Thanks. Then let's go to option 2.

Samsagax added a commit to Samsagax/LightGBM that referenced this issue Feb 4, 2023
As per discussion on GH-microsoft#5051 and GH-microsoft#5193, the Python package does not stop
training if a single-leaf tree (stump) is found and relies on early
stopping methods to stop training. This commit removes the finish
condition on training based on the result of `TrainOneIter()` and sets
the `is_finished` flag on early stopping alone.
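The loop change the commit describes can be sketched as follows. This is a simplified pure-Python illustration; all names are hypothetical, not LightGBM's actual C++ symbols:

```python
# Sketch of the change: the return value of one boosting iteration (which
# may signal a stump) no longer ends training; only the early-stopping
# check does. Names are illustrative.

class EarlyStopper:
    """Stop when the validation metric fails to improve for `patience` rounds."""
    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.rounds_without_improvement = 0

    def should_stop(self, metric):
        if metric < self.best:
            self.best = metric
            self.rounds_without_improvement = 0
        else:
            self.rounds_without_improvement += 1
        return self.rounds_without_improvement >= self.patience


def train(num_iterations, train_one_iter, eval_metric, patience=2):
    stopper = EarlyStopper(patience)
    for it in range(num_iterations):
        train_one_iter()  # result (stump or not) is intentionally ignored
        # previously: `if not train_one_iter(): break` ended training here
        if stopper.should_stop(eval_metric(it)):
            return it + 1  # iterations actually run before stopping
    return num_iterations


# Every round yields a stump (train_one_iter -> False), yet training only
# ends when the metric plateaus for `patience` rounds.
metrics = [1.0, 0.9, 0.9, 0.9] + [0.9] * 6
ran = train(10, train_one_iter=lambda: False,
            eval_metric=lambda i: metrics[i], patience=2)
print(ran)  # 4: stopped by early stopping, not by the stump rounds
```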