
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf #4649

Closed · Tracked by #5153

truongphanduykhanh opened this issue Oct 4, 2021 · 5 comments

@truongphanduykhanh

Description

I trained a binary classification model on a dataset with 10,000 rows and 600 features. The warning [LightGBM] [Warning] No further splits with positive gain, best gain: -inf appears exactly 10 times during training with num_boost_round=10. It seems unlikely that there would be no split with positive gain as early as the first rounds. In addition, evals_result shows that the AUC is still improving on both the train and validation sets, so I suspect the warning may indicate a problem.

Code:

import lightgbm as lgb

params = {'learning_rate': 0.2, 'max_leaves': 1024, 'objective': 'binary', 'metric': 'auc'}

# train and val are lgb.Dataset objects built from the training and validation data
evals_result = {}
booster = lgb.train(
    params=params,
    train_set=train,
    num_boost_round=10,
    valid_sets=[train, val],
    valid_names=['train', 'val'],
    verbose_eval=False,
    evals_result=evals_result
)
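
For context, a minimal sketch of how the train and val Dataset objects above might be built, assuming synthetic data from sklearn.datasets.make_classification as a stand-in for the original (unshared) dataset:

# Hypothetical data setup for the snippet above; the real data is not shown in this issue.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in: ~10,000 rows, 600 features, heavily imbalanced classes
X, y = make_classification(n_samples=10_000, n_features=600, weights=[0.97], random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

train = lgb.Dataset(X_train, label=y_train)
val = lgb.Dataset(X_val, label=y_val, reference=train)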

Result:

[LightGBM] [Info] Number of positive: 341, number of negative: 11077
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.061360 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 73303
[LightGBM] [Info] Number of data points in the train set: 11418, number of used features: 554
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.029865 -> initscore=-3.480744
[LightGBM] [Info] Start training from score -3.480744
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf

evals_result:

{'train': OrderedDict([('auc',
               [0.9205950243788019,
                0.9515667056808684,
                0.9695284170497268,
                0.9782565761344807,
                0.9831135397988541,
                0.9864338063308904,
                0.9887733347241133,
                0.990455507793089,
                0.9916322876627139,
                0.9927643260704792])]),
 'val': OrderedDict([('auc',
               [0.5482488069238858,
                0.5978726846234733,
                0.5862654695462266,
                0.595729191943703,
                0.6313192590795115,
                0.6567985116881015,
                0.653158618458303,
                0.648386314001456,
                0.6711154250586427,
                0.6565558521394483])])}

Installation

pip install lightgbm
LightGBM version: 3.2.1

@jameslamb
Collaborator

Thanks very much for using LightGBM!

the evals_result shows that the AUC is still improving on both the train and validation sets, so I suspect the warning may indicate a problem.

I think you may have misunderstood what this warning means. It is emitted if the tree-growing process for a specific tree stops before num_leaves leaves have been added.

// cannot split, quit
if (best_leaf_SplitInfo.gain <= 0.0) {
  Log::Warning("No further splits with positive gain, best gain: %f", best_leaf_SplitInfo.gain);
  break;
}

So there is no inconsistency between seeing this warning and observing that the fit to the training data and the performance on the validation data improve at every iteration. If you inspect the structure of the model after training using .save_model() or .trees_to_dataframe(), you should see some trees in the model with fewer than num_leaves leaves.

You could use the example code in #4561 (comment) as a reference for how to use .trees_to_dataframe().
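
For illustration, a minimal sketch of counting leaves per tree with .trees_to_dataframe(), using the booster from the snippet above (leaf rows are the ones with a null split_feature):

# Count leaves per tree; rows with a null split_feature are leaf nodes.
tree_df = booster.trees_to_dataframe()
leaves_per_tree = (
    tree_df[tree_df['split_feature'].isnull()]
    .groupby('tree_index')
    .size()
)
print(leaves_per_tree)  # some trees should show fewer than num_leaves leaves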


There's probably an opportunity to make that warning clearer. For example, the -Inf is an implementation detail and not something I think users need to know about.

Would you find it clearer if this warning had looked like this instead?

[LightGBM] [Warning] Stopped adding splits before reaching max_depth=7 or num_leaves=31. No further splits with positive gain and sufficient min_data_in_leaf and min_sum_hessian_in_leaf.

@truongphanduykhanh
Author

Thanks @jameslamb for your detailed explanation. Indeed, I misunderstood the warning: because it appears at every num_boost_round, I thought there was no gain after each boosting round, not at the split level.

The first sentence of your suggestion is much clearer.

[LightGBM] [Warning] Stopped adding splits before reaching max_depth=7 or num_leaves=31.

But for the second sentence, do you mean something like this?

No further splits generate positive gain, given constraints of min_data_in_leaf and min_sum_hessian_in_leaf.

@jameslamb
Collaborator

you mean something like this

Yep, exactly!

@truongphanduykhanh
Author

That would be awesome. In fact, I've seen many questions and discussions about this warning, and I hope the adjusted message will make the log clearer for end users. Looking forward to it in the next version of LightGBM.

Thank you very much James.

@shiyu1994
Collaborator

Added the improvement of this warning message as a new feature request.
