Support for infinite values with Nans #2277

Closed
NicolasHug opened this issue Jul 22, 2019 · 1 comment

NicolasHug commented Jul 22, 2019

I'm trying to understand how support for missing values is implemented, and I came across this odd edge case. I'm not sure whether it's a bug:

import numpy as np
from lightgbm import LGBMRegressor

# 1 feature, 5 samples
X = np.array([0, 1, np.inf, np.nan, np.nan]).reshape(-1, 1)
# Easy split: all nans go to right child, the rest go to left child
y = np.array([0, 0, 0, 1, 1])

gbdt = LGBMRegressor(n_estimators=1, min_child_samples=1)
print(gbdt.fit(X, y).predict(X))

I'm getting
[0.36 0.36 0.46 0.46 0.46] (the +inf sample goes to the right node),
while I would have expected
[0.36 0.36 0.36 0.46 0.46] (the +inf sample goes to the left node).


As far as I understand the code, the threshold for the last non-missing bin is hardcoded to +inf, and this is the bin where the split happens. Since +inf thresholds are capped at 1e300, the +inf sample is mapped to the right child because 1e300 < +inf.

If I change the +inf sample to another value (e.g. 5), the threshold for the last non-missing bin is still 1e300, but this time 5 <= 1e300, so the sample is mapped to the left node, as expected.
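
To make the comparison concrete, here is a minimal sketch of the decision rule as I understand it. This is not LightGBM's actual code: `goes_left` and `CAPPED_THRESHOLD` are hypothetical names, and sending NaN to the right child is simply what this particular example assumes.

```python
import math

# Assumption: infinite thresholds are capped to this value (1e300, as quoted above).
CAPPED_THRESHOLD = 1e300


def goes_left(value, threshold=math.inf):
    """Hypothetical rule: cap the threshold, then send value <= threshold to the left."""
    capped = min(threshold, CAPPED_THRESHOLD)
    if math.isnan(value):
        # In this example the missing values end up in the right child.
        return False
    return value <= capped


for v in [0.0, 1.0, 5.0, math.inf, math.nan]:
    print(v, "left" if goes_left(v) else "right")
```

With the cap in place, finite values such as 5 pass the `<=` test, while +inf does not, which matches the predictions I'm seeing.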

guolinke (Collaborator) commented Aug 1, 2019

I think the root cause is PR #617.

I think a better solution is to let JSON support inf/nan values rather than change our model format.
See also #2266.
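
For context on the JSON point: standard JSON has no literal for infinity or NaN, so it depends on the encoder whether such values survive. A small illustration using Python's built-in `json` module (only an example of the general issue, not LightGBM's model-dump code; the `threshold` key is made up for the demo):

```python
import json
import math

# By default Python emits the non-standard token "Infinity" and can read it back.
encoded = json.dumps({"threshold": math.inf})
decoded = json.loads(encoded)
print(encoded, decoded["threshold"] == math.inf)

# In strict mode the same value is rejected instead of being written out.
strict_ok = True
try:
    json.dumps({"threshold": math.inf}, allow_nan=False)
except ValueError:
    strict_ok = False
print("strict encoding succeeded:", strict_ok)
```

A parser that sticks to the JSON standard may refuse the `Infinity` token, which is presumably why capping to a large finite value was attractive in the first place.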

lock bot locked this issue as resolved and limited conversation to collaborators on Mar 10, 2020