I'm trying to understand how support for missing values is implemented, and I came across this odd edge case. I'm not sure whether it's a bug:
import numpy as np
from lightgbm import LGBMRegressor

# 1 feature, 5 samples
X = np.array([0, 1, np.inf, np.nan, np.nan]).reshape(-1, 1)
# Easy split: all NaNs go to the right child, the rest go to the left child
y = np.array([0, 0, 0, 1, 1])
gbdt = LGBMRegressor(n_estimators=1, min_child_samples=1)
print(gbdt.fit(X, y).predict(X))
I'm getting `[0.36 0.36 0.46 0.46 0.46]` (the +inf sample goes to the right node), whereas I would have expected `[0.36 0.36 0.36 0.46 0.46]` (the +inf sample goes to the left node).
As far as I understand the code, the threshold for the last non-missing bin is hardcoded to `+inf`, and this is the bin where the split happens. Since `+inf` thresholds are capped to `1e300`, the +inf sample is mapped to the right child because `1e300 < +inf`.
If I change the +inf sample to another value (e.g. 5), the threshold for the last non-missing bin is still `1e300`, but this time `5 <= 1e300`, so the sample is mapped to the left node, as expected.
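
To make the reasoning above concrete, here is a minimal pure-Python sketch of the comparison as I understand it. It is not LightGBM's actual code: `goes_left` and `CAP` are hypothetical names, and routing NaN to the right child is an assumption that simply mirrors the split in this example.

```python
import math

# Assumed cap applied to infinite thresholds, as described above.
CAP = 1e300


def goes_left(value, threshold=math.inf, cap=CAP):
    """Hypothetical helper: decide the child node for a single sample.

    NaN samples are sent to the right child (assumption for this example);
    everything else is compared against the capped threshold.
    """
    if math.isnan(value):
        return False
    return value <= min(threshold, cap)


samples = [0.0, 1.0, math.inf, 5.0, math.nan]
decisions = [goes_left(v) for v in samples]
print(decisions)  # [True, True, False, True, False]
```

Under these assumptions, 5 goes left because `5 <= 1e300`, while +inf goes right because `+inf <= 1e300` is false, which matches the predictions I'm seeing.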