[python] suppress the warning about categorical feature override #3379
Comments
I'm willing to take this as my first contribution to the repo. |
From reading other issues, it seems the "correct" way of defining categorical columns is via the Dataset. This might only be a problem when the input is a Pandas dataframe. The problem is that when calling If we had an option of None then that might be a way of getting rid of conflicting settings that require a warning. |
There seems to be no friendly way to suppress this warning:
results in these warnings:
|
When I do a grid search using the sklearn api and pass eval_set to fit, I get this warning for every element in the grid (many times!). I'm just passing a dataframe with categorical features as the train X, the same for eval set, never explicitly passing categorical_feature. I don't think this behavior is desirable. |
Indeed it's not necessary to use the sklearn API in order to reproduce the above. I've provided simple instructions in #3640. |
I get this warning when using the scikit-learn wrapper of LightGBM. The dataset is passed to LightGBM through a scikit-learn pipeline, which preprocesses the data in a pandas dataframe and produces a numpy array. Note that the input the model receives is NOT a pandas dataframe but a numpy array. I set the feature_name and categorical_feature parameters in the fit() method, as this is the only place they can be set if you're not using LightGBM's native Dataset creation. I think the warning is useful in some situations but superfluous in the case mentioned above.
C:..\anaconda3\lib\site-packages\lightgbm\basic.py:1286: UserWarning: Overriding the parameters from Reference Dataset. |
Hi all, Code :
|
With multiple processes in a grid search it's not even possible to use a context manager to suppress this warning during fit. It seems that the context state is lost somehow, and my notebook gets literally flooded with:
|
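A possible explanation for the lost context state, offered as an assumption rather than a diagnosis: warning filters live inside each interpreter process, so a catch_warnings block in the parent does not reach worker processes started by a parallel backend. The PYTHONWARNINGS environment variable is read at interpreter startup and is inherited by child processes, so it can carry the filter across. The sketch below uses only the standard library; the child command is a hypothetical stand-in for a worker that triggers the LightGBM warning.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a worker process that triggers the LightGBM warning.
CHILD_CODE = (
    "import warnings; "
    "warnings.warn('Overriding the parameters from Reference Dataset.', UserWarning)"
)


def run_child(code, extra_env=None):
    """Run `code` in a fresh interpreter and return what it wrote to stderr."""
    env = dict(os.environ)
    env.pop("PYTHONWARNINGS", None)  # start without any inherited filter
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stderr


# Without the variable, the child prints the warning to stderr.
noisy = run_child(CHILD_CODE)

# With the variable set, the child starts with the filter already installed.
quiet = run_child(CHILD_CODE, {"PYTHONWARNINGS": "ignore::UserWarning"})
```

In a notebook, the equivalent would be setting os.environ["PYTHONWARNINGS"] before the grid search starts its workers. This only helps if the workers are launched after the variable is set, and it silences every UserWarning in those workers, not just this one.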
As a workaround I did the following:
import warnings
import lightgbm as lgb

class SilentRegressor(lgb.LGBMRegressor):
    def fit(self, *args, **kwargs):
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=UserWarning)
            return super().fit(*args, verbose=False, **kwargs)
|
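The workaround above hides every UserWarning raised during fit, including ones that may be worth seeing. A narrower variant, sketched here with the standard library only, filters on the message text instead. The message prefix is copied from the warning quoted earlier in this thread; noisy_fit and fit_quietly are hypothetical names, and noisy_fit merely stands in for a real fit call.

```python
import warnings

# Prefix of the warning text quoted earlier in this thread.
REFERENCE_OVERRIDE_MSG = "Overriding the parameters from Reference Dataset"


def noisy_fit():
    """Hypothetical stand-in for a call such as model.fit(X, y, eval_set=...)."""
    warnings.warn("Overriding the parameters from Reference Dataset.", UserWarning)
    warnings.warn("some unrelated warning worth seeing", UserWarning)
    return "fitted"


def fit_quietly(fit_fn, *args, **kwargs):
    """Call fit_fn while ignoring only the reference-override warning."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "ignore", message=REFERENCE_OVERRIDE_MSG, category=UserWarning
        )
        return fit_fn(*args, **kwargs)
```

Calling fit_quietly(noisy_fit) still lets the unrelated warning through. As with any catch_warnings block, this applies only to the process that runs it.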
I did some investigation on this issue. There are mainly two sources of these warnings regarding categorical features. If the dataset doesn't have a reference, the warnings only come from here: LightGBM/python-package/lightgbm/basic.py Lines 2045 to 2074 in 7fa07ee
This function will be called before Dataset.construct() is called. One can use the following code to reproduce:
import random
import numpy as np
import pandas as pd
import lightgbm as lgb

Categorical_Feature_When_Construct_Dataset = ["a", "b", "d"]
Categorical_Feature_When_Train = 'auto'

def get_data(N):
    data = []
    labels = []
    for i in range(N):
        sample = {
            "a": random.choice([100, 200, 300, 400]),
            "b": random.choice([222, 333]),
            "c": random.random(),
        }
        if sample["a"] == 200 or sample["a"] == 300:
            if sample["b"] == 333:
                label = 1
            else:
                label = 0
        else:
            label = 0
        labels.append(label)
        data.append(sample)
    features = pd.DataFrame(data)
    features["d"] = pd.Categorical(
        [random.choice(["x", "y", "z"]) for i in range(N)], categories=["x", "y", "z"], ordered=False
    )
    labels = pd.Series(labels)
    return features, labels

N = 1000
train_features, train_labels = get_data(N)
test_features, test_labels = get_data(N)
lgb_train = lgb.Dataset(train_features, train_labels, categorical_feature=Categorical_Feature_When_Construct_Dataset)
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': {'l2', 'l1'},
    'num_leaves': 4,
    'learning_rate': 0.5,
    'verbose': 0,
}
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=1,
                categorical_feature=Categorical_Feature_When_Train,
                )
Here, if If we use
For this first source, my proposal is: if the user is specifying columns to override "auto", we don't report the warning, because the user is just overriding the default parameter. This aligns with the current behavior. What we need to do is remove the warning for case 2 and case 3 in the table.
The second source comes from the dataset with a reference: LightGBM/python-package/lightgbm/basic.py Lines 1778 to 1781 in 7fa07ee
This is always reported if the referenced dataset has any categorical features. For the referenced dataset, its LightGBM/python-package/lightgbm/basic.py Lines 1498 to 1518 in 7fa07ee
For this one, my suggestion is to ignore categorical features when comparing the params. |
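The suggestion for the second source can be illustrated with a small sketch. This is not LightGBM's code; params_conflict and CATEGORICAL_KEYS are hypothetical names, and the key set assumes the aliases that the LightGBM parameter documentation lists for categorical_feature. The idea is simply to drop the categorical keys from both parameter dicts before comparing them, so a difference in those keys alone no longer triggers the warning.

```python
# Assumed alias set for categorical_feature; not taken from the issue itself.
CATEGORICAL_KEYS = frozenset({
    "categorical_feature",
    "cat_feature",
    "categorical_column",
    "cat_column",
    "categorical_features",
})


def without_keys(params, ignored_keys):
    """Return a copy of params with the ignored keys removed."""
    return {key: value for key, value in params.items() if key not in ignored_keys}


def params_conflict(reference_params, new_params, ignored_keys=CATEGORICAL_KEYS):
    """True if the two dicts differ once the ignored keys are dropped."""
    return without_keys(reference_params, ignored_keys) != without_keys(
        new_params, ignored_keys
    )


# Differing only in a categorical key: no conflict, so no warning would be needed.
same_apart_from_categorical = params_conflict(
    {"max_bin": 255, "categorical_column": [0, 1]},
    {"max_bin": 255},
)

# Differing in another parameter: still reported as a conflict.
real_conflict = params_conflict({"max_bin": 255}, {"max_bin": 63})
```

Under this rule a warning would still fire for genuine mismatches such as max_bin, which keeps it useful in the situations where it signals a real override.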
Thanks for your detailed analysis. I think the proposed solution is feasible! |
The following code works if we do not want to create a new class.
|
Is this problem solved?? |
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
categorical_column could be set in both lgb.train and lgb.Dataset. But this warning always seems to show up when categorical_column is set. I think this is quite annoying.