[Question] Passing categorical feature with data-type "Category" without passing "categorical_feature" #4460

Closed
mohammad-saber opened this issue Jul 10, 2021 · 5 comments

@mohammad-saber

Thank you for sharing your great work. I have a question about handling categorical features without using one-hot encoding.

Assume that in the dataset, I have 2 categorical features.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'b', 'c', 'c'],
    'C': ['x', 'x', 'y', 'y', 'z'] })

I convert them into ordinal integer values and then cast the columns to the "category" dtype.

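columns_cat = ['B', 'C']
x = df.copy()  # the snippets below refer to the feature frame as x

from sklearn.preprocessing import OrdinalEncoder
encoder = OrdinalEncoder(dtype='int')
# Replace the string labels with integer codes
x[columns_cat] = encoder.fit_transform(x[columns_cat])

# Cast the encoded columns to the pandas "category" dtype
for column in columns_cat:
    x[column] = x[column].astype('category')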

Question:
When the data type is "category", do I need to pass the categorical_feature parameter when fitting the model?

CASE 1:
In the following case, does LightGBM handle columns ['B', 'C'] as categorical?

from lightgbm import LGBMRegressor
model = LGBMRegressor()
model.fit(x, y)  

CASE 2:
And what is the difference when I pass categorical_feature as below:

from lightgbm import LGBMRegressor
model = LGBMRegressor()
model.fit(x, y, categorical_feature=columns_cat)  

Thank you for your time.

@StrikerRUS
Collaborator

@mohammad-saber Thanks for using LightGBM!

'category' columns in pandas.DataFrame are treated as categorical features by default in LightGBM. So,

> When the data type is "category", do I need to pass the categorical_feature parameter when fitting the model?

you don't need to pass categorical_feature param in this case.

> CASE 1:
> In the following case, does LightGBM handle columns ['B', 'C'] as categorical?

Yes, it does.

> CASE 2:
> And what is the difference when I pass categorical_feature as below:

Should be no difference with CASE 1.

Maybe the following unit test will help to better understand handling categorical features in pandas.DataFrame.

def test_pandas_categorical():

Also, please note that ordered categorical columns (pd.Categorical(..., ordered=True)) aren't treated as categorical features by default.

"E": pd.Categorical(np.random.permutation(['z', 'y'] * 30),
                    ordered=True)})

with pytest.raises(AssertionError):
    np.testing.assert_allclose(pred0, pred5)  # ordered cat features aren't treated as cat features by default
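
The default-detection rule described in this reply can be illustrated with pandas alone. This is only a sketch: the helper name auto_categorical_columns is hypothetical and not part of LightGBM, and it merely mirrors the rule stated above (unordered 'category' columns are picked up, ordered ones are not) rather than reproducing LightGBM's own code.

```python
import pandas as pd


def auto_categorical_columns(frame: pd.DataFrame) -> list:
    """Return the names of columns with an unordered pandas 'category' dtype.

    Hypothetical helper: it mirrors the rule described in the reply above,
    it is not a LightGBM function.
    """
    return [
        name
        for name, dtype in frame.dtypes.items()
        if isinstance(dtype, pd.CategoricalDtype) and not dtype.ordered
    ]


# Same shape as the frame in the question, plus one ordered categorical column.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": pd.Categorical(["a", "b", "b", "c", "c"]),
    "C": pd.Categorical(["x", "x", "y", "y", "z"]),
    "E": pd.Categorical(["z", "y", "z", "y", "z"], ordered=True),
})

print(auto_categorical_columns(df))
```

Under the rule above, 'B' and 'C' are reported while the numeric column 'A' and the ordered column 'E' are not, so 'E' would have to be named explicitly in categorical_feature to be treated as categorical.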

@mohammad-saber
Author

Thank you.

Is there an attribute that lists the features that were treated as categorical after model fitting?

@StrikerRUS
Collaborator

You can get this info from the LightGBM logs:

UserWarning: categorical_feature in Dataset is overridden.
New categorical_feature is <your cat features here>

#3379.

Also, categorical features are written differently in a model file.

} else if (strs[0] != "none") {  // categorical feature
  auto vals = CommonC::StringToArray<int>(feature_infos_[i], ':');
  auto max_idx = ArrayArgs<int>::ArgMax(vals);
  auto min_idx = ArrayArgs<int>::ArgMin(vals);
  json_str_buf << "{\"min_value\":" << vals[min_idx] << ",";
  json_str_buf << "\"max_value\":" << vals[max_idx] << ",";
  json_str_buf << "\"values\":[" << CommonC::Join(vals, ",") << "]}";
} else {  // unused feature
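
Based on the snippet above, one possible way to recover the categorical features after fitting is to inspect the dumped model. The sketch below assumes that the dictionary returned by Booster.dump_model() has a "feature_infos" mapping from feature name to an object with "min_value", "max_value" and "values", and that "values" is non-empty only for categorical features. That layout is inferred from the C++ code shown here and may differ between LightGBM versions. The helper name categorical_features_from_dump and the example dictionary are made up for illustration.

```python
def categorical_features_from_dump(dump: dict) -> list:
    """Return feature names whose dumped info carries a non-empty "values" list.

    Assumption: only categorical features have entries in "values".
    """
    infos = dump.get("feature_infos", {})
    return [name for name, info in infos.items() if info.get("values")]


# Hand-written stand-in for a dumped model (not real LightGBM output).
example_dump = {
    "feature_infos": {
        "A": {"min_value": 1, "max_value": 5, "values": []},
        "B": {"min_value": 0, "max_value": 2, "values": [0, 1, 2]},
        "C": {"min_value": 0, "max_value": 2, "values": [0, 1, 2]},
    }
}

print(categorical_features_from_dump(example_dump))
```

With a fitted scikit-learn style estimator, the dictionary could be obtained with model.booster_.dump_model() and passed to the helper in place of example_dump.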

@no-response

no-response bot commented Aug 10, 2021

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@no-response no-response bot closed this as completed Aug 10, 2021
@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023