[python-package] Difference between feature importance while changing the order of features

Description

There are 2 numerical features: a, r

There are other categorical features: 'a', 'v', 's', 'a', 'r', 'q', 'w', 'e', 'r', 't', 'y', 'u'

There is a significant difference between the feature importances in the following two cases:

Case 1: valid_features = valid_feat_categorical + valid_feat_numerical
Case 2: valid_features = valid_feat_numerical + valid_feat_categorical

[Feature importance plot for Case 1: valid_features = valid_feat_categorical + valid_feat_numerical]
[Feature importance plot for Case 2: valid_features = valid_feat_numerical + valid_feat_categorical]

Question: Does changing the order of the features change the feature importance so drastically?

Note: In Case 1, ad_keyword_ed (a categorical feature) had the highest importance; in Case 2, rate_acdk (a numerical feature) did.

Reproducible example

While training the model I am using the function: [code not included in the report]

Environment info

LightGBM version or commit hash: 3.1.0
> Does changing the order of features change the feature importance so drastically?
It's not common, but yes, it is possible. For example, if two features are very similar, they may offer nearly identical explanatory power, and LightGBM will tie-break by choosing the one that appears earlier in the column order: #1294 (comment).
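For concreteness, here is a minimal sketch of that tie-breaking effect. The data and the feature names f1/f2 are made up for illustration; nothing here comes from the original report.

```python
import numpy as np
import lightgbm as lgb

# Build one informative feature and a near-duplicate of it, so the two
# columns offer (almost) identical explanatory power.
rng = np.random.default_rng(0)
f1 = rng.normal(size=5000)
f2 = f1 + rng.normal(scale=1e-6, size=5000)
y = (f1 > 0).astype(int)

params = {
    "objective": "binary",
    "verbosity": -1,
    "seed": 1,
    "deterministic": True,
    "force_row_wise": True,
}

features = {"f1": f1, "f2": f2}
for names in (["f1", "f2"], ["f2", "f1"]):
    X = np.column_stack([features[n] for n in names])
    train = lgb.Dataset(X, label=y, feature_name=names)
    booster = lgb.train(params, train, num_boost_round=10)
    # With (near-)tied split gains, whichever column comes first tends to
    # collect the splits, so the importance ranking follows column order.
    print(dict(zip(names, booster.feature_importance(importance_type="split"))))
```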
But I strongly suspect that the difference you're observing is mostly attributable to randomness between training runs.
Try running your code twice consecutively with 0 changes to the feature order, and checking whether the models produced are identical. If they aren't, you aren't yet controlling for randomness and need to address those issues before you can investigate these feature-importance changes.
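One way to run that determinism check, sketched here with synthetic data; the parameters pinned below (seed, deterministic, force_row_wise, num_threads) are a typical way to remove run-to-run randomness, not the reporter's actual configuration:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)

params = {
    "objective": "binary",
    "seed": 42,
    "deterministic": True,   # trade some speed for reproducibility
    "force_row_wise": True,  # skip the row-wise/col-wise auto-selection
    "num_threads": 1,
    "verbosity": -1,
}

def fit():
    # Rebuild the Dataset each time so both runs start from identical inputs.
    return lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)

# If randomness is controlled, the two serialized models should be identical.
print(fit().model_to_string() == fit().model_to_string())
```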
If you can provide a minimal, reproducible example, we might be able to help more. Right now you've omitted significant details, like the definition of the prep_data() function or how you are doing train-test splitting.
If you can, please consider updating to lightgbm>=4.3.0. v3.1.0 is about 3.5 years old and there have been significant changes and improvements to this project since then.
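For reference, a quick way to confirm which version is actually installed in the environment being used:

```python
import lightgbm
print(lightgbm.__version__)  # the report above was produced with 3.1.0
```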
jameslamb changed the title from "Difference between feature importance while changing the order of features" to "[python-package] Difference between feature importance while changing the order of features" on Mar 16, 2024.