Feature importance changing drastically with shuffling of data in LightGBM binary classifier #5887
Thanks for using LightGBM.

1. The parameters you're using introduce the possibility of randomness: `bagging_fraction` (with `bagging_freq=3`), `colsample_bytree`, `feature_fraction_bynode`, and `bin_construct_sample_cnt`. These all lead LightGBM to randomly sample from the rows and columns during training. Suggestion: either set these to values that disable the sampling …

2. Because you're using …
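For illustration, a minimal sketch of training with that sampling disabled, assuming the scikit-learn API (the parameter names are real LightGBM parameters; `X` and `y` are synthetic placeholders, not data from this issue):

```python
import lightgbm as lgb
from sklearn.datasets import make_classification

# Placeholder data; the issue's real training data is not available.
X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)

clf = lgb.LGBMClassifier(
    objective="binary",
    random_state=42,
    bagging_fraction=1.0,         # use every row on every iteration
    bagging_freq=0,               # never re-sample rows
    colsample_bytree=1.0,         # use every column for every tree
    feature_fraction_bynode=1.0,  # use every column at every split
    deterministic=True,           # request reproducible training
    force_row_wise=True,          # pin the histogram-building strategy
)
clf.fit(X, y)
```

With these settings, the main remaining sources of run-to-run variation are floating-point summation order (affected by threading and row order) and the binning sample controlled by `bin_construct_sample_cnt`.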
@jameslamb Thanks for the reply! I also ran the same experiment with the following set of hyperparams in the same environment setup. Hyperparams: …
I don't understand your response, @sahilkgit. Do you still need help?
Yes. As you mentioned earlier, @jameslamb, my argument is based on your first response, where you suggested changing those sampling parameters. So I am not clear: what causes completely different models (completely different feature importances) when I use the first set of hyperparams, but not when I use the second set?
It isn't guaranteed that you'll get a different model if you don't follow every one of my suggestions from #5887 (comment). The impact of those different settings is dependent on the size and distribution of your training data. Without a reproducible example (code + data that exactly demonstrates the behavior you're seeing), there's not much else we can do to help here. By providing only parameters and your subjective judgment of models' feature importance being "similar" or not, you are asking for a significant amount of guessing by myself and others trying to help.
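To make that request concrete, here is a hedged sketch of the kind of reproducible example being asked for, using synthetic data (none of these names come from the issue itself):

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification

# Synthetic stand-in for the (unavailable) training data.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

def importances(X, y):
    # Single-threaded to remove parallel-summation nondeterminism.
    clf = lgb.LGBMClassifier(random_state=42, n_jobs=1)
    clf.fit(X, y)
    return clf.feature_importances_

# Train once on the original row order, once on a shuffled copy.
perm = np.random.default_rng(0).permutation(len(y))
imp_original = importances(X, y)
imp_shuffled = importances(X[perm], y[perm])

# Small differences can remain (floating-point accumulation depends on
# row order); with row/column sampling enabled they can grow much larger.
print(np.abs(imp_original - imp_shuffled).max())
```

An example like this, run against data that actually reproduces the drastic change, is what would let the maintainers investigate.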
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Description:
Model feature importance changes drastically after I shuffle the training data.
How I observed this behaviour:
Environment details: Python 3.8, pandas 1.2.4, numpy 1.19.2, lightgbm 3.2.1, machine: Ubuntu.
Hyperparams used:
n_estimators=541, num_leaves=592, colsample_bytree=0.52, min_data_in_leaf=50, min_split_gain=0.00005, bagging_fraction=0.978, lambda_l1=0.31, lambda_l2=0.4, cat_l2=0.18, max_cat_threshold=225, cat_smooth=120, max_depth=21, min_data_per_group=100, learning_rate=0.0911, min_child_weight=0.00029, metric=["binary_logloss"], boosting_type="gbdt", random_state=42, n_jobs=24, verbose=-1, objective="binary", boost_from_average=True, min_data_in_bin=80, max_bin=100, bagging_freq=3, feature_fraction_bynode=0.278107895754091, bin_construct_sample_cnt=0.752595265801746 * train_data.shape[0]
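Since the issue gives only this flat parameter list, here is a sketch of how it maps onto the scikit-learn API; `train_data` is assumed to be the issue's training DataFrame (it is not shown in the thread), and `bin_construct_sample_cnt` must be an integer:

```python
import lightgbm as lgb

# `train_data` is assumed to be the pandas DataFrame from the issue.
clf = lgb.LGBMClassifier(
    n_estimators=541,
    num_leaves=592,
    colsample_bytree=0.52,
    min_data_in_leaf=50,
    min_split_gain=0.00005,
    bagging_fraction=0.978,
    lambda_l1=0.31,
    lambda_l2=0.4,
    cat_l2=0.18,
    max_cat_threshold=225,
    cat_smooth=120,
    max_depth=21,
    min_data_per_group=100,
    learning_rate=0.0911,
    min_child_weight=0.00029,
    metric=["binary_logloss"],
    boosting_type="gbdt",
    random_state=42,
    n_jobs=24,
    verbose=-1,
    objective="binary",
    boost_from_average=True,
    min_data_in_bin=80,
    max_bin=100,
    bagging_freq=3,
    feature_fraction_bynode=0.278107895754091,
    # Must be an integer row count, hence the cast:
    bin_construct_sample_cnt=int(0.752595265801746 * train_data.shape[0]),
)
```

Note that `bagging_fraction` < 1.0 with `bagging_freq=3`, `colsample_bytree=0.52`, `feature_fraction_bynode` < 1.0, and a `bin_construct_sample_cnt` below the full row count are exactly the sampling settings discussed in the first maintainer comment above.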
Feature importance plot - m1 (image)
Feature importance plot - m2 (image)