
feature_fraction doesn't pick last feature #4476

Closed
draphi opened this issue Jul 15, 2021 · 3 comments

draphi commented Jul 15, 2021

Description

When feature_fraction is set to a small value (e.g. 0.6), the last feature in the dataset never gets selected.
I was expecting the feature subset to be re-sampled for each iteration.
In general, with a feature_fraction that is not close to 1, some features have feature importance == 0 on my dataset, and not necessarily in the last position. I suspect there is a bug, either in the description of how this parameter works or in its implementation.
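For context, the behavior the report expects — an independent random feature subset drawn at every boosting iteration — can be sketched in plain NumPy. This is an illustrative sketch of the documented semantics, not LightGBM's actual C++ sampler; the function name and seed are made up for the example:

import math
import numpy as np

def sample_feature_subsets(n_features, feature_fraction, n_iterations, seed=10):
    """Draw an independent feature subset for every boosting iteration.

    Illustrative sketch of the *expected* semantics of feature_fraction,
    not LightGBM's actual implementation.
    """
    rng = np.random.default_rng(seed)
    # LightGBM keeps at least one feature per iteration
    k = max(1, math.ceil(feature_fraction * n_features))
    return [set(rng.choice(n_features, size=k, replace=False))
            for _ in range(n_iterations)]

subsets = sample_feature_subsets(n_features=3, feature_fraction=0.3, n_iterations=300)
# With 300 independent draws of 1 feature out of 3, every feature
# (including the last one) should appear at least once with
# overwhelming probability — which is why importance == 0 for a
# fixed feature over 300 rounds looks like a bug.
selected_ever = set().union(*subsets)
print(selected_ever)  # expected: {0, 1, 2}

If the subset were instead sampled once and reused for all iterations, a feature left out of that single draw would never contribute, matching the symptom described above.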

Reproducible example

I create three random features; only the last one is used in the response, but it never gets picked up unless
the feature_fraction is increased.
This is visible both from its feature importance being 0 and from the response plot.
Please increase feature_fraction or change the order of the predictors (e.g. all_preds = ["z", "x", "y"]) to see how the model suddenly learns the function.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import lightgbm as lgb  # missing from the original snippet

np.random.seed(123)
param = {'boosting': 'gbdt',  # 'booster' is not a LightGBM parameter name; 'boosting' is
         'num_boost_round': 300,
         'bagging_fraction': 0.90,
         'feature_fraction_seed': 10,
         'min_data_in_leaf': 1,
         'bagging_freq': 3,
         'objective': 'regression',
         'feature_fraction': 0.3,
         'verbose': 10,
         'seed': 1223}
all_preds = ['x', 'y', 'z']
response_column = 'response'  # was undefined in the original snippet
N = 100
x1 = np.random.randn(N)
x2 = np.random.rand(N)
x3 = np.random.rand(N)
df = pd.DataFrame({'x': x1, 'y': x2, 'z': x3})
df[response_column] = np.cos(x3)
ds = lgb.Dataset(df[all_preds], label=df[response_column], feature_name=all_preds)

# num_boost_round belongs to lgb.train(), not the parameter dict
param_no_rounds = {k: v for k, v in param.items() if k != 'num_boost_round'}
mdl = lgb.train(param_no_rounds, ds, num_boost_round=param['num_boost_round'],
                feature_name=all_preds, verbose_eval=1)
print(list(zip(all_preds, mdl.feature_importance())))
plt.plot(df['z'], mdl.predict(df[all_preds]), '.')
plt.plot(df['z'], df[response_column], 'r.')

Environment info

Release version:
pip install lightgbm==3.2.1

Additional Comments

@StrikerRUS
Collaborator

Hey @draphi! Thanks a lot for posting this issue with a detailed reproducible example!

I can confirm that one feature is unused in version 3.2.1. But I think this issue has been fixed in master via #4450. Also linking #4371 as the same issue.

Here is what I get with nightly build of LightGBM:

[screenshot: output with the nightly build]

@draphi Please try the latest version: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#nightly-builds.


no-response bot commented Aug 16, 2021

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@no-response no-response bot closed this as completed Aug 16, 2021
@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023