Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python][scikit-learn] new stacking tests and make number of features a property #3173

Merged
merged 19 commits into from
Jun 27, 2020
Merged

[python][scikit-learn] new stacking tests and make number of features a property #3173

merged 19 commits into from
Jun 27, 2020

Conversation

a-wozniakowski
Copy link
Contributor

In python_package_test, two stacking tests for classification and regression check LightGBM 's stacking layer compatibility (which is related to #17353).

Also, n_features_in_ has become a property, which updates #3129.

@ghost
Copy link

ghost commented Jun 18, 2020

CLA assistant check
All CLA requirements met.

@a-wozniakowski
Copy link
Contributor Author

TASK=check-docs failed similarly in PR #3144, so it is a prior issue.

@StrikerRUS
Copy link
Collaborator

@a-wozniakowski

Also, n_features_in_ has become a property, which updates #3129.

Please clarify why was it needed? Refer to #3129 (review).

@a-wozniakowski
Copy link
Contributor Author

@StrikerRUS

@a-wozniakowski

Also, n_features_in_ has become a property, which updates #3129.

Please clarify why was it needed? Refer to #3129 (review).

The change leaves n_features_ intact, and it expresses n_features_in_ in the same convention as the other attributes. Thus improving readability.

@StrikerRUS
Copy link
Collaborator

StrikerRUS commented Jun 19, 2020

@a-wozniakowski Please note that making n_features_in_ as property cannot pass the scikit-learn API test. Refer to

Also, it seems that the current implementation doesn't pass API test and fails at this line:
https://github.com/scikit-learn/scikit-learn/blob/7cc815c21c60dabe8252b30ada77621b88e2136c/sklearn/utils/estimator_checks.py#L2946
I guess that n_features_in_ should be set as an attribute in fit() method, not as property:

Make sure that n_features_in_ attribute doesn't exist until fit is called

Please revert this and leave only new tests in this PR.

@a-wozniakowski
Copy link
Contributor Author

@StrikerRUS I'm fine to revert the code, but I don't see the issue: check_n_features_in in the API test has a line that asserts
not hasattr(estimator, 'n_estimators_in_') prior to fitting the cloned estimator. This condition is True in the tests that I've done, e.g.,

import lightgbm as lgb
from sklearn.datasets import load_boston, load_iris
from sklearn.ensemble import StackingRegressor, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils.estimator_checks import check_n_features_in

# regression tests
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# test single regressor
reg = lgb.LGBMRegressor()
name = reg.__class__.__name__
check_n_features_in(name, reg)
# test stacking regressor
regressors = [('gbm1', lgb.LGBMRegressor()),
              ('gbm2', lgb.LGBMRegressor())]
reg = StackingRegressor(estimators=regressors,
                        final_estimator=lgb.LGBMRegressor())
name = reg.__class__.__name__
check_n_features_in(name, reg)

# classification tests
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# test single classifier
clf = lgb.LGBMClassifier()
name = clf.__class__.__name__
check_n_features_in(name, clf)
# test stacking classifier
classifiers = [('gbm1', lgb.LGBMClassifier()),
               ('gbm2', lgb.LGBMClassifier())]
clf = StackingClassifier(estimators=classifiers,
                         final_estimator=lgb.LGBMClassifier())
name = clf.__class__.__name__
check_n_features_in(name, clf)

If one tries to access n_features_in_ prior to fitting the model, then there is a LGBMNotFittedError error. Hence, hasattr(estimator, 'n_estimators_in_') is False in the test.

@StrikerRUS
Copy link
Collaborator

@a-wozniakowski

If one tries to access n_features_in_ prior to fitting the model, then there is a LGBMNotFittedError error. Hence, hasattr(estimator, 'n_estimators_in_') is False in the test.

Ah, I see! That test can be passed due to the NotFittedError, because with other ordinary exceptions assert is failing. Thanks a lot for your investigation!
https://github.com/scikit-learn/scikit-learn/blob/ec1070a21213ce855f83e20de46a42e3cc765568/sklearn/exceptions.py#L19

Please rebase this PR to current master and update docstring according to the new policy from #3164.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-wozniakowski Thanks for the updates!

Please see some initial review comments below:

tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
@StrikerRUS StrikerRUS changed the title new stacking tests and make number of features a property [python][scikit-learn] new stacking tests and make number of features a property Jun 26, 2020
Update stacking tests for review comments.
tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
tests/python_package_test/test_sklearn.py Outdated Show resolved Hide resolved
@a-wozniakowski
Copy link
Contributor Author

@StrikerRUS thanks for the feedback.

#3192 encountered the same issue in continuous-integration/travis-ci/pr. Namely, TASK=check-docs failed.

@StrikerRUS
Copy link
Collaborator

Namely, TASK=check-docs failed.

Seems it was something different. I re-run that job for this PR and everything is OK now.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-wozniakowski Thank you very much for your contribution!

@StrikerRUS StrikerRUS merged commit 7284946 into microsoft:master Jun 27, 2020
@a-wozniakowski a-wozniakowski deleted the a-wozniakowski-stacking-update branch July 27, 2020 12:32
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants