-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] pca: n_features_ attribute of decomposition.PCA is deprecated in favor of n_features_in_ #6249
[FIX] pca: n_features_ attribute of decomposition.PCA is deprecated in favor of n_features_in_ #6249
Conversation
041a39b
to
6a86944
Compare
@@ -201,7 +201,7 @@ def _fit_truncated(self, X, n_components, svd_solver): | |||
random_state=random_state, | |||
) | |||
|
|||
self.n_samples_, self.n_features_ = n_samples, n_features | |||
self.n_samples_ = n_samples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this works because:
Most of the code in ImprovedPCA
is copied directly from sklearn.decomposition.PCA
. This was done because it wasn't possible to add the randomized algorithm that enables PCA on sparse data without modifying these functions.
Until recently, sklearn's PCA used self.n_features_
. However, they recently switched over to using self.n_features_in
, which is more generally used in sklearn, and is part of their BaseEstimator
class. The self.n_features_in
attribute is set in BaseEstimator._check_n_features
, which is, in turn, called by BaseEstimator._validate_data
. In scikit-learn's PCA, this method is called in PCA._fit
, which ensures that the self.n_features_in
attribute is set on sklearn's implementation.
They also deprecated self.n_features
and replaced it with a readonly property which just returns self.n_featuers_in
anyways. This is still the same functionality as before, but you have to trace it through these different methods. So, this is still backwards compatible.
So, we can do the same thing in our ImprovedPCA
class. We can get rid of self.n_features_ = features
, since this will be set when we call self._validate_data
, which is inherited from sklearn.decomposition.PCA <- sklearn.BaseEstimator
. So, the other change we need is to replace our previous call to check_array
to self._validate_data
. And I believe this should ensure the exact same functionality as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transform
method should also probably be updated to reflect how sklearn is doing things now.
So, this would mean changing
X = check_array(
X,
accept_sparse=["csr", "csc"],
dtype=[np.float64, np.float32],
ensure_2d=True,
copy=self.copy,
)
on lines 224-230 to
X = self._validate_data(
X,
dtype=[np.float64, np.float32],
reset=False,
accept_sparse=["csr", "csc"],
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pavlin-policar, thanks for the explanation.
09939d3
to
faba3cd
Compare
faba3cd
to
0daf7b9
Compare
Before merging this please ensure that new code also properly runs with the oldest supported scikit-learn. The For now, I would avoid raising the dependency to 1.2. I am OK with raising it to 1.1 if needed. |
I couldn't wait. :) Please, add a commit that reverts #6255 when merging. |
0daf7b9
to
571836b
Compare
571836b
to
9fc253a
Compare
9fc253a
to
a5a4a10
Compare
/rebase |
n_features_ attribute of decomposition.PCA is deprecated in favor of n_features_in_
a5a4a10
to
499dc30
Compare
The PCA changes in this PR look good to me, but tests related to AdaBoost are failing somewhat consistently. Could this be due to a change in the newer version of scikit-learn? If so, we probably want to include that here as well. |
The remaining tests are failing because of scikit-learn/scikit-learn/issues/26241. I also intend to do a PR that fixes the issue. I'll remove the last commit (allowing scikit-learn 1.2) and merge this as is and then we can fix adaboost separately. |
499dc30
to
a823c86
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #6249 +/- ##
==========================================
+ Coverage 87.61% 87.65% +0.03%
==========================================
Files 321 321
Lines 69024 69148 +124
==========================================
+ Hits 60475 60610 +135
+ Misses 8549 8538 -11 |
Issue
Fixes #6248
Description of changes
Includes