[FIX] pca: n_features_ attribute of decomposition.PCA is deprecated in favor of n_features_in_ #6249

JakaKokosar · 2022-12-09T09:52:31Z

Issue

Description of changes

Includes

Code changes
Tests
Documentation

pavlin-policar · 2022-12-09T13:36:51Z

Orange/projection/pca.py

@@ -201,7 +201,7 @@ def _fit_truncated(self, X, n_components, svd_solver):
                random_state=random_state,
            )

-        self.n_samples_, self.n_features_ = n_samples, n_features
+        self.n_samples_ = n_samples


I think this works because:

Most of the code in ImprovedPCA is copied directly from sklearn.decomposition.PCA. This was done because it wasn't possible to add the randomized algorithm that enables PCA on sparse data without modifying these functions.
Until recently, sklearn's PCA used self.n_features_. It has since switched to self.n_features_in_, which is used throughout sklearn and is handled by the BaseEstimator class. The self.n_features_in_ attribute is set in BaseEstimator._check_n_features, which is, in turn, called by BaseEstimator._validate_data. In scikit-learn's PCA, this method is called in PCA._fit, which ensures that self.n_features_in_ is set on sklearn's implementation.

They also deprecated self.n_features_ and replaced it with a read-only property that simply returns self.n_features_in_. The functionality is the same as before, but you have to trace it through these different methods, so it remains backwards compatible.
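To make the read-only property point concrete, here is a minimal sketch in plain Python. It does not use scikit-learn; SketchPCA and SketchImprovedPCA are hypothetical stand-ins, and the warning text is made up for illustration. It only shows why assigning to an attribute that the parent class has turned into a setter-less property raises AttributeError, while reading it still works.

```python
import warnings


class SketchPCA:
    """Hypothetical stand-in for an estimator whose old attribute became an alias."""

    def fit(self, X):
        # In scikit-learn this bookkeeping happens during input validation;
        # here it is set directly to keep the sketch self-contained.
        self.n_features_in_ = len(X[0])
        return self

    @property
    def n_features_(self):
        # Deprecated alias: warn, then forward to the new attribute.
        warnings.warn(
            "n_features_ is deprecated; use n_features_in_ instead",
            FutureWarning,
        )
        return self.n_features_in_


class SketchImprovedPCA(SketchPCA):
    """Hypothetical subclass that still assigns the old attribute."""

    def fit(self, X):
        super().fit(X)
        self.n_samples_ = len(X)
        try:
            # A property without a setter rejects assignment.
            self.n_features_ = len(X[0])
        except AttributeError as exc:
            # Kept only so the failure can be inspected afterwards.
            self.assignment_error = exc
        return self


model = SketchImprovedPCA().fit([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

Dropping the assignment, as the diff above does, avoids the error, and anything that reads n_features_ still gets the value through the alias.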

So, we can do the same thing in our ImprovedPCA class. We can get rid of self.n_features_ = n_features, since the feature count will be recorded as self.n_features_in_ when we call self._validate_data, which is inherited from sklearn.decomposition.PCA <- sklearn.base.BaseEstimator. The other change we need is to replace our previous call to check_array with a call to self._validate_data. I believe this should ensure the exact same functionality as before.

The transform method should also probably be updated to reflect how sklearn is doing things now.

So, this would mean changing

    X = check_array(
        X,
        accept_sparse=["csr", "csc"],
        dtype=[np.float64, np.float32],
        ensure_2d=True,
        copy=self.copy,
    )

on lines 224-230 to

    X = self._validate_data(
        X,
        dtype=[np.float64, np.float32],
        reset=False,
        accept_sparse=["csr", "csc"],
    )
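As a rough illustration of what the reset flag is for, the sketch below mimics the bookkeeping in plain Python. It is not scikit-learn's code: SketchEstimator, its error message, and the list-of-lists input are assumptions made for the example. The idea is that fit records the feature count (reset=True), while transform only checks new data against it (reset=False).

```python
class SketchEstimator:
    """Hypothetical estimator showing the reset=True / reset=False split."""

    def _check_n_features(self, X, reset):
        n_features = len(X[0])
        if reset:
            # Called from fit: remember how many features were seen.
            self.n_features_in_ = n_features
            return
        # Called from transform: the input must match what fit saw.
        if n_features != self.n_features_in_:
            raise ValueError(
                f"X has {n_features} features, but this estimator "
                f"expects {self.n_features_in_}"
            )

    def fit(self, X):
        self._check_n_features(X, reset=True)
        return self

    def transform(self, X):
        self._check_n_features(X, reset=False)
        return X


est = SketchEstimator().fit([[1.0, 2.0, 3.0]])
same_width = est.transform([[7.0, 8.0, 9.0]])  # accepted: 3 features

try:
    est.transform([[7.0, 8.0]])  # rejected: only 2 features
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
```

With reset=False in transform, a fitted model keeps the feature count from fit instead of overwriting it with whatever is passed in later.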

@pavlin-policar, thanks for the explanation.

markotoplak · 2022-12-09T19:52:08Z

Before merging this please ensure that new code also properly runs with the oldest supported scikit-learn. The -oldest tests do not fail, so it likely works, but double-check, please.

For now, I would avoid raising the dependency to 1.2. I am OK with raising it to 1.1 if needed.

markotoplak · 2022-12-09T20:21:52Z

I couldn't wait. :) Please, add a commit that reverts #6255 when merging.

markotoplak · 2023-04-14T09:03:34Z

/rebase

pavlin-policar · 2023-04-21T06:15:06Z

The PCA changes in this PR look good to me, but tests related to AdaBoost are failing somewhat consistently. Could this be due to a change in the newer version of scikit-learn? If so, we probably want to include that here as well.

markotoplak · 2023-04-21T12:02:15Z

The remaining tests are failing because of scikit-learn/scikit-learn/issues/26241. I also intend to do a PR that fixes the issue.

I'll remove the last commit (allowing scikit-learn 1.2) and merge this as is and then we can fix adaboost separately.

codecov · 2023-04-21T12:43:50Z

Codecov Report

Merging #6249 (a823c86) into master (d248136) will increase coverage by 0.03%.
The diff coverage is 94.91%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6249      +/-   ##
==========================================
+ Coverage   87.61%   87.65%   +0.03%     
==========================================
  Files         321      321              
  Lines       69024    69148     +124     
==========================================
+ Hits        60475    60610     +135     
+ Misses       8549     8538      -11

JakaKokosar marked this pull request as draft December 9, 2022 10:04

BlazZupan requested a review from pavlin-policar December 9, 2022 11:24

JakaKokosar force-pushed the scikit-learn-api-change branch from 041a39b to 6a86944 Compare December 9, 2022 12:43

pavlin-policar reviewed Dec 9, 2022

JakaKokosar force-pushed the scikit-learn-api-change branch 2 times, most recently from 09939d3 to faba3cd Compare December 9, 2022 15:12

JakaKokosar marked this pull request as ready for review December 9, 2022 15:12

JakaKokosar force-pushed the scikit-learn-api-change branch from faba3cd to 0daf7b9 Compare December 9, 2022 15:39

markotoplak mentioned this pull request Dec 9, 2022

Avoid scikit-learn>=1.2.0 due to changes in PCA #6255

Merged

JakaKokosar marked this pull request as draft December 9, 2022 22:13

janezd assigned pavlin-policar Jan 6, 2023

JakaKokosar force-pushed the scikit-learn-api-change branch from 0daf7b9 to 571836b Compare March 17, 2023 10:03

JakaKokosar marked this pull request as ready for review March 17, 2023 10:04

JakaKokosar force-pushed the scikit-learn-api-change branch from 571836b to 9fc253a Compare March 31, 2023 08:48

markotoplak added this to the 3.35.0 milestone Mar 31, 2023

JakaKokosar force-pushed the scikit-learn-api-change branch from 9fc253a to a5a4a10 Compare April 7, 2023 18:33

pca: avoid "AttributeError: can't set attribute" in ImprovedPCA

a823c86

n_features_ attribute of decomposition.PCA is deprecated in favor of n_features_in_

biolab-helper force-pushed the scikit-learn-api-change branch from a5a4a10 to 499dc30 Compare April 14, 2023 09:04

janezd assigned markotoplak Apr 21, 2023

markotoplak force-pushed the scikit-learn-api-change branch from 499dc30 to a823c86 Compare April 21, 2023 12:26

markotoplak merged commit a45fce1 into biolab:master Apr 21, 2023
