SLEP010 n_features_in_ attribute #22
@@ -0,0 +1,81 @@
.. _slep_010:

=================================
SLEP010: n_features_in_ attribute
=================================

:Author: Nicolas Hug
:Status: Under review
:Type: Standards Track
:Created: 2019-11-23

Abstract
########

This SLEP proposes the introduction of a public ``n_features_in_`` attribute
for most estimators (where relevant). This attribute is automatically set
when calling ``_validate_X()`` or ``_validate_X_y()``, which are meant to
replace ``check_array`` and ``check_X_y`` (they are still called under the
hood).

Motivation
##########

Knowing the number of features that an estimator expects is useful for
inspection purposes, as well as for input validation.

The logic that is proposed here (calling a stateful method instead of a
stateless function) is also a prerequisite to fixing the dataframe column
ordering issue: at the moment, there is no way to raise an error if the column
ordering of a dataframe was changed between ``fit`` and ``predict``.

Solution
########

The proposed solution is to replace most calls to ``check_array()`` or
``check_X_y()`` by calls to two newly created private methods::

    def _validate_X(self, X, check_n_features=False, **check_array_params):
        ...

    def _validate_X_y(self, X, y, check_n_features=False, **check_X_y_params):
        ...

The ``_validate_XXX()`` methods will call the corresponding ``check_XXX()``
functions.

The ``check_n_features`` parameter is False by default and can be set to True
to raise an error when ``self.n_features_in_ != X.shape[1]``. The idea is to
leave it as False in ``fit()`` and set it to True in ``predict()`` or
``transform()``.

Review comment: Should we instead have

Review comment: Can you give an example that would handle both Does

Review comment: Forgot about this point. I'm -.5 on having If we want to do two methods

Review comment: Is it rather I think the point is that where we really want the two validation methods to differ is whether they are storing state or checking against state. The

Review comment: Thoughts here? Is

Review comment: Sorry this was hidden for some reason. I think

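The stateful validation described above can be sketched roughly as follows. This is a standard-library-only illustration, not scikit-learn's implementation: ``_check_array`` is a hypothetical stand-in for ``check_array``, and ``ValidationMixin`` and ``MeanCenterer`` are made-up names used only for this example.

```python
def _check_array(X):
    """Hypothetical stand-in for check_array: require a rectangular 2D input."""
    rows = [list(row) for row in X]
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("Expected a non-empty rectangular 2D input.")
    return rows


class ValidationMixin:
    """Illustrative mixin (assumed name) providing the stateful method."""

    def _validate_X(self, X, check_n_features=False):
        X = _check_array(X)
        n_features = len(X[0])
        if check_n_features:
            # predict()/transform(): compare against the state stored by fit()
            if self.n_features_in_ != n_features:
                raise ValueError(
                    f"X has {n_features} features, but this estimator "
                    f"expects {self.n_features_in_}."
                )
        else:
            # fit(): record the number of features seen
            self.n_features_in_ = n_features
        return X


class MeanCenterer(ValidationMixin):
    """Toy transformer that subtracts the per-column mean."""

    def fit(self, X):
        X = self._validate_X(X)  # check_n_features stays False in fit()
        self.means_ = [sum(col) / len(X) for col in zip(*X)]
        return self

    def transform(self, X):
        X = self._validate_X(X, check_n_features=True)
        return [[v - m for v, m in zip(row, self.means_)] for row in X]
```

With this sketch, fitting on two columns and then transforming three columns raises a ``ValueError``, which a stateless ``check_array`` call could not do on its own.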
In most cases, the attribute exists only once ``fit`` has been called, but
there are exceptions (see below).

A new common check is added: it makes sure that for most estimators, the
``n_features_in_`` attribute does not exist until ``fit`` is called, and
that its value is correct.

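A common check of this kind might look roughly like the sketch below. The function name ``check_n_features_in`` and the ``ToyEstimator`` class are assumptions made for illustration; this is not the check that ships in scikit-learn's test suite, and it ignores the exceptions discussed later (estimators that skip validation or know the value at ``__init__``).

```python
def check_n_features_in(estimator, X, y=None):
    """Hypothetical common check for the n_features_in_ attribute."""
    # The attribute must not exist before fit is called.
    if hasattr(estimator, "n_features_in_"):
        raise AssertionError("n_features_in_ is set before fit")

    estimator.fit(X, y)

    # After fit, it must exist and match the number of columns of X.
    if not hasattr(estimator, "n_features_in_"):
        raise AssertionError("n_features_in_ is missing after fit")
    if estimator.n_features_in_ != len(X[0]):
        raise AssertionError("n_features_in_ has the wrong value")


class ToyEstimator:
    """Minimal estimator (assumed name) that satisfies the check."""

    def fit(self, X, y=None):
        self.n_features_in_ = len(X[0])
        return self
```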
Considerations
##############

The main consideration is that the addition of the common test means that
existing estimators in downstream libraries will not pass our test suite,
unless they change their calls to ``check_XXX`` into calls to
``_validate_XXX``.

Review comment: We can also have a deprecation period for this test in estimator checks.

Review comment: This is not really true / the point, right? They don't need to use our private methods, though they can. They "just" need to provide

There are other minor considerations:

- In most meta-estimators, the input validation is handled by the
  sub-estimator(s). The ``n_features_in_`` attribute of the meta-estimator
  is thus explicitly set to that of the sub-estimator, either via a
  ``@property``, or directly in ``fit()``.
- Some estimators like the dummy estimators do not validate the input
  (the 'no_validation' tag should be True). The ``n_features_in_`` attribute
  should be set to None, though this is not enforced in the common tests.
- Some estimators expect a non-rectangular input: the vectorizers. These
  estimators never have an ``n_features_in_`` attribute (they never call
  ``check_array`` anyway).

Review comment: That's no excuse. They should applicable for

Review comment: I don't understand your comment. Are you simply suggesting to make clear that

Review comment: I'm saying that you should remove "(they never call

Review comment: But there should be a way for them to comply with the new requirements and pass

- Some estimators may know the number of input features before ``fit`` is
  called: typically the ``SparseCoder``, where ``n_features_in_`` is known at
  ``__init__`` from the ``dictionary`` parameter. In this case the attribute is
  set in ``__init__``.

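Two of the cases listed above can be illustrated with a small sketch: a meta-estimator that exposes the sub-estimator's value through a ``@property``, and an estimator that knows its input width at ``__init__``. All class names here are hypothetical, and the meta-estimator skips the cloning that scikit-learn would normally perform.

```python
class ToyRegressor:
    """Toy sub-estimator that records n_features_in_ in fit()."""

    def fit(self, X, y):
        self.n_features_in_ = len(X[0])
        return self


class ToyMetaEstimator:
    """Meta-estimator delegating n_features_in_ to its fitted sub-estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Validation is left to the sub-estimator (no clone in this sketch).
        self.estimator_ = self.estimator.fit(X, y)
        return self

    @property
    def n_features_in_(self):
        # Before fit, estimator_ does not exist, so this raises
        # AttributeError and hasattr(meta, "n_features_in_") is False.
        return self.estimator_.n_features_in_


class ToyDictionaryCoder:
    """SparseCoder-like case: the width is known from the dictionary."""

    def __init__(self, dictionary):
        self.dictionary = dictionary
        # dictionary is assumed to have shape (n_components, n_features)
        self.n_features_in_ = len(dictionary[0])
```

The property-based variant keeps the meta-estimator's attribute in sync with the sub-estimator without storing a second copy of the value.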
@@ -1,4 +1,7 @@
 SLEPs under review
 ==================

-Nothing here
+.. toctree::
+   :maxdepth: 1
+
+   slep010/proposal
Review comment: This is being treated as markup. Put it in ` or ``.