SLEP015: Feature Names Propagation #48

Merged
5 changes: 4 additions & 1 deletion slep015/proposal.rst
@@ -121,7 +121,10 @@ Considerations and Limitations
a pipeline with no steps. We can work around this by allowing pipelines
with no steps.

-3. Meta-estimators will delegate the setting and validation of
+3. ``feature_names_in_`` can be any 1-D ``Sequence``, such as an list or
Member

But ndarray is not a sequence: numpy/numpy#2776

Member Author

Maybe "Iterable that returns a string" would be enough.

In our discussions, I think we want to make sure the feature names are strings.

Member

@jnothman, Oct 14, 2020

Hmm. We'd better accept Sequences and 1d array-likes whose elements are strings: pd.Index is not a Sequence.
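
To illustrate the distinction raised in this thread, here is a minimal sketch of a validator that accepts any 1d array-like of strings instead of requiring a ``Sequence``. The helper name `check_feature_names` is hypothetical and is not part of scikit-learn or the SLEP; it only shows one way the check could be written.

```python
from collections.abc import Sequence

import numpy as np


def check_feature_names(names):
    """Hypothetical helper: accept any 1d array-like whose elements are strings."""
    # np.ndim works for lists, tuples and ndarrays, so no Sequence check is needed.
    if np.ndim(names) != 1:
        raise ValueError("feature names must be 1-dimensional")
    if not all(isinstance(name, str) for name in names):
        raise TypeError("feature names must all be strings")
    # The input is returned unchanged; nothing is converted or copied.
    return names


as_list = ["age", "height"]
as_array = np.array(["age", "height"], dtype=object)

print(isinstance(as_list, Sequence))   # True
print(isinstance(as_array, Sequence))  # False: ndarray is not registered as a Sequence
print(check_feature_names(as_array) is as_array)  # True
```

An ``isinstance(names, Sequence)`` test would reject the ndarray here, which is why the wording "1d array-like of strings" is the safer description.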

Member

Suggested change
-3. ``feature_names_in_`` can be any 1-D ``Sequence``, such as an list or
+3. ``feature_names_in_`` can be any 1d array-like of strings, such as an list or

+an ndarray.
Member

It might be worth noting that this allowance can avoid unnecessary memory consumption/copies, with reduced implementation complexity, although it may reduce usability a bit.
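
A small sketch of the trade-off described here, assuming the names arrive as an object-dtype ndarray. The variable names are illustrative only. Storing the input as-is avoids a copy, while normalising to a list gives a more predictable type at the cost of a new container.

```python
import numpy as np

names = np.array(["age", "height"], dtype=object)

# Option 1: keep whatever 1d array-like was passed in (no conversion, no copy).
stored_as_is = names

# Option 2: normalise to a list, which always allocates a new container.
stored_as_list = list(names)

print(stored_as_is is names)      # True: the same object is referenced
print(stored_as_list is names)    # False: a new list was built
print(np.asarray(names) is names) # True: asarray does not copy an existing ndarray

# The usability cost: equality behaves differently depending on the stored type.
print(stored_as_list == ["age", "height"])  # True (a single bool)
print(stored_as_is == ["age", "height"])    # elementwise array of bools
```

With option 1, callers cannot rely on list semantics such as a single boolean from ``==``, which is the reduced usability the comment mentions.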


+4. Meta-estimators will delegate the setting and validation of
``feature_names_in_`` to its inner estimators. The meta-estimator will
define ``feature_names_in_`` by referencing its inner estimators. For
example, the ``Pipeline`` can use ``steps[0].feature_names_in_`` as
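
The delegation described in item 4 could look roughly like the sketch below. `ToyTransformer` and `ToyPipeline` are hypothetical stand-ins, not scikit-learn classes, and the toy `steps` holds bare estimators, whereas a real ``Pipeline`` stores ``(name, estimator)`` tuples.

```python
class ToyTransformer:
    """Hypothetical inner estimator that sets and validates its own names."""

    def fit(self, X, feature_names):
        if not all(isinstance(name, str) for name in feature_names):
            raise TypeError("feature names must all be strings")
        self.feature_names_in_ = feature_names
        return self


class ToyPipeline:
    """Hypothetical meta-estimator that stores no names of its own."""

    def __init__(self, steps):
        self.steps = steps  # assumption: a plain list of estimators

    def fit(self, X, feature_names):
        # Setting and validation are delegated to the first inner estimator.
        self.steps[0].fit(X, feature_names)
        return self

    @property
    def feature_names_in_(self):
        # Defined by reference to the first step, never copied.
        return self.steps[0].feature_names_in_


first = ToyTransformer()
pipe = ToyPipeline([first]).fit(X=None, feature_names=["age", "height"])
print(pipe.feature_names_in_)                             # ['age', 'height']
print(pipe.feature_names_in_ is first.feature_names_in_)  # True
```

Because the property reads through to the inner estimator, the two can never disagree about the input feature names.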