RFC/WIP Feature names within fit #14238

Closed
wants to merge 4 commits into from

Conversation

amueller
Member

@amueller amueller commented Jul 2, 2019

This is an alternative to #12627 that I proposed in scikit-learn/enhancement_proposals#18.

Basically, I now think it's better to keep the feature names as close to X as possible so they don't get out of sync, and I want the user interface to be as small as possible.

This PR adds feature_names_in as a parameter to fit, and adds feature_names_in_ as an attribute to every estimator, and feature_names_out_ as an attribute to all transformers.
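
To make the proposed interface concrete, here is a minimal sketch of how it could look on a toy transformer. This is not the code from the PR: the class `SelectEveryOther` and the fallback names `x0`, `x1`, ... are assumptions made purely for illustration, and only the names `feature_names_in`, `feature_names_in_` and `feature_names_out_` come from the description above.

```python
class SelectEveryOther:
    """Toy transformer (hypothetical, not scikit-learn code): keeps columns 0, 2, 4, ..."""

    def fit(self, X, y=None, feature_names_in=None):
        n_features = len(X[0])
        if feature_names_in is None:
            # Assumed fallback when no names are passed: x0, x1, ...
            feature_names_in = [f"x{i}" for i in range(n_features)]
        if len(feature_names_in) != n_features:
            raise ValueError("feature_names_in needs one name per column of X")
        # Every estimator would store the input names ...
        self.feature_names_in_ = list(feature_names_in)
        # ... and every transformer would also store the output names.
        self.feature_names_out_ = self.feature_names_in_[::2]
        return self

    def transform(self, X):
        return [row[::2] for row in X]


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
t = SelectEveryOther().fit(X, feature_names_in=["age", "height", "weight"])
print(t.feature_names_in_)
print(t.feature_names_out_)
```

Because the names are passed in the same call as X, the length check can run right there in fit, which is one way of keeping the two from drifting apart.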

Other alternatives that do basically the same but don't require an extra parameter to fit are:

  • require the user to set the feature_names_in attribute manually
  • pass around objects that have feature names attached to X, i.e. use dataframes or a dataset object or a subclass of ndarray that adds feature names.
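
As a rough illustration of the second alternative, the sketch below attaches the names to the data itself rather than passing them to fit. The container `NamedData` and the transformer `DropFirstColumn` are hypothetical names invented for this example; a dataframe or an ndarray subclass would play the same role.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NamedData:
    """Hypothetical container: rows of values plus one name per column."""
    rows: List[List[float]]
    feature_names: List[str]

    def __post_init__(self):
        # Names and data travel together, so they can be checked once here.
        for row in self.rows:
            if len(row) != len(self.feature_names):
                raise ValueError("each row needs one value per feature name")


class DropFirstColumn:
    """Toy transformer (hypothetical) that reads the names from X itself."""

    def fit(self, X, y=None):
        self.feature_names_in_ = list(X.feature_names)
        self.feature_names_out_ = self.feature_names_in_[1:]
        return self

    def transform(self, X):
        # The output carries its own names on to the next step.
        return NamedData([row[1:] for row in X.rows], list(self.feature_names_out_))


data = NamedData([[1.0, 2.0, 3.0]], ["a", "b", "c"])
out = DropFirstColumn().fit(data).transform(data)
print(out.feature_names, out.rows)
```

The signature of fit stays unchanged here; the cost is that every step has to accept and return the richer object.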

@amueller
Member Author

amueller commented Jul 2, 2019

This lets you trace the change of features within a pipeline / column transformer quite nicely:
[image]

@jnothman
Member

jnothman commented Jul 2, 2019

What do you think of using some ugly naming convention, like 'ends with a _', to indicate fit parameters that are not sample-aligned? I.e. fit(self, X, y, feature_names_) and partial_fit(self, X, y, classes_, feature_names_).
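
One way such a convention could be used is sketched below: code that subsets samples (for example when splitting data into folds) could slice only the sample-aligned fit parameters and pass the others through untouched. The helpers `split_fit_params` and `subset_fit_params` are hypothetical and exist only to illustrate the idea; nothing like them is part of this PR.

```python
def split_fit_params(fit_params):
    """Separate fit parameters by the assumed trailing-underscore convention.

    Names ending in "_" are treated as NOT sample-aligned; all others
    are assumed to hold one entry per sample.
    """
    sample_aligned = {}
    not_sample_aligned = {}
    for name, value in fit_params.items():
        if name.endswith("_"):
            not_sample_aligned[name] = value
        else:
            sample_aligned[name] = value
    return sample_aligned, not_sample_aligned


def subset_fit_params(fit_params, indices):
    """Slice sample-aligned parameters to `indices`; pass the rest through."""
    aligned, other = split_fit_params(fit_params)
    sliced = {name: [value[i] for i in indices] for name, value in aligned.items()}
    return {**sliced, **other}


params = {"sample_weight": [0.1, 0.2, 0.3], "feature_names_": ["a", "b"]}
print(subset_fit_params(params, [0, 2]))
```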

@amueller
Member Author

amueller commented Jul 26, 2021

Closing this. I think I prefer passing around a different data structure if we want to have feature names in fit. A simpler first fix is to provide get_feature_names and not have the names available during fit. See #18444 for the most up-to-date solution on this issue.
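
For illustration, the simpler route could look roughly like the sketch below: fit never sees any names, and the output names are derived afterwards from names supplied by the caller. The toy class `PairwiseSums` and the exact signature `get_feature_names(input_features=None)` are assumptions for this example, not the API settled on in #18444.

```python
class PairwiseSums:
    """Toy transformer (hypothetical): sums each pair of adjacent columns."""

    def fit(self, X, y=None):
        # Only the number of columns is recorded; no names are known here.
        self.n_features_in_ = len(X[0])
        return self

    def transform(self, X):
        return [[row[i] + row[i + 1] for i in range(len(row) - 1)] for row in X]

    def get_feature_names(self, input_features=None):
        # Names are supplied after fitting; fall back to x0, x1, ... (assumed).
        if input_features is None:
            input_features = [f"x{i}" for i in range(self.n_features_in_)]
        if len(input_features) != self.n_features_in_:
            raise ValueError("input_features needs one name per fitted column")
        return [f"{a}+{b}" for a, b in zip(input_features, input_features[1:])]


t = PairwiseSums().fit([[1, 2, 3]])
print(t.get_feature_names(["a", "b", "c"]))
```

The trade-off is that the names are not available inside fit, so they cannot influence fitting or be validated against X at that point.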

@amueller amueller closed this Jul 26, 2021
Labels
Needs Decision Requires decision
2 participants