Allowing Feature Selection inside or before Column Transformer #2
Comments
Hi Maira, This is a good suggestion, and it's something that ideally this class would support, but I don't think that it would be an easy change to support this use case. In my design, I was trying to avoid having to loop through the elements within a ColumnTransformer step because of the complications that could arise. Something would have to change at this point: Let me know if you think of a method to handle this, and I will think about it as well. In the long run, I think that Scikit-Learn will develop some better ways of getting feature names out of pipelines & transformers.
Here is some of the work being done: scikit-learn/enhancement_proposals#48
Hi Kyle, I did some work on this and I got the class working for my current purposes, but that certainly introduced other problems, because it was kind of a dirty hack. I'll fork from your repo and push my changes, then if you want you can have a look and see if it gives you any ideas. What do you say? Cheers!
Here's my first attempt. I'm sure it creates new problems, because I haven't tested it properly, but it is an idea :)
I came across another problem, maybe related to #1, but I'm not sure whether this is by design or not.
In a slightly different example than the one before, I created a toy dataframe with one column ("c") that has only null values. I want this column to be dropped inside the Column Transformer Pipeline before imputing (because an all-nan column will be silently dropped by SimpleImputer, so in my opinion it is better to have a step that explicitly does it). So the code below:
returns:
So you can see that column c was dropped from the resulting dataframe, but it is still showing in the list of features.
So, my question is, is there a way to have a Pipeline with a Feature Selection step inside a Column Transformer, or at least as a step before the Column Transformer in the outer Pipeline, to avoid the issue of the silent dropping of all-nan columns by the Imputer?
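For reference, here is a minimal sketch of what the second option (an explicit step before the Column Transformer in the outer Pipeline) could look like. It is only an illustration, not something this repo provides: `DropAllNaNColumns` and the toy dataframe are hypothetical, and the sketch assumes the input is a pandas DataFrame and that the Column Transformer picks its columns with `make_column_selector` rather than a hard-coded list, so that it does not ask for the dropped column.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline


class DropAllNaNColumns(BaseEstimator, TransformerMixin):
    """Hypothetical helper: explicitly drop columns that are all-NaN at fit time."""

    def fit(self, X, y=None):
        # Remember the columns that had at least one non-null value.
        self.columns_to_keep_ = [c for c in X.columns if X[c].notna().any()]
        return self

    def transform(self, X):
        # Return a DataFrame so the next step can still select by dtype/name.
        return X[self.columns_to_keep_]


# Toy data (assumed): column "c" contains only nulls.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [np.nan, np.nan, np.nan],
})

pipe = Pipeline([
    ("drop_empty", DropAllNaNColumns()),
    ("columns", ColumnTransformer([
        # The selector is evaluated on the data that reaches this step,
        # i.e. after "c" has already been removed.
        ("num", SimpleImputer(strategy="mean"),
         make_column_selector(dtype_include=np.number)),
    ])),
])

result = pipe.fit_transform(df)
kept = pipe.named_steps["drop_empty"].columns_to_keep_
print(kept)
print(result.shape)
```

With this layout the list of surviving columns is available from the fitted `drop_empty` step, so it can be compared against whatever feature list is reported downstream.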
Thanks!