
Improving synergy between DataFrames and scikit-learn (ML) tools #9

Open
JovanVeljanoski opened this issue Sep 5, 2019 · 1 comment

Comments

@JovanVeljanoski

There is already decent synergy between pandas and scikit-learn (and most other popular machine learning libraries), in that a pandas DataFrame is almost always accepted as an input data structure.

However, scikit-learn transformers output plain NumPy arrays, so the column names of the input data are lost. Preserving column names through the ML pipeline would make it much easier for data scientists to understand, debug, and optimize their pipelines.
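To illustrate the problem and one possible user-side workaround, here is a minimal sketch assuming only pandas and NumPy. `CenterScaler` is a hypothetical stand-in for a scikit-learn transformer (it returns a bare `ndarray`, as scikit-learn transformers do), and `fit_transform_frame` is a hypothetical helper that re-attaches the input's column names and index. It only works for transformers that keep the number and order of columns (scalers, imputers), not for ones like one-hot encoders or PCA. For reference, scikit-learn 1.2 and later added `set_output(transform="pandas")` for many transformers; the sketch does not rely on it.

```python
import numpy as np
import pandas as pd


class CenterScaler:
    """Hypothetical stand-in for a scikit-learn transformer: returns a bare ndarray."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        self.scale_[self.scale_ == 0] = 1.0  # avoid division by zero on constant columns
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_

    def fit_transform(self, X):
        return self.fit(X).transform(X)


def fit_transform_frame(transformer, df):
    """Hypothetical helper: call fit_transform and re-attach df's labels.

    Assumes the transformer keeps the number and order of columns.
    """
    out = transformer.fit_transform(df)
    if out.shape[1] != df.shape[1]:
        raise ValueError("transformer changed the number of columns; labels cannot be reused")
    return pd.DataFrame(out, columns=df.columns, index=df.index)


df = pd.DataFrame({"age": [20, 30, 40], "income": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

raw = CenterScaler().fit_transform(df)            # plain ndarray: column names are gone
scaled = fit_transform_frame(CenterScaler(), df)  # DataFrame with "age"/"income" and index a, b, c
```

Wrapping after the fact keeps the transformer itself untouched, which is why it is only safe when the output columns map one-to-one onto the input columns.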

@toobaz

toobaz commented Sep 5, 2019

I agree. It would also be nice if, when passed Series and DataFrames, scikit-learn's prediction methods returned a Series (or a DataFrame when there are multiple labels) carrying the same index as the input.
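A minimal sketch of the requested behaviour, assuming only pandas and NumPy: `LeastSquaresModel` is a hypothetical stand-in for a scikit-learn regressor whose `predict` returns a bare array, and `predict_labeled` is a hypothetical wrapper that returns a Series (single label) or a DataFrame (multiple labels) indexed like the input rows, reusing the name or columns of the target used for fitting.

```python
import numpy as np
import pandas as pd


class LeastSquaresModel:
    """Hypothetical stand-in for a scikit-learn regressor; predict() returns a bare ndarray."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])  # add intercept column
        self.coef_, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
        return A @ self.coef_


def predict_labeled(model, X, y_template):
    """Hypothetical wrapper: give model.predict(X) the index of X.

    y_template is the Series or DataFrame used for fitting; its name
    (single label) or columns (multiple labels) are reused.
    """
    pred = model.predict(X)
    if isinstance(y_template, pd.DataFrame):  # multiple labels -> DataFrame
        return pd.DataFrame(pred, index=X.index, columns=y_template.columns)
    return pd.Series(pred, index=X.index, name=y_template.name)


X = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]}, index=[10, 11, 12, 13])
y = pd.Series([1.0, 3.0, 5.0, 7.0], index=X.index, name="target")  # y = 1 + 2x
X_new = pd.DataFrame({"x": [4.0, 5.0]}, index=[100, 101])

preds = predict_labeled(LeastSquaresModel().fit(X, y), X_new, y)  # Series indexed 100, 101

Y = pd.DataFrame({"a": y, "b": -y})
preds2 = predict_labeled(LeastSquaresModel().fit(X, Y), X_new, Y)  # DataFrame with columns a, b
```

Because the index comes from the rows being predicted, predictions can be joined back to the original data with a plain index alignment instead of relying on row order.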
