Improving synergy between DataFrames and scikit-learn (ML) tools #9

JovanVeljanoski · 2019-09-05T06:42:12Z

pandas already works reasonably well with scikit-learn and most other popular machine learning libraries, in that a pandas DataFrame is almost always accepted as an input data structure.

However, scikit-learn transformers return a plain NumPy array, so the column names of the input data are lost. Preserving the column names through the ML pipeline would be extremely useful to data scientists for optimizing, understanding, and debugging their pipelines.
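To make the problem concrete, here is a minimal sketch, assuming scikit-learn's default output configuration. The toy data and the variable names (`df`, `scaled`, `restored`) are made up for illustration. It shows the labels disappearing after a transformer and the manual re-wrapping that is currently needed to get them back:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with named columns and a non-default index.
df = pd.DataFrame(
    {"height": [1.6, 1.7, 1.8], "weight": [60.0, 70.0, 80.0]},
    index=["a", "b", "c"],
)

# With the default configuration the transformer returns a NumPy array,
# so the column names and the index are no longer attached to the data.
scaled = StandardScaler().fit_transform(df)
print(type(scaled))

# Manual workaround: wrap the array back into a DataFrame.
# This only works because StandardScaler keeps the number and order of columns.
restored = pd.DataFrame(np.asarray(scaled), columns=df.columns, index=df.index)
print(restored)
```

The re-wrapping is easy here, but it breaks down for transformers that add, drop, or reorder columns, which is where built-in support would help most. Newer scikit-learn releases (1.2 and later) address part of this with `set_output(transform="pandas")` on transformers.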


toobaz · 2019-09-05T08:18:53Z

I agree. It would also be nice if, when passed Series and DataFrames, scikit-learn ML methods returned Series (or DataFrames when there are multiple labels) with corresponding indexes when predicting.
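A rough sketch of what this would look like from the user's side, done by hand today. The data, the `"target"` name, and the variable names are hypothetical. `predict` returns a bare array, and the index has to be reattached manually:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data: y = 2 * x + 1, with labelled rows.
X = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]}, index=["a", "b", "c", "d"])
y = pd.Series([1.0, 3.0, 5.0, 7.0], index=X.index, name="target")

model = LinearRegression().fit(X, y)

# New rows to predict on, again with a meaningful index.
X_new = pd.DataFrame({"x": [4.0, 5.0]}, index=["e", "f"])

# predict() returns a NumPy array with no index and no name.
pred = model.predict(X_new)

# What is being asked for, emulated manually: a Series aligned with X_new.
pred_series = pd.Series(pred, index=X_new.index, name=y.name)
print(pred_series)
```

With multiple targets the same wrapping would produce a DataFrame, taking its column names from the training labels.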
