-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataSet.to_numpy()
should use numpy dtypes whenever possible
#182
Conversation
DataSet.to_numpy()
should use numpy dtypes whenever possible
Codecov Report
@@ Coverage Diff @@
## main #182 +/- ##
==========================================
+ Coverage 98.36% 98.37% +0.01%
==========================================
Files 45 45
Lines 1772 1783 +11
==========================================
+ Hits 1743 1754 +11
Misses 29 29
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. I wish we also handled the case where MTZDtype
is mixed with Pandas float
dtypes. I'm not sure that can be implemented in a way that doesn't cause nasty side effects.
Pandas DataFrames that contain ExtensionDtypes always default to output data with
object
dtype whenDataFrame.to_numpy()
is called. This is suboptimal for MTZ data, which by construction must be compatible withfloat32
, and possiblyint32
.This PR wraps the pandas call with
DataSet.to_numpy()
to assess whether a more sensible default (eitherfloat32
orint32
) can be used based on the existing data. This should help to avoid cases where data is unnecessarily cast to anobject
array, which can lead to unexpected behavior downstream.