API: ExtensionDtype._is_numeric #22345
Codecov Report
@@ Coverage Diff @@
## master #22345 +/- ##
=========================================
Coverage ? 92.05%
=========================================
Files ? 169
Lines ? 50715
Branches ? 0
=========================================
Hits ? 46685
Misses ? 4030
Partials ? 0
Thanks! LGTM unless my comment illuminates anything.
I've also pushed one additional test that I wrote for `DataFrame._get_numeric_data`, which is what I believe is called by all `DataFrame` methods that filter to numeric dtypes.
pandas/core/internals/blocks.py
Outdated
if newb.shape != self.shape:
# use values.shape, rather than newb.shape, as newb.shape
# may be incorrect for ExtensionBlocks.
if values.shape != self.shape:
Yeah, I ran into this same issue when I was testing out a similar implementation.
I got around it by passing the argument `ndim=self.ndim` to the `make_block` function on line 664 above, which seems to get the job done and passes all relevant tests, though I didn't run the entire test suite. Not familiar enough with this code though to say if that's a better (or even adequate) workaround.
For what it's worth, it looks like this `ndim` inference in `NonConsolidatableMixIn` is where things start to go wrong, at least for the length 1 `Series` case:
pandas/pandas/core/internals/blocks.py
Lines 1780 to 1785 in cf70d11
# Maybe infer ndim from placement
if ndim is None:
    if len(placement) != 1:
        ndim = 1
    else:
        ndim = 2
Again, not familiar enough with this code to immediately see a fix, but maybe this is helpful to someone with more knowledge of this code than me?
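To make the failure mode concrete, here is a minimal standalone sketch of the quoted inference. It is not the pandas code itself: `infer_ndim` is a hypothetical helper, and it assumes that a length 1 `Series` ends up with a placement of length 1, as the comment above suggests.

```python
from typing import Optional, Sequence


def infer_ndim(placement: Sequence[int], ndim: Optional[int] = None) -> int:
    """Hypothetical stand-in that mirrors the quoted ndim inference."""
    # Only guess when the caller did not say what ndim is.
    if ndim is None:
        if len(placement) != 1:
            ndim = 1
        else:
            ndim = 2
    return ndim


# Assumption: a length-1 Series produces a placement with a single entry.
length_one_placement = [0]

# Left to inference, the single-entry placement is treated as 2-dimensional.
inferred = infer_ndim(length_one_placement)

# Passing ndim explicitly (the ndim=self.ndim workaround) skips the guess.
explicit = infer_ndim(length_one_placement, ndim=1)
```

Under that assumption, the inferred value disagrees with the true dimensionality of the `Series`, while the explicit argument sidesteps the inference entirely, which is consistent with the workaround described above.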
> Not familiar enough with this code though to say if that's a better (or even adequate) workaround.
That seems better.
pandas/core/dtypes/base.py
Outdated
By default ExtensionDtypes are assumed to be non-numeric.
They'll be excluded from operations that exclude non-numeric
columns, like groupby reductions, plotting, etc.
Maybe put 'groupby' in parentheses, as the same holds for normal reductions (in a DataFrame).
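As a rough illustration of the default described in that docstring, the sketch below uses plain Python classes rather than pandas. `ToyExtensionDtype` and `ToyMoneyDtype` are hypothetical names; only the `_is_numeric` property name comes from this PR.

```python
class ToyExtensionDtype:
    """Hypothetical base class: dtypes are non-numeric unless they opt in."""

    name = "toy"

    @property
    def _is_numeric(self) -> bool:
        # Default: treated as non-numeric, so numeric-only operations skip it.
        return False


class ToyMoneyDtype(ToyExtensionDtype):
    """Hypothetical numeric-like dtype that overrides the default."""

    name = "toy_money"

    @property
    def _is_numeric(self) -> bool:
        # Opt in to operations that keep only numeric columns.
        return True


default_flag = ToyExtensionDtype()._is_numeric
overridden_flag = ToyMoneyDtype()._is_numeric
```

The design choice this models is opt-in: a third-party dtype has to declare itself numeric before numeric-only operations will consider it.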
df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 4],
                   "B": data_for_grouping,
                   "C": [1, 1, 1, 1, 1, 1, 1, 1]})
result = df.groupby("A").sum().columns
how does this test pass? (I thought reductions did not yet work for EAs?)
I wonder if it's going through the fallback `np.mean(arr)`? Will investigate.
Yeah, so reductions for Series fail (no `_reduce`), but for DataFrame it seems to use some ndarray fallback.
def test_is_numeric_honored(self, data):
    result = pd.Series(data)
    assert result._data.blocks[0].is_numeric is data.dtype._is_numeric
Maybe you can also test `df._get_numeric_data`?
It would be nice if this would also work for `df.select_dtypes`, but I suppose that is yet another issue.
> Maybe you can also test `df._get_numeric_data`?

Ah, that is done below.
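For readers unfamiliar with what a numeric-only filter does with the flag, here is a toy sketch in plain Python. It is not how `DataFrame._get_numeric_data` is implemented; `numeric_column_names` and the `SimpleNamespace` stand-ins for dtypes are assumptions made purely for illustration.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def numeric_column_names(column_dtypes: Dict[str, Any]) -> List[str]:
    """Hypothetical filter: keep columns whose dtype reports _is_numeric."""
    return [
        name
        for name, dtype in column_dtypes.items()
        # Dtypes without the attribute are treated as non-numeric.
        if getattr(dtype, "_is_numeric", False)
    ]


# Stand-in dtypes; in the real test the "B" column holds the extension array.
columns = {
    "A": SimpleNamespace(_is_numeric=True),
    "B": SimpleNamespace(_is_numeric=False),
    "C": SimpleNamespace(_is_numeric=True),
}

kept = numeric_column_names(columns)
```

A test in the spirit of the one discussed here would then compare the kept column names against the value of the extension dtype's `_is_numeric` flag.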
lgtm. @jorisvandenbossche not sure if your comments are resolved? if so, pls merge away.
@jorisvandenbossche all good?
Thanks!
closes #22290
split from #22325
It's not clear what else we should be testing, since I'm not sure what all uses `Block.is_numeric`.
cc @jschendel