API: ExtensionDtype._is_numeric #22345

TomAugspurger · 2018-08-14T15:37:19Z

split from #22325

It's not clear what else we should be testing, since I'm not sure what all uses Block.is_numeric.

codecov · 2018-08-15T05:18:00Z

Codecov Report

❗ No coverage uploaded for pull request base (master@b6e35ff).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #22345   +/-   ##
=========================================
  Coverage          ?   92.05%           
=========================================
  Files             ?      169           
  Lines             ?    50715           
  Branches          ?        0           
=========================================
  Hits              ?    46685           
  Misses            ?     4030           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.46% <100%> (?)`
#single	`42.25% <50%> (?)`

Impacted Files	Coverage Δ
pandas/core/dtypes/base.py	`92.68% <100%> (ø)`
pandas/core/arrays/integer.py	`94.71% <100%> (ø)`
pandas/core/internals/blocks.py	`93.84% <100%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b6e35ff...e813855.

jschendel

Thanks! LGTM unless my comment illuminates anything.

I've also pushed one additional test that I wrote for DataFrame._get_numeric_data, which is what I believe is called by all DataFrame methods that filter to numeric dtypes.

jschendel · 2018-08-15T05:31:22Z

pandas/core/internals/blocks.py

-            if newb.shape != self.shape:
+            # use values.shape, rather than newb.shape, as newb.shape
+            # may be incorrect for ExtensionBlocks.
+            if values.shape != self.shape:


Yeah, I ran into this same issue when I was testing out a similar implementation.

I got around it by passing the argument ndim=self.ndim to the make_block function on line 664 above, which seems to get the job done and passes all relevant tests, though I didn't run the entire test suite. Not familiar enough with this code though to say if that's a better (or even adequate) workaround.

For what it's worth, it looks like this ndim inference in NonConsolidatableMixIn is where things start to go wrong, at least for the length 1 Series case:

pandas/pandas/core/internals/blocks.py

Lines 1780 to 1785 in cf70d11

    # Maybe infer ndim from placement
    if ndim is None:
        if len(placement) != 1:
            ndim = 1
        else:
            ndim = 2

Again, not familiar enough with this code to immediately see a fix, but maybe this is helpful to someone with more knowledge of this code than me?
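To make the failure mode concrete, here is a minimal sketch of the inference rule quoted above, written as a standalone function. `infer_ndim` is a hypothetical name, and the `range` objects are stand-ins for a block's placement, on the assumption that a Series block's placement spans the length of the Series; this is not the pandas implementation itself.

```python
def infer_ndim(placement, ndim=None):
    """Simplified stand-in for the quoted inference (not pandas code)."""
    if ndim is None:
        # The heuristic: a placement of length 1 is taken to mean
        # "one column of a DataFrame", anything else "a Series".
        if len(placement) != 1:
            ndim = 1
        else:
            ndim = 2
    return ndim


# Assumed placements for Series-backed blocks of length 3 and length 1.
series_len3 = range(3)
series_len1 = range(1)

inferred_len3 = infer_ndim(series_len3)          # 1, as expected for a Series
inferred_len1 = infer_ndim(series_len1)          # 2, although a Series is 1-D
explicit_len1 = infer_ndim(series_len1, ndim=1)  # 1, inference is bypassed

print(inferred_len3, inferred_len1, explicit_len1)
```

Under these assumptions, passing `ndim` explicitly (as in the `ndim=self.ndim` workaround) sidesteps the ambiguous length-1 case entirely.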

> Not familiar enough with this code though to say if that's a better (or even adequate) workaround.

That seems better.


jorisvandenbossche · 2018-08-15T14:06:14Z

pandas/core/dtypes/base.py

+
+        By default ExtensionDtypes are assumed to be non-numeric.
+        They'll be excluded from operations that exclude non-numeric
+        columns, like groupby reductions, plotting, etc.


Maybe put 'groupby' in parentheses, as the same holds for normal reductions (in a DataFrame).
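As a small illustration of the flag the docstring describes, the sketch below reads `_is_numeric` on two built-in extension dtypes. It assumes a pandas version that ships `pd.Int64Dtype` (0.24 or later) and that the integer dtype opts in while the categorical dtype keeps the non-numeric default. `_is_numeric` is a private attribute, so this is for inspection only, not a recommended public API.

```python
import pandas as pd

# Built-in extension dtypes, used here only as convenient examples.
int_dtype = pd.Int64Dtype()
cat_dtype = pd.CategoricalDtype()

# Private attribute: the integer dtype is expected to report numeric,
# the categorical dtype is expected to fall back to the default.
int_is_numeric = bool(int_dtype._is_numeric)
cat_is_numeric = bool(cat_dtype._is_numeric)

print(int_is_numeric, cat_is_numeric)
```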

jorisvandenbossche · 2018-08-15T14:10:05Z

pandas/tests/extension/base/groupby.py

+        df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 4],
+                           "B": data_for_grouping,
+                           "C": [1, 1, 1, 1, 1, 1, 1, 1]})
+        result = df.groupby("A").sum().columns


how does this test pass? (I thought reductions did not yet work for EAs?)

I wonder if it's going through the fallback np.mean(arr)? Will investigate.

Yeah, so reductions for Series fail (no _reduce), but for dataframe it seems to use some ndarray fallback

jorisvandenbossche · 2018-08-15T14:11:57Z

pandas/tests/extension/base/interface.py

+
+    def test_is_numeric_honored(self, data):
+        result = pd.Series(data)
+        assert result._data.blocks[0].is_numeric is data.dtype._is_numeric


Maybe you can also test df._get_numeric_data ?

It would be nice if this would also work for df.select_dtypes, but I suppose that is yet another issue

> Maybe you can also test df._get_numeric_data ?

ah, that is done below
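For readers who want to see the `_get_numeric_data` check in isolation, here is a rough sketch using built-in dtypes rather than the test-suite fixtures. The column names and values are made up for illustration, and `_get_numeric_data` is a private method whose behaviour may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    # Nullable integer column: an extension dtype that is flagged numeric.
    "A": pd.array([1, 2, None], dtype="Int64"),
    # Categorical column: an extension dtype that is not flagged numeric.
    "B": pd.Categorical(["a", "b", "a"]),
    # Plain NumPy float column for comparison.
    "C": [1.0, 2.0, 3.0],
})

# Keep only the columns whose blocks report themselves as numeric.
numeric_columns = list(df._get_numeric_data().columns)
print(numeric_columns)
```

If the flag is honored, the categorical column should be the only one filtered out.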

jreback · 2018-08-16T10:45:32Z

lgtm. @jorisvandenbossche not sure if your comments are resolved? If so, pls merge away.

TomAugspurger · 2018-08-20T11:04:19Z

@jorisvandenbossche all good?

jorisvandenbossche · 2018-08-20T11:18:16Z

Thanks!

API: ExtensionDtype._is_numeric

5064217

TomAugspurger added the Dtype Conversions, Numeric Operations, and ExtensionArray labels Aug 14, 2018

TomAugspurger mentioned this pull request Aug 14, 2018

Reductions for ExtensionArray #22346

Closed

fixed test

50de326

added test for DataFrame._get_numeric_data

1d96d22

jschendel approved these changes Aug 15, 2018


TomAugspurger added 3 commits August 15, 2018 07:37

Pass ndim

db9af36

Note plotting

a3fdc2a

Merge branch 'ea-is-numeric' of https://github.com/TomAugspurger/pandas…

fc34131

… into ea-is-numeric

jorisvandenbossche reviewed Aug 15, 2018


jreback added this to the 0.24.0 milestone Aug 16, 2018

jorisvandenbossche mentioned this pull request Aug 16, 2018

API: dispatch to EA.astype #22343

Merged

small edit

2779419

jorisvandenbossche approved these changes Aug 20, 2018


Merge branch 'master' into ea-is-numeric

e813855

jorisvandenbossche merged commit 513c02c into pandas-dev:master Aug 20, 2018

TomAugspurger deleted the ea-is-numeric branch August 20, 2018 14:02

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

API: ExtensionDtype._is_numeric (pandas-dev#22345)

0c2001a
