Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: ExtensionDtype._is_numeric #22345

Merged
merged 8 commits into from
Aug 20, 2018

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Aug 14, 2018

closes #22290

split from #22325

It's not clear what else we should be testing, since I'm not sure what all uses Block.is_numeric.

cc @jschendel

@TomAugspurger TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations ExtensionArray Extending pandas with custom dtypes or arrays. labels Aug 14, 2018
@codecov
Copy link

codecov bot commented Aug 15, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@b6e35ff). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #22345   +/-   ##
=========================================
  Coverage          ?   92.05%           
=========================================
  Files             ?      169           
  Lines             ?    50715           
  Branches          ?        0           
=========================================
  Hits              ?    46685           
  Misses            ?     4030           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.46% <100%> (?)
#single 42.25% <50%> (?)
Impacted Files Coverage Δ
pandas/core/dtypes/base.py 92.68% <100%> (ø)
pandas/core/arrays/integer.py 94.71% <100%> (ø)
pandas/core/internals/blocks.py 93.84% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b6e35ff...e813855. Read the comment docs.

Copy link
Member

@jschendel jschendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM unless my comment illuminates anything.

I've also pushed one additional test that I wrote for DataFrame._get_numeric_data, which is what I believe is called by all DataFrame methods that filter to numeric dtypes.

if newb.shape != self.shape:
# use values.shape, rather than newb.shape, as newb.shape
# may be incorrect for ExtensionBlocks.
if values.shape != self.shape:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I ran into this same issue when I was testing out a similar implementation.

I got around it by passing the argument ndim=self.ndim to the make_block function on line 664 above, which seems to get the job done and passes all relevant tests, though I didn't run the entire test suite. Not familiar enough with this code though to say if that's a better (or even adequate) workaround.

For what it's worth, it looks like this ndim inference in NonConsolidatableMixIn is where things start to go wrong, at least for the length 1 Series case:

# Maybe infer ndim from placement
if ndim is None:
if len(placement) != 1:
ndim = 1
else:
ndim = 2

Again, not familiar enough with this code to immediately see a fix, but maybe this is helpful to someone with more knowledge of this code than me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar enough with this code though to say if that's a better (or even adequate) workaround.

That seems better.


By default ExtensionDtypes are assumed to be non-numeric.
They'll be excluded from operations that exclude non-numeric
columns, like groupby reductions, plotting, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe put 'gropuby' in parantheses, as the same holds for normal reductions (in a dataframe)

df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 4],
"B": data_for_grouping,
"C": [1, 1, 1, 1, 1, 1, 1, 1]})
result = df.groupby("A").sum().columns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this test pass? (I thought reductions did not yet work for EAs?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's going through the fallback np.mean(arr)? Will investigate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so reductions for Series fail (no _reduce), but for dataframe it seems to use some ndarray fallback


def test_is_numeric_honored(self, data):
result = pd.Series(data)
assert result._data.blocks[0].is_numeric is data.dtype._is_numeric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can also test df._get_numeric_data ?

It would be nice if this would also work for df.select_dtypes, but I suppose that is yet another issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can also test df._get_numeric_data ?

ah, that is done below

@jreback jreback added this to the 0.24.0 milestone Aug 16, 2018
@jreback
Copy link
Contributor

jreback commented Aug 16, 2018

lgtm. @jorisvandenbossche not sure if you comments are resolved? if so, pls merge away.

@TomAugspurger
Copy link
Contributor Author

@jorisvandenbossche all good?

@jorisvandenbossche jorisvandenbossche merged commit 513c02c into pandas-dev:master Aug 20, 2018
@jorisvandenbossche
Copy link
Member

Thanks!

@TomAugspurger TomAugspurger deleted the ea-is-numeric branch August 20, 2018 14:02
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ExtensionBlock.is_numeric is always False
4 participants