ENH: support downcasting of nullable EAs in pd.to_numeric #38746

arw2019 · 2020-12-28T07:22:31Z

closes ENH: Support downcasting of nullable dtypes in to_numeric #33013
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Picking up #33435.

When pd.to_numeric encounters an IntegerArray or FloatingArray (or a Series built from one), we downcast _data and then reconstruct the array from the downcast _data and the original array's _mask.
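A minimal sketch of that idea using only public pandas API. It is not the PR's code: the helper name `downcast_integer_array` is hypothetical, and it fills missing slots with 0 (an assumption) instead of reading the private `_data` buffer directly.

```python
import pandas as pd


def downcast_integer_array(arr: pd.arrays.IntegerArray) -> pd.arrays.IntegerArray:
    # Hypothetical helper: downcast the data, then reuse the missing-value mask.
    mask = arr.isna()  # boolean ndarray (a copy), True where the value is missing
    # Materialize the data as plain NumPy; missing slots get 0 (an assumption).
    data = arr.to_numpy(dtype=arr.dtype.numpy_dtype, na_value=0)
    # Downcast with the existing ndarray code path of pd.to_numeric.
    small = pd.to_numeric(data, downcast="integer")
    # Rebuild a nullable array from the downcast data and the original mask.
    return pd.arrays.IntegerArray(small, mask)


arr = pd.array([1, 2, None], dtype="Int64")
result = downcast_integer_array(arr)
print(result.dtype)  # Int8
```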

jorisvandenbossche

Thanks for working on this one!

jorisvandenbossche · 2020-12-28T10:34:38Z

pandas/core/tools/numeric.py


    if isinstance(arg, ABCSeries):
        is_series = True
        values = arg.values
+        if is_extension_array_dtype(arg) and isinstance(values, NumericArray):
+            is_numeric_extension_dtype = True
+            values = extract_array(arg)


Is this extract_array line needed? (the values = arg.values from above should have worked fine, I think)

It's not, I reverted this bit
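A quick check of why the extra unwrapping is redundant: for a Series backed by a nullable dtype, `.values` already returns the extension array rather than a NumPy array. This is a small illustration, not code from the PR.

```python
import pandas as pd

s = pd.Series([1, None, 3], dtype="Int64")
# For an extension-array-backed Series, .values is the ExtensionArray itself.
print(type(s.values).__name__)  # IntegerArray
# .array is the explicit accessor and returns the same kind of object.
print(type(s.array).__name__)  # IntegerArray
```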

pandas/tests/tools/test_to_numeric.py

…-to_numeric

jreback · 2020-12-28T18:09:30Z

pandas/core/tools/numeric.py

@@ -142,6 +147,10 @@ def to_numeric(arg, errors="raise", downcast=None):
    else:
        values = arg

+    if is_extension_array_dtype(arg) and isinstance(values, NumericArray):


this really should be part of the above logic. also mask needs to be defined for all cases (can default to None)

this really should be part of the above logic.

The reason for having it down here is to handle EA dtype Series and array in a single place (and Index when #34159/#37869 go through)

also mask needs to be defined for all cases (can default to None)

done
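The control flow being discussed (handle Series and arrays in one place, with `mask` defined on every path and defaulting to None) can be sketched roughly as below. This is a hypothetical simplification, not the pandas implementation: `to_numeric_sketch` is an invented name, error handling is omitted, and missing slots are zero-filled by assumption.

```python
import numpy as np
import pandas as pd

MASKED_TYPES = (pd.arrays.IntegerArray, pd.arrays.FloatingArray)


def to_numeric_sketch(arg, downcast=None):
    # Hypothetical simplification of to_numeric's downcast flow.
    mask = None  # defined on every path; stays None for non-nullable input
    index = name = None
    values = arg
    if isinstance(arg, pd.Series):
        index, name = arg.index, arg.name
        values = arg.array

    if isinstance(values, MASKED_TYPES):
        mask = values.isna()
        values = values.to_numpy(dtype=values.dtype.numpy_dtype, na_value=0)
    else:
        values = np.asarray(values)

    values = pd.to_numeric(values, downcast=downcast)

    if mask is not None:
        # Only nullable input is rebuilt into a nullable array.
        cls = pd.arrays.FloatingArray if values.dtype.kind == "f" else pd.arrays.IntegerArray
        values = cls(values, mask)

    if index is not None:
        return pd.Series(values, index=index, name=name)
    return values


print(to_numeric_sketch(pd.Series([1, None, 3], dtype="Int64"), downcast="integer").dtype)
```

Keeping a single `mask is not None` check at the end means plain ndarrays and lists never touch the nullable-array branch.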

jreback · 2020-12-28T18:10:34Z

pandas/core/tools/numeric.py

@@ -142,6 +147,10 @@ def to_numeric(arg, errors="raise", downcast=None):
    else:
        values = arg

+    if is_extension_array_dtype(arg) and isinstance(values, NumericArray):
+        is_numeric_extension_dtype = True
+        mask, values = values._mask, values._data


you likely need to take only the non-masked values for the following block of code
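One way to see why this matters: the data buffer under the mask can hold arbitrary values, and those can block downcasting. The sketch below builds such a buffer by hand (the placeholder `10**10` is an invented example) and compares downcasting the whole buffer with downcasting only the valid values.

```python
import numpy as np
import pandas as pd

# Data buffer with a large placeholder under the mask (position 2).
data = np.array([1, 2, 10**10], dtype="int64")
mask = np.array([False, False, True])

# Naive: downcast the whole buffer, including the masked slot.
naive = pd.to_numeric(data, downcast="integer")
print(naive.dtype)  # int64: the masked placeholder does not fit in a smaller type

# Masked-only: choose the target dtype from the valid values alone...
valid = pd.to_numeric(data[~mask], downcast="integer")
# ...then write them into a zero-filled buffer of that dtype.
out = np.zeros(len(data), dtype=valid.dtype)
out[~mask] = valid
result = pd.arrays.IntegerArray(out, mask)
print(result.dtype)  # Int8
```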

jreback · 2020-12-28T18:10:50Z

pandas/core/tools/numeric.py

@@ -188,6 +197,16 @@ def to_numeric(arg, errors="raise", downcast=None):
                    if values.dtype == dtype:
                        break

+    if is_numeric_extension_dtype:


L194 might need to handle the mask

Do you mean this line?

float_32_ind = typecodes.index(float_32_char)

If so, I suspect there's a test case I haven't covered (the ones I have pass as is).
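For context, the quoted line only narrows the candidate float dtypes; it does not look at the values (or the mask). A standalone reproduction of that slicing, written outside pandas for illustration, shows that `downcast="float"` never considers float16:

```python
import numpy as np

# NumPy's float typecodes, smallest first: 'e' (float16), 'f' (float32),
# 'd' (float64), 'g' (longdouble).
typecodes = np.typecodes["Float"]
float_32_char = np.dtype(np.float32).char
float_32_ind = typecodes.index(float_32_char)
# Everything before float32 is dropped from the candidate list.
candidates = [np.dtype(c) for c in typecodes[float_32_ind:]]
print(candidates[0])  # float32
```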

pandas/tests/tools/test_to_numeric.py


pandas/core/tools/numeric.py

arw2019 · 2020-12-28T22:50:06Z

Green + addressed comments

jreback

can you also update the docstring of to_numeric and add some examples?

pandas/core/tools/numeric.py

pandas/tests/tools/test_to_numeric.py

jorisvandenbossche · 2020-12-29T07:40:39Z

pandas/tests/tools/test_to_numeric.py

+        ([-1, -1], "Int32", "unsigned", "Int32"),
+        ([1, 1], "Float64", "float", "Float32"),
+        ([1, 1.1], "Float64", "float", "Float32"),
+        ([1, 1], "Float64", "integer", "Int8"),


Nit: maybe move this one up to the other "integer" downcast cases
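The parametrization above reads as (input values, input dtype, downcast, expected result dtype). A small loop that exercises those four cases against `pd.to_numeric` directly, assuming a pandas version that includes this change (1.3 or later):

```python
import pandas as pd

# Cases copied from the parametrization quoted above.
cases = [
    ([-1, -1], "Int32", "unsigned", "Int32"),  # negatives: no unsigned downcast
    ([1, 1], "Float64", "float", "Float32"),
    ([1, 1.1], "Float64", "float", "Float32"),
    ([1, 1], "Float64", "integer", "Int8"),  # integral floats can become Int8
]

for data, dtype, downcast, expected in cases:
    result = pd.to_numeric(pd.Series(data, dtype=dtype), downcast=downcast)
    print(dtype, downcast, "->", result.dtype, "(expected", expected + ")")
```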

arw2019 · 2020-12-30T06:09:44Z

pandas/tests/tools/test_to_numeric.py


pandas/tests/tools/test_to_numeric.py

pandas/core/tools/numeric.py

arw2019 · 2020-12-30T06:40:46Z

pandas/core/tools/numeric.py

@@ -108,6 +110,21 @@ def to_numeric(arg, errors="raise", downcast=None):
    2    2.0
    3   -3.0
    dtype: float64
+


Docstring updated.
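In the spirit of the new docstring examples, a short check (assuming pandas 1.3 or later) that missing values survive the downcast for both nullable integer and nullable float input:

```python
import pandas as pd

s_int = pd.Series([1, None, 3], dtype="Int64")
r_int = pd.to_numeric(s_int, downcast="integer")
print(r_int.dtype)  # Int8

s_float = pd.Series([1.0, None, 3.0], dtype="Float64")
r_float = pd.to_numeric(s_float, downcast="float")
print(r_float.dtype)  # Float32
```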


pandas/core/tools/numeric.py


jreback

thanks @arw2019

jreback · 2020-12-30T13:57:18Z

pandas/core/tools/numeric.py

+    2    3
+    dtype: Int8
+    >>> s = pd.Series([1.0, 2.1, 3.0], dtype="Float64")
+    >>> pd.to_numeric(s, downcast="float")


we may want to also accept Float and Integer as aliases for float | integer (separate issue)

I don't think we should do that, since those are not actually dtypes but rather values for the downcast keyword, which e.g. also accepts "signed"/"unsigned".
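To illustrate the distinction: `downcast` accepts one of four strategy names, and the result dtype is chosen from the data. A dtype-style spelling such as "Integer" is not one of those strategies and is rejected. A small sketch, assuming pandas 1.3 or later:

```python
import pandas as pd

s = pd.Series([1, 2, 3], dtype="Int64")
# Strategy names, not dtype names; the resulting dtype depends on the values.
for strategy in ["integer", "signed", "unsigned", "float"]:
    print(strategy, "->", pd.to_numeric(s, downcast=strategy).dtype)

# A dtype-style alias is not a valid strategy.
try:
    pd.to_numeric(s, downcast="Integer")
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```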

pandas/core/tools/numeric.py

jorisvandenbossche · 2020-12-30T14:06:49Z

Thanks @arw2019 !


arw2019 added 3 commits December 28, 2020 02:10

tests

a594847

add NumericArray path in to_numeric

a1bb9fc

replace to_numpy use with _data

dbba7b4

arw2019 changed the title from "[WIP] ENH: downcasting of nullable EAs in pd.to_numeric" to "[WIP] ENH: support downcasting of nullable EAs in pd.to_numeric" Dec 28, 2020

jorisvandenbossche reviewed Dec 28, 2020

arw2019 added 3 commits December 28, 2020 11:24

review comment

ddb71cb

more testcases

323cfdc

Merge branch 'master' of https://github.com/pandas-dev/pandas into EA…

7b4180e

…-to_numeric

arw2019 changed the title from "[WIP] ENH: support downcasting of nullable EAs in pd.to_numeric" to "ENH: support downcasting of nullable EAs in pd.to_numeric" Dec 28, 2020

whatsnew

337589a

jreback requested changes Dec 28, 2020

arw2019 added 4 commits December 28, 2020 13:50

review comments

4e8761a

cleanup

e2b4cbb

review comments (tests)

23c4ae6

more tests

e140b7a

jorisvandenbossche reviewed Dec 28, 2020

pandas/tests/tools/test_to_numeric.py (outdated, resolved)

arw2019 and others added 2 commits December 28, 2020 14:16

de-duplicate testcase

f583a10

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

more cleanup

a6cb152

arw2019 commented Dec 28, 2020

pandas/core/tools/numeric.py (resolved)

arw2019 added the Needs Review label Dec 28, 2020

jreback requested changes Dec 28, 2020

pandas/core/tools/numeric.py (resolved)

jreback reviewed Dec 28, 2020

pandas/tests/tools/test_to_numeric.py (resolved)

jorisvandenbossche approved these changes Dec 29, 2020

review: code comments

d707a61

arw2019 added 3 commits December 30, 2020 01:02

review: code comments

6b2c39f

review: update docstring

fda4ba1

review: reorder test parameters

1a23118

arw2019 mentioned this pull request Dec 30, 2020

BUG: Series constructor with nullable unsigned integer dtype fails with large number #38798

Closed

review: add testcases

1015b07

arw2019 commented Dec 30, 2020

Merge branch 'master' of https://github.com/pandas-dev/pandas into EA…

0279be9

…-to_numeric

jorisvandenbossche approved these changes Dec 30, 2020

pandas/core/tools/numeric.py (outdated, resolved)

pandas/core/tools/numeric.py (resolved)

Update pandas/core/tools/numeric.py

56747da

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

jreback approved these changes Dec 30, 2020

jreback added this to the 1.3 milestone Dec 30, 2020

jreback merged commit 94810d1 into pandas-dev:master Dec 30, 2020

This was referenced Jan 5, 2021

BUG: pd.to_numeric does not copy _mask for ExtensionArrays #38974

Closed

BUG: pd.to_numeric does not copy _mask for ExtensionArrays #39049

Merged
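The follow-up bug concerns reusing the input's `_mask` object when reconstructing the result, so the two arrays share one mask buffer. The sketch below reproduces that mechanism with the public `IntegerArray` constructor (which does not copy by default) rather than through `pd.to_numeric` itself; it illustrates the failure mode, not the actual fix.

```python
import numpy as np
import pandas as pd

data = np.array([1, 2, 3], dtype="int64")

# Reusing the same mask object: both arrays share one buffer.
mask = np.zeros(3, dtype=bool)
original = pd.arrays.IntegerArray(data, mask)
shared = pd.arrays.IntegerArray(data.astype("int8"), mask)
shared[0] = pd.NA  # writes into the shared mask in place
print(original.isna()[0])  # True: the edit leaked into `original`

# Copying the mask keeps the downcast result independent.
mask2 = np.zeros(3, dtype=bool)
original2 = pd.arrays.IntegerArray(data.copy(), mask2)
independent = pd.arrays.IntegerArray(data.astype("int8"), mask2.copy())
independent[0] = pd.NA
print(original2.isna()[0])  # False
```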

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021

ENH: support downcasting of nullable EAs in pd.to_numeric (pandas-dev…

6ae91c4

…#38746)

simonjayhawkins mentioned this pull request Jun 2, 2021

TST: Make ARM build work (not in the CI) #41739

Merged

