Change UInt64Index._na_value from 0 to np.nan #18401

jschendel · 2017-11-21T01:14:40Z

Prerequisite for #18300

closes #18398 (Change UInt64Index._na_value from 0 to np.nan)
closes #18400 (BUG: Index constructor doesn't coerce int-like floats to UInt64Index)
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Summary:

Changed UInt64Index._na_value from 0 to np.nan.
Added a dtype parameter to _try_convert_to_int_index that skips the initial attempt to coerce to Int64Index when UInt64Index is explicitly wanted (fix for #18400, BUG: Index constructor doesn't coerce int-like floats to UInt64Index).
Moved test_where and test_where_array_like from TestInt64Index to the NumericInt base class for more generic coverage, and made them check that the result is coerced to Float64Index. As originally written, the tests raised a ValueError because of the 0 -> np.nan change.
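
To illustrate the second point, here is a minimal pure-Python sketch of the "skip the signed attempt when uint64 is requested" idea. It is not the pandas implementation: the function name choose_integer_index, the string return values, and the explicit range checks are all assumptions made for illustration, and the sketch returns None when no integer index applies rather than modelling what the constructor does next.

```python
# Hypothetical sketch; names and bounds checks are illustrative,
# not pandas internals.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
UINT64_MAX = 2**64 - 1


def choose_integer_index(data, dtype=None):
    """Return the name of the integer index type that fits `data`, or None."""
    # Only int-like values (e.g. 1.0 or 2) can be coerced to an integer index.
    if not all(float(x).is_integer() for x in data):
        return None
    values = [int(x) for x in data]

    # Skip the signed attempt when the caller explicitly asked for uint64;
    # otherwise small non-negative values would always end up as Int64Index.
    if dtype != "uint64":
        if all(INT64_MIN <= v <= INT64_MAX for v in values):
            return "Int64Index"

    # Fall through to the unsigned attempt.
    if all(0 <= v <= UINT64_MAX for v in values):
        return "UInt64Index"
    return None
```

With this sketch, choose_integer_index([1.0, 2.0, 3.0]) picks the signed type, while passing dtype="uint64" for the same data picks the unsigned one, which is the behavior #18400 asked for.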

jschendel · 2017-11-21T01:18:55Z

pandas/tests/indexes/test_base.py

+
+        # fall back to Float64Index
+        data = [0.0, 1.1, 2.2, 3.3]
+        expected = Float64Index(data)


In light of Case III in #15832, do we actually want this behavior, or should it coerce to Int64Index([0, 1, 2, 3])? Or should it raise, as was suggested later on in #15832? I originally wrote this test case to make sure all relevant paths were hit, before seeing #15832.

these should raise as they are passing a dtype that is non convertible

Removed the Float64Index portion of the test. Was able to get it to raise, but ended up breaking other things. Not immediately sure of the fix, but seems outside the scope of this PR, though can look into it more if need be. None of the code I wrote actually depends on that behavior; originally included it to make sure all code paths were tested.

codecov · 2017-11-21T07:46:56Z

Codecov Report

Merging #18401 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18401      +/-   ##
==========================================
- Coverage   91.36%   91.34%   -0.02%     
==========================================
  Files         164      164              
  Lines       49730    49730              
==========================================
- Hits        45435    45426       -9     
- Misses       4295     4304       +9

Flag	Coverage Δ
#multiple	`89.14% <100%> (ø)`	⬆️
#single	`39.62% <100%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/numeric.py	`97.26% <ø> (-0.02%)`	⬇️
pandas/core/indexes/base.py	`96.43% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.8% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 509e03c...ea978b0. Read the comment docs.

codecov · 2017-11-21T07:46:58Z

Codecov Report

Merging #18401 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18401      +/-   ##
==========================================
- Coverage   91.35%   91.31%   -0.05%     
==========================================
  Files         163      163              
  Lines       49695    49695              
==========================================
- Hits        45401    45380      -21     
- Misses       4294     4315      +21

Flag	Coverage Δ
#multiple	`89.11% <100%> (-0.03%)`	⬇️
#single	`39.66% <100%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/numeric.py	`97.26% <ø> (-0.02%)`	⬇️
pandas/core/indexes/base.py	`96.43% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`63.44% <0%> (-1.82%)`	⬇️
pandas/core/frame.py	`97.8% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6eac0b...0852ecb. Read the comment docs.

jreback · 2017-11-21T11:10:54Z

pandas/core/indexes/base.py

-                return Int64Index(res, copy=copy, name=name)
-        except (OverflowError, TypeError, ValueError):
-            pass
+        if not is_unsigned_integer_dtype(dtype):


can you add a comment here (eg. about why we don't convert for uint)

jreback · 2017-11-21T11:11:10Z

pandas/tests/indexes/test_numeric.py

+        klasses = [list, tuple, np.array, pd.Series]
+        expected = Float64Index([_nan] + i[1:].tolist())
+
+        for klass in klasses:


you could parametrize

jreback · 2017-11-21T11:12:21Z

doc/source/whatsnew/v0.22.0.txt

@@ -108,7 +108,7 @@ Bug Fixes
 Conversion
 ^^^^^^^^^^

-
+- Bug in :class:`Index` constructor with `dtype='uint64'` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)


18398 as well?

Added a separate entry for 18398 under "Backwards incompatible API changes"

jschendel · 2017-11-22T07:28:04Z

Regarding the test_where and test_where_arraylike tests: having both seemed redundant, as there was a bit of overlap, so I combined them into a single test. Some notes:

Replaced notna with a list of True values, since that is all notna returned there; calling notna seemed unnecessarily convoluted.
Moved the tests from NumericInt to Numeric since the new version of the test also passed for RangeIndex. Deleted the RangeIndex specific version of the tests.
Went through and made similar changes to other test_where and test_where_arraylike tests that were essentially doing the same thing.

jreback · 2017-11-23T16:26:02Z

can you rebase. ready to go?

jorisvandenbossche · 2017-11-23T16:29:25Z

doc/source/whatsnew/v0.22.0.txt

@@ -36,7 +36,7 @@ Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 - :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`)
-
+- The default NA value for :class:`UInt64Index` has changed from 0 to ``NaN`` (:issue:`18398`)


What user-visible change does this have?
Right now it sounds a bit vague.

It impacts things that mask with NA under the hood, such as .where

Previous behavior:

In [3]: idx = pd.UInt64Index(range(5))

In [4]: idx
Out[4]: UInt64Index([0, 1, 2, 3, 4], dtype='uint64')

In [5]: idx.where(idx > 3)
Out[5]: UInt64Index([0, 0, 0, 0, 4], dtype='uint64')

New behavior:

In [3]: idx = pd.UInt64Index(range(5))

In [4]: idx.where(idx > 3)
Out[4]: Float64Index([nan, nan, nan, nan, 4.0], dtype='float64')

Updating the whatsnew to make this more clear.
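
For readers who want to see why the choice of NA value matters for masking, here is a small pure-Python sketch. The where helper below is a hypothetical stand-in written for this illustration, not the pandas method; it only shows that masking with 0 makes masked positions indistinguishable from a genuine 0, while masking with NaN keeps them recognizable (in pandas this comes at the cost of a float result, since uint64 cannot hold NaN).

```python
import math


def where(values, cond, na_value):
    """Keep values[i] where cond[i] is true, otherwise substitute na_value.

    Hypothetical helper for illustration only; not the pandas implementation.
    """
    return [v if keep else na_value for v, keep in zip(values, cond)]


values = [0, 1, 2, 3, 4]
cond = [v > 3 for v in values]

# Old-style NA value: the masked entries look like the real value 0.
old_style = where(values, cond, na_value=0)

# New-style NA value: the masked entries are clearly marked as missing.
new_style = where(values, cond, na_value=math.nan)

# Count how many entries can be identified as missing after masking.
missing_old = sum(1 for x in old_style if isinstance(x, float) and math.isnan(x))
missing_new = sum(1 for x in new_style if isinstance(x, float) and math.isnan(x))
```

In this sketch missing_old is 0 and missing_new is 4: only the NaN-based mask lets downstream code tell masked positions apart from real data.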

jschendel · 2017-11-23T23:55:43Z

@jreback : this should be ready to merge once green, assuming that the whatsnew change I made to address the comment by @jorisvandenbossche is acceptable.

jreback · 2017-11-24T20:17:18Z

pandas/tests/indexes/test_category.py

@@ -269,28 +269,19 @@ def f(x):
                                  ordered=False)
        tm.assert_index_equal(result, exp)

-    def test_where(self):
+    @pytest.mark.parametrize('klass', [list, tuple, np.array, pd.Series])
+    def test_where(self, klass):


we should prob move all of the test_where tests to test_base and use the indices fixture to avoid the code repetition (new issue though)

jreback · 2017-11-24T20:18:15Z

thanks @jschendel very nice!

jreback added the Bug, Dtype Conversions, and Missing-data labels Nov 21, 2017

jreback requested changes Nov 22, 2017

jschendel force-pushed the uint64-idx-na-value branch from ea978b0 to 619341b Compare November 22, 2017 07:21

jschendel force-pushed the uint64-idx-na-value branch from 619341b to 92c5442 Compare November 22, 2017 08:56

jorisvandenbossche reviewed Nov 23, 2017

jschendel force-pushed the uint64-idx-na-value branch from 92c5442 to 48d1f6f Compare November 23, 2017 21:26

jschendel added 4 commits November 23, 2017 16:53

26649f0 Change UInt64Index._na_value from 0 to np.nan
901f209 update test
22a61a9 review edits
0852ecb clarified whatsnew

jschendel force-pushed the uint64-idx-na-value branch from 48d1f6f to 0852ecb Compare November 23, 2017 23:53

jorisvandenbossche approved these changes Nov 24, 2017

jreback approved these changes Nov 24, 2017

jreback added this to the 0.22.0 milestone Nov 24, 2017

jreback reviewed Nov 24, 2017

jreback merged commit aaee541 into pandas-dev:master Nov 24, 2017

jschendel deleted the uint64-idx-na-value branch December 4, 2017 20:43
