Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change UInt64Index._na_value from 0 to np.nan #18401

Merged
merged 4 commits into from
Nov 24, 2017

Conversation

jschendel
Copy link
Member

Prerequisite for #18300

Summary:

  • Changed UInt64Index._na_value from 0 to np.nan

  • Added a dtype parameter to _try_convert_to_int_index to skip the initial attempt to coerce to Int64Index in cases where we really want UInt64Index (fix for BUG: Index constructor doesn't coerce int-like floats to UInt64Index #18400).

  • Moved test_where and test_where_array_like from TestInt64Index to the NumericInt base class for more generic coverage, and forced it to check that things get coerced to Float64Index. The way it was originally written raised a ValueError due to the 0 -> np.nan change.


# fall back to Float64Index
data = [0.0, 1.1, 2.2, 3.3]
expected = Float64Index(data)
Copy link
Member Author

@jschendel jschendel Nov 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of Case III in #15832, do we actually want this behavior, or should it coerce to Int64Index([0, 1, 2, 3])? Or should it raise as was suggest later on in #15832? Originally wrote this test case to make sure all relevant paths were hit, without seeing #15832.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should raise as they are passing a dtype that is non convertible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the Float64Index portion of the test. Was able to get it to raise, but ended up breaking other things. Not immediately sure of the fix, but seems outside the scope of this PR, though can look into it more if need be. None of the code I wrote actually depends on that behavior; originally included it to make sure all code paths were tested.

@codecov
Copy link

codecov bot commented Nov 21, 2017

Codecov Report

Merging #18401 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18401      +/-   ##
==========================================
- Coverage   91.36%   91.34%   -0.02%     
==========================================
  Files         164      164              
  Lines       49730    49730              
==========================================
- Hits        45435    45426       -9     
- Misses       4295     4304       +9
Flag Coverage Δ
#multiple 89.14% <100%> (ø) ⬆️
#single 39.62% <100%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/numeric.py 97.26% <ø> (-0.02%) ⬇️
pandas/core/indexes/base.py 96.43% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 509e03c...ea978b0. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 21, 2017

Codecov Report

Merging #18401 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18401      +/-   ##
==========================================
- Coverage   91.35%   91.31%   -0.05%     
==========================================
  Files         163      163              
  Lines       49695    49695              
==========================================
- Hits        45401    45380      -21     
- Misses       4294     4315      +21
Flag Coverage Δ
#multiple 89.11% <100%> (-0.03%) ⬇️
#single 39.66% <100%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/numeric.py 97.26% <ø> (-0.02%) ⬇️
pandas/core/indexes/base.py 96.43% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.44% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6eac0b...0852ecb. Read the comment docs.

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Nov 21, 2017
return Int64Index(res, copy=copy, name=name)
except (OverflowError, TypeError, ValueError):
pass
if not is_unsigned_integer_dtype(dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here (eg. about why we don't convert for uint)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

klasses = [list, tuple, np.array, pd.Series]
expected = Float64Index([_nan] + i[1:].tolist())

for klass in klasses:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could parametrize

@@ -108,7 +108,7 @@ Bug Fixes
Conversion
^^^^^^^^^^

-
- Bug in :class:`Index` constructor with `dtype='uint64'` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

18398 as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a separate entry for 18398 under "Backwards incompatible API changes"

@jschendel
Copy link
Member Author

Regarding the test_where and test_where_arraylike tests: having both seemed redundant, as there was a bit of overlap, so I combined them into a single test. Some notes:

  • replaced notna with a list of True, since that's all that notna returned, so calling notna seemed unnecessarily convoluted.
  • Moved the tests from NumericInt to Numeric since the new version of the test also passed for RangeIndex. Deleted the RangeIndex specific version of the tests.
  • Went through and made similar changes to other test_where and test_where_arraylike tests that were essentially doing the same thing.

@jreback
Copy link
Contributor

jreback commented Nov 23, 2017

can you rebase. ready to go?

@@ -36,7 +36,7 @@ Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`)
-
- The default NA value for :class:`UInt64Index` has changed from 0 to ``NaN`` (:issue:`18398`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What user visible change has this?
Now it sounds a bit vague

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It impacts things that mask with NA under the hood, such as .where

Previous behavior:

In [3]: idx = pd.UInt64Index(range(5))

In [4]: idx
Out[4]: UInt64Index([0, 1, 2, 3, 4], dtype='uint64')

In [5]: idx.where(idx > 3)
Out[5]: UInt64Index([0, 0, 0, 0, 4], dtype='uint64')

New behavior:

In [3]: idx = pd.UInt64Index(range(5))

In [4]: idx.where(idx > 3)
Out[4]: Float64Index([nan, nan, nan, nan, 4.0], dtype='float64')

Updating the whatsnew to make this more clear.

@jschendel
Copy link
Member Author

@jreback : this should be ready to merge once green, assuming that the whatsnew change I made to address the comment by @jorisvandenbossche is acceptable.

@jreback jreback added this to the 0.22.0 milestone Nov 24, 2017
@@ -269,28 +269,19 @@ def f(x):
ordered=False)
tm.assert_index_equal(result, exp)

def test_where(self):
@pytest.mark.parametrize('klass', [list, tuple, np.array, pd.Series])
def test_where(self, klass):
Copy link
Contributor

@jreback jreback Nov 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should prob move all of the test_where tests to test_base and use the indices fixture to avoid the code repetition (new issue though)

@jreback jreback merged commit aaee541 into pandas-dev:master Nov 24, 2017
@jreback
Copy link
Contributor

jreback commented Nov 24, 2017

thanks @jschendel very nice!

@jschendel jschendel deleted the uint64-idx-na-value branch December 4, 2017 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
3 participants