Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index #23524

Merged
merged 17 commits into from
Nov 11, 2018

Conversation

jbrockmendel
Copy link
Member

Also fixes bug with DateOffset == "infer" incorrectly raising instead of returning False.

Also fixes bug(?) with pd.Index(dtindex, dtype=object) returning an index of datetimes instead of Timestamps, potentially losing nanoseconds.

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

pandas/core/indexes/base.py Outdated Show resolved Hide resolved
@jreback jreback added Datetime Datetime data dtype Timezones Timezone data dtype labels Nov 6, 2018
@codecov
Copy link

codecov bot commented Nov 6, 2018

Codecov Report

Merging #23524 into master will decrease coverage by <.01%.
The diff coverage is 91.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23524      +/-   ##
==========================================
- Coverage   92.25%   92.25%   -0.01%     
==========================================
  Files         161      161              
  Lines       51262    51278      +16     
==========================================
+ Hits        47292    47304      +12     
- Misses       3970     3974       +4
Flag Coverage Δ
#multiple 90.63% <91.3%> (-0.01%) ⬇️
#single 42.32% <43.47%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/datetimes.py 98.44% <100%> (+0.02%) ⬆️
pandas/core/indexes/base.py 96.45% <100%> (ø) ⬆️
pandas/tseries/offsets.py 97.07% <85.71%> (-0.15%) ⬇️
pandas/io/pytables.py 92.34% <0%> (-0.09%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update efd1844...cc7f5cd. Read the comment docs.

@TomAugspurger
Copy link
Contributor

Also fixes bug(?) with pd.Index(dtindex, dtype=object) returning an index of datetimes instead of Timestamps, potentially losing nanoseconds.

Seems like a fine change, but needs a release note.

@jbrockmendel
Copy link
Member Author

Seems like a fine change, but needs a release note.

Good point; done.

doc/source/whatsnew/v0.24.0.txt Outdated Show resolved Hide resolved
pandas/_libs/tslibs/offsets.pyx Show resolved Hide resolved
pandas/core/arrays/datetimes.py Show resolved Hide resolved
pandas/core/indexes/base.py Outdated Show resolved Hide resolved
pandas/tseries/offsets.py Show resolved Hide resolved
doc/source/whatsnew/v0.24.0.txt Outdated Show resolved Hide resolved
pandas/core/arrays/datetimes.py Show resolved Hide resolved
pandas/core/indexes/base.py Outdated Show resolved Hide resolved
@jreback jreback added this to the 0.24.0 milestone Nov 8, 2018
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment


# TODO: warn that dtype is not used?
# warn that conversion may be lossy?
return self._data.view(np.ndarray) # follow Index.__array__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other question: what is the .view(np.ndarray) part doing if it is already an array? Can we remove it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can probably be removed; this is taken directly from the Index.__array__ implementation, so I think the maybe-removing this should be done at the same time those methods are overhauled (ill be opening an Issue shortly)

@@ -57,6 +57,54 @@ def timedelta_index(request):

class TestDatetimeArray(object):

def test_array_object_dtype(self, tz_naive_fixture):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It think it would be good to add such a test to the base extension tests as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know those tests well enough to have an informed opinion. AFAIK ExtensionArray doesn't implement __array__, so it isn't clear that this is supported in the general case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EA implements __iter__, which should be sufficient.

This test would be slightly opinionated for a base test, in case an EA wants to be converted to a specific NumPy type, but I think it's OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we have base.interface.BaseInterfaceTests.test_array_interface which checks

    def test_array_interface(self, data):
        result = np.array(data)
        assert result[0] == data[0]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that's already a generic test. OK, since that does not actually test the return dtype, it's good to have more explicit tests here.

Should we expect from EA that np.array(EA, dtype=object) always works (returns an object array of scalars)?
That seems like an OK assumption to me, since this already happens if you don't implement __array__, so we can expect this as well if the EA author implements a custom __array__ I think.

pandas/core/arrays/datetimes.py Outdated Show resolved Hide resolved
if is_object_dtype(dtype):
return np.array(list(self), dtype=object)
elif is_int64_dtype(dtype):
return self.asi8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this elif branch. Numpy will afterwards convert the M8[ns] data to int, and in that way ensure the semantics of np.asarray regarding copy/no copy is followed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel I opened #23593, a PR doing the __array__ for all datetimelike EAs, not only DatetimeArray (but so there is a bit of overlap with this PR)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I'll take a look at 23593.

@jreback jreback merged commit 58a59bd into pandas-dev:master Nov 11, 2018
@jreback
Copy link
Contributor

jreback commented Nov 11, 2018

thanks @jbrockmendel

#23593 is the followup to address __array__.

@jbrockmendel jbrockmendel deleted the index_bugs branch November 11, 2018 17:08
thoo added a commit to thoo/pandas that referenced this pull request Nov 11, 2018
* upstream/master:
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)
  DOC: Fixes to docstring to add validation to CI (pandas-dev#23560)
  DOC: Remove incorrect periods at the end of parameter types (pandas-dev#23600)
  MAINT: tm.assert_raises_regex --> pytest.raises (pandas-dev#23592)
  DOC: Updating Series.resample and DataFrame.resample docstrings (pandas-dev#23197)
thoo added a commit to thoo/pandas that referenced this pull request Nov 12, 2018
…fixed

* upstream/master:
  DOC: Enhancing pivot / reshape docs (pandas-dev#21038)
  TST: Fix xfailing DataFrame arithmetic tests by transposing (pandas-dev#23620)
  BUILD: Simplifying contributor dependencies (pandas-dev#23522)
  BUG/REF: TimedeltaIndex.__new__ (pandas-dev#23539)
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DatetimeIndex cast to object dtype raises/wrong for tzaware
5 participants