BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index #23524
Codecov Report
@@ Coverage Diff @@
## master #23524 +/- ##
==========================================
- Coverage 92.25% 92.25% -0.01%
==========================================
Files 161 161
Lines 51262 51278 +16
==========================================
+ Hits 47292 47304 +12
- Misses 3970 3974 +4
Seems like a fine change, but needs a release note.
Good point; done.
Small comment
        # TODO: warn that dtype is not used?
        # warn that conversion may be lossy?
        return self._data.view(np.ndarray)  # follow Index.__array__
Other question: what is the `.view(np.ndarray)` part doing if it is already an array? Can we remove it?
It can probably be removed; this is taken directly from the `Index.__array__` implementation, so I think removing it (if we do) should happen at the same time those methods are overhauled (I'll be opening an issue shortly).
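For readers following the `.view(np.ndarray)` question, here is a minimal NumPy-only sketch of what the call does. The `Tagged` subclass is hypothetical and only stands in for any ndarray subclass; this does not reproduce the pandas code path.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass, used only for illustration."""


base = np.arange(3)

# Viewing as a subclass and back changes the Python type, not the data.
sub = base.view(Tagged)
plain = sub.view(np.ndarray)

# On something that is already a plain ndarray, the view is a new array
# object of the same type that still shares the underlying buffer.
same = base.view(np.ndarray)
```

So when `self._data` is already a plain ndarray, the call only creates another array object over the same memory, which is consistent with the suggestion that it can probably be dropped.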
@@ -57,6 +57,54 @@ def timedelta_index(request):

class TestDatetimeArray(object):

    def test_array_object_dtype(self, tz_naive_fixture):
I think it would be good to add such a test to the base extension tests as well?
I don't know those tests well enough to have an informed opinion. AFAIK ExtensionArray doesn't implement `__array__`, so it isn't clear that this is supported in the general case.
EA implements `__iter__`, which should be sufficient.
This test would be slightly opinionated for a base test, in case an EA wants to be converted to a specific NumPy type, but I think it's OK.
I guess we have `base.interface.BaseInterfaceTests.test_array_interface`, which checks

    def test_array_interface(self, data):
        result = np.array(data)
        assert result[0] == data[0]
Ah, yes, that's already a generic test. OK, since that does not actually test the return dtype, it's good to have more explicit tests here.
Should we expect from EA that `np.array(EA, dtype=object)` always works (returns an object array of scalars)?
That seems like an OK assumption to me, since this already happens if you don't implement `__array__`, so I think we can expect this as well if the EA author implements a custom `__array__`.
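The fallback described in this thread can be sketched without pandas. `TinyContainer` below is a hypothetical stand-in for an array-like that defines no `__array__`; it is not an ExtensionArray, and the sketch assumes NumPy falls back to the sequence protocol (`__len__` and `__getitem__`, alongside `__iter__`) for such objects.

```python
from datetime import datetime, timezone

import numpy as np


class TinyContainer:
    """Hypothetical array-like with no __array__ method."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        return self._values[i]

    def __iter__(self):
        return iter(self._values)


data = TinyContainer([
    datetime(2018, 11, 6, tzinfo=timezone.utc),
    datetime(2018, 11, 7, tzinfo=timezone.utc),
])

# With no __array__ available, NumPy collects the elements one by one,
# so dtype=object yields an object array of the original scalars.
result = np.array(data, dtype=object)
```

A custom `__array__` that honors `dtype=object` in the same way keeps this behavior unchanged for callers.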
        if is_object_dtype(dtype):
            return np.array(list(self), dtype=object)
        elif is_int64_dtype(dtype):
            return self.asi8
I think we can remove this elif branch. NumPy will afterwards convert the M8[ns] data to int, and in that way ensure that the semantics of `np.asarray` regarding copy/no copy are followed.
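A rough sketch of the behavior this comment relies on, using a hypothetical `M8Holder` class rather than the actual DatetimeArray: `__array__` always hands back the M8[ns] data, and NumPy performs the cast to int64 itself. The `copy=None` parameter is included only so the signature is accepted by both older and newer NumPy versions.

```python
import numpy as np


class M8Holder:
    """Hypothetical container whose __array__ ignores the requested dtype."""

    def __init__(self, data):
        self._data = np.asarray(data, dtype="M8[ns]")

    def __array__(self, dtype=None, copy=None):
        # No special int64 branch: always return the datetime64 data.
        return self._data


holder = M8Holder(["2018-11-06", "2018-11-07"])

# NumPy casts the returned M8[ns] array to int64 after calling __array__.
as_int = np.asarray(holder, dtype="i8")

# An asi8-style result would instead be a view on the same memory.
as_view = holder._data.view("i8")
```

The cast result holds the same integers as the view but lives in its own buffer, which is the copy/no-copy distinction raised above.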
@jbrockmendel I opened #23593, a PR doing the `__array__` for all datetimelike EAs, not only DatetimeArray (so there is a bit of overlap with this PR).
Great, I'll take a look at 23593.
thanks @jbrockmendel #23593 is the followup to address
Also fixes bug with `DateOffset == "infer"` incorrectly raising instead of returning `False`.

Also fixes bug(?) with `pd.Index(dtindex, dtype=object)` returning an index of datetimes instead of Timestamps, potentially losing nanoseconds.

`git diff upstream/master -u -- "*.py" | flake8 --diff`
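The object-dtype behavior described above can be checked with a short sketch. It assumes a pandas version that includes this fix; the sample timestamp and the UTC timezone are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# One tz-aware timestamp carrying a trailing nanosecond.
dti = pd.DatetimeIndex(["2018-11-06 00:00:00.000000001"], tz="UTC")

# Both conversions should yield pandas Timestamps rather than
# datetime.datetime objects, so the nanosecond component is kept.
obj_index = pd.Index(dti, dtype=object)
obj_array = np.array(dti, dtype=object)
```

Checking `obj_index[0].nanosecond` is a quick way to confirm that no precision was lost in the conversion.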