BUG/REF: TimedeltaIndex.__new__ #23539
Hello @jbrockmendel! Thanks for submitting the PR.
@jbrockmendel It seems you are adding net a lot of code. Is that because some of the things were previously handled by the But that said, I think some of those things (eg the float handling) should actually be fixed in
pandas/core/indexes/timedeltas.py
# Convert whatever we have into timedelta64[ns] dtype
if is_object_dtype(data) or is_string_dtype(data):
    # no need to make a copy, need to convert if string-dtyped
why would u check is_string_dtype?
because we have several tests that specifically pass e.g. np.array(['1 days', '4 days'], dtype='<S6'). (BTW we don't have similar tests for DatetimeIndex, which I intend to fix in the DTI analogue of this PR)
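For context, a NumPy-only sketch of why an object-dtype check alone would miss that input: a fixed-width bytes array has dtype kind 'S', not 'O'. The variable names are illustrative, and this does not reproduce pandas' is_string_dtype; it only inspects the dtype kinds involved.

```python
import numpy as np

# The kind of input the tests pass: fixed-width bytes, dtype '<S6'.
bytes_arr = np.array(['1 days', '4 days'], dtype='<S6')
# The same values held as Python objects.
object_arr = np.array(['1 days', '4 days'], dtype=object)

# dtype.kind is 'S' for bytes and 'O' for object, so a check that only
# looks for object dtype would not route bytes_arr to string parsing.
kinds = (bytes_arr.dtype.kind, object_arr.dtype.kind)

# Decoding to unicode ('<U6') gives ordinary str values back.
decoded = bytes_arr.astype(str)
```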
@jbrockmendel as also mentioned in the other PR, Tom and I assumed that the consensus was to not do this (keep the TDA constructor simple), so if you still want to do this, please respond to the issue I linked then: #23212
What's the issue with passing a datetime64[ns] array to TDI? NumPy is happy with it:
In [50]: np.asarray(dr).astype("timedelta64[ns]")
Out[50]:
array([1452816000000000000, 1452902400000000000, 1452988800000000000,
       1453075200000000000, 1453161600000000000, 1453248000000000000],
      dtype='timedelta64[ns]')
and we match that:
In [51]: np.asarray(dr).astype("timedelta64[ns]") == np.asarray(pd.TimedeltaIndex(np.asarray(dr)))
Out[51]: array([ True,  True,  True,  True,  True,  True])
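A minimal NumPy-only sketch of what that cast amounts to. It assumes `dr` holds six daily values starting 2016-01-15; the session does not show how `dr` was built, but this is consistent with the integers printed above. It uses `view` rather than `astype` to make the reinterpretation explicit.

```python
import numpy as np

# Assumed stand-in for `dr`: six daily datetime64[ns] values from 2016-01-15.
dr = np.arange("2016-01-15", "2016-01-21", dtype="datetime64[D]").astype("datetime64[ns]")

# Reinterpret the underlying int64 nanosecond counts as timedelta64[ns].
as_td = dr.view("int64").view("timedelta64[ns]")

# Each resulting "timedelta" is simply the time elapsed since 1970-01-01.
days_since_epoch = as_td.astype("timedelta64[D]").astype("int64")
```

Under that assumption the first element is 16815 days, i.e. the distance from the epoch to 2016-01-15, which is the sense in which the result can be surprising.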
@TomAugspurger you can do that but IMHO this is a TypeError
I would personally be fine with raising a TypeError, since I think this can be quite surprising behaviour (it's just using the integer values). But also fine to not deviate from numpy behaviour. But given that it is numpy behaviour, if we want to change it, I think we should deprecate it first.
"is a" -> "should be"? I don't really disagree, and we do raise for
ok by me to deprecate first
I do realize my argument of "but NumPy allows it" isn't great :)
In [3]: np.array([1, np.nan]).astype(int)
Out[3]: array([                   1, -9223372036854775808])
Setting aside the TDI(ndarray[timedelta64]) stuff. I think the important question is whether we want
@jbrockmendel By "constructor" do you mean
It can be difficult to tell exactly what is handled within the to_timedelta calls because there are two of them reached under separate conditions, and most of the relevant logic is then done again below that anyway. I'd be +1 on improving to_timedelta later.
I'll restate my opinion there when I'm ready, thank you. I avoided changing the Array constructors specifically to avoid this being an issue; really shouldn't have mentioned it at all in the OP.
@TomAugspurger same as above; that is an important question, but I went out of my way to keep it totally orthogonal to this PR.
It sounds like the discussion w/r/t
I think so.
Ok. I think I saw "refactor
The bug you are mentioning (
Yes, I think so.
AFAICT the to_timedelta call (L168 in master) is almost superfluous given what comes below (which was recently moved there IIRC because it was bizarrely located in That said, an argument that I would find compelling is that we definitely don't want the behavior of
If you are going to look at it, I would recommend trying to make the checks you mention here redundant, by letting Dived into the (current master) code for a moment, and you have this: pandas/pandas/core/indexes/timedeltas.py Lines 165 to 173 in 28a42da
Here, the conversion of everything that is not yet a timedelta64 array is first handled with
@jorisvandenbossche I think we're in agreement on the relevant part which is that the status quo is redundant and unclear. Give me some time to adjust the PR to see if we can reach something closer to optimal.
… fastpath for inference/validation
just some minor comments.
sn = pd.to_timedelta(Series([pd.NaT]))
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
    # Passing datetime64-dtype data to TimedeltaIndex is deprecated
    sn = pd.to_timedelta(Series([pd.NaT]))
What I mean is that this is all NaN and inference is ambiguous at this point, so I wouldn't expect the warning (unless the dtype IS explicitly passed). Can address in a followup, but pls open an issue; this is IMHO a bug.
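One way to see why an all-NaT input is ambiguous: at the NumPy level, datetime64 NaT and timedelta64 NaT share the same int64 sentinel, so only the dtype label tells them apart. A small sketch (NumPy only; it illustrates the ambiguity and is not pandas' inference code):

```python
import numpy as np

dt_nat = np.array([np.datetime64("NaT")], dtype="datetime64[ns]")
td_nat = np.array([np.timedelta64("NaT")], dtype="timedelta64[ns]")

# Both missing-value markers are stored as the minimum int64.
sentinel = np.iinfo(np.int64).min
same_bits = bool(dt_nat.view("int64")[0] == td_nat.view("int64")[0] == sentinel)

# The dtype is the only thing that distinguishes the two arrays.
different_dtypes = dt_nat.dtype != td_nat.dtype
```

With no non-missing values to look at, any choice between the two dtypes is a guess unless the caller passes a dtype.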
@gfyoung if you have any further comments.
@jbrockmendel ok let's merge this on green. a couple of minor followups (they are unresolved conversations).
Added some suggestions and info about the docstrings format.
thanks @jbrockmendel if you'd apply @datapythonista comments next pass would be great. trying to not block your followups.
* upstream/master:
  BUILD: Simplifying contributor dependencies (pandas-dev#23522)
  BUG/REF: TimedeltaIndex.__new__ (pandas-dev#23539)
@datapythonista I'm adding suggested edits into #23587 (needs rebasing anyway). Definitely good suggestions. For the "unit" description the best I've come up with is "The timedelta unit to treat integers as multiples of"; I can't think of a natural-sounding way to rephrase this that doesn't end in a preposition. Thoughts?
That description sounds good to me. Can't think of anything better right now.
…fixed
* upstream/master:
  DOC: Enhancing pivot / reshape docs (pandas-dev#21038)
  TST: Fix xfailing DataFrame arithmetic tests by transposing (pandas-dev#23620)
  BUILD: Simplifying contributor dependencies (pandas-dev#23522)
  BUG/REF: TimedeltaIndex.__new__ (pandas-dev#23539)
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)
box2 = pd.Series if box is pd.Index else box
expected = tm.box_expected(expected, box2)

result = idx * Series(rng5f + 0.1)
@jbrockmendel why did you change this? Left-over from initially changing the behaviour on floats?
Yes.
TimedeltaIndex.__new__ type-checking is tough to decipher, and hides multiple bugs.
In addition to this incorrect behavior being supported, it is actually tested in tests.indexes.timedeltas.test_ops in what I have to assume is a copy/paste mixup of some kind.
This PR fixes the constructor to
np.array([19.0]) is OK, np.array([19.5]) is not. np.nan is correctly converted to NaT
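The float rule stated above can be sketched as follows. This is a NumPy-only illustration under stated assumptions, not the code in this PR: the helper name `floats_to_td64ns` is hypothetical, floats are treated as nanosecond counts, and infinities are rejected along with non-integer values.

```python
import numpy as np

def floats_to_td64ns(values):
    """Hypothetical helper (not pandas API): lossless float -> timedelta64[ns]."""
    values = np.atleast_1d(np.asarray(values, dtype="float64"))
    mask = np.isnan(values)
    valid = values[~mask]

    # Whole-number floats such as 19.0 round-trip exactly; 19.5 would not.
    if not (np.isfinite(valid).all() and (valid == np.round(valid)).all()):
        raise ValueError("float values must be integer-valued or NaN")

    out = np.zeros(values.shape, dtype="int64")
    out[~mask] = valid.astype("int64")
    result = out.view("timedelta64[ns]")
    # NaN entries become NaT instead of an arbitrary integer.
    result[mask] = np.timedelta64("NaT")
    return result
```

Handling the NaN mask before the integer cast is the point of the sketch: casting NaN straight to int64 gives a platform-dependent value rather than a missing-value marker.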
Other assorted small changes:
- Move an incorrectly placed # ---- section divider in scalar timedelta tests.
- Change a repeated-4-times 4-liner (setting result.freq = inferred_freq) to a 2-liner.
- Move raising and returning-early cases to the top of DatetimeIndex.__new__ and TimedeltaIndex.__new__.
Has tests, does not have whatsnew note, pending suggestions as to which parts merit notes.
Does not significantly change the TimedeltaArray constructor; in an upcoming step we can move much of the logic there and have the TDI constructor wrap a/the TDA constructor.