-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to i8data for DatetimeIndex #24559
Comments
@jbrockmendel could you fill in the "Reason for the change" section? IIUC, it was to simplify |
Possibly related, I worked on this topic in #21216. Rational was that I have an open issue #20257 to document passing integers into |
@mroeschke thanks. IIRC consistency with to_datetime was also a consideration |
|
The Regarding"was this documented": it might not have been documented clearly, but it is long standing behaviour: either we think people don't use it / shouldn't use it (but then why bother changing it? Shouldn't we then rather deprecate the whole ability of passing integer data, instead of changing the behaviour?), or either people do use it, and this change will break that usage. |
Any other thoughts / replies here? |
I think this surprised me early on. That may have been because I was used to the old way; I'm not really sure.
My intent there was "If this was documented before, then we definitely can't change it." I wasn't advocating "It wasn't documented, so we can change it.". Just "It wasn't documented, so we can maybe change it" :) I don't really have an opinion on the technical merits of wall time vs. epochs. I don't think I know enough to vote one way or the other. |
I think the overriding internal-consistency concern is the one @mroeschke reminded us of: this should behave like Timestamp constructor. @TomAugspurger would "fixing" this behavior be difficult? Last time I tried something similar I got test_packers failures that I couldn't figure out. |
Demo of the inconsistency on 0.23.4 In [9]: i8 = pd.Timestamp('2000', tz='CET').value
In [10]: pd.Timestamp(i8)
Out[10]: Timestamp('1999-12-31 23:00:00')
In [11]: pd.Timestamp(i8, tz="CET")
Out[11]: Timestamp('2000-01-01 00:00:00+0100', tz='CET')
In [12]: pd.DatetimeIndex(np.array([i8]))
Out[12]: DatetimeIndex(['1999-12-31 23:00:00'], dtype='datetime64[ns]', freq=None)
In [13]: pd.DatetimeIndex(np.array([i8]), tz="CET")
Out[13]: DatetimeIndex(['1999-12-31 23:00:00+01:00'], dtype='datetime64[ns, CET]', freq=None) On master, Out[13] matches Out[11]. That consistency is certainly worth striving for. If we wanted a graceful deprecation warning, then the DTI constructor would
Putting that warning in place would give us an idea of how difficult this would be to change. I'll see what turns up. Since |
The other option could also be to deprecate passing integers, if we find it too confusing what it should result in. |
Since For 2, if a user wants epoch timestamps (future behavior) they should use |
#24623 is now passing. PTAL. |
@jbrockmendel it is not fully clear to me what #24623 exactly does related to the discussion above. Can you summarize it here? |
There are two central changes in #24623, both of which were the status quo before #24024.
|
Then how is that related to this issue? This issue is about the change in behaviour of handling integer values in the DatetimeIndex constructor, not about the DatetimeArray constructor |
If it isn't closely enough related for your tastes then feel free to ignore. |
You wrote the PR, I think you are better placed to judge to what extent it is related to the discussion here ;) |
I'm actually currently getting side-tracked by trying to figure out exactly what is and isn't internally-consistent. It looks like 24623 makes i8 behavior consistent between DTA/DTI/Timestamp and M8[ns] behavior consistent between DTA/DTI, but breaks the existing consistency in master between DTA/Timestamp. I'll try to put together a grid for these... |
And an answer to the comment of @mroeschke
That sounds as a logical rule to me. |
I'm not sure I understand this rule. Is the idea that the constructor would only be able to create tz-aware {n \choose k}(DTA/DTI/Timestamp) with UTC and them make the user do tz_convert as another step? |
In an ideal world I would prefer that since it's more explicit, but I think it would require deprecating I would be fine just propagating the current |
I've made things more difficult here by voicing opinions while being misinformed. Apologies all around. Recap/Double-Checking behavior from 0.23.4 and master. In all cases passing tz/dtype corresponding to US/Pacific.
master
0.23.4
In both 0.23.4 and master, In both 0.23.4 and master, both In 0.23.4 In master, In master, I think there is a consensus that the preferred behavior is for all of these to match Timestamp, right? So the options for DTI are:
And the options for DTA: The question I'd pose to the more senior folks: can we get away with option 3? |
I'm having some trouble following the samples in
is correct. Can you clarify the difference between your option 1 and 3? Is the difference the handling of i8 values in DTI._simple_new? |
None of the options I have in mind touch In option 1, In option 3, |
And in case it might be useful, the experiment I did earlier today: https://github.com/pandas-dev/pandas/compare/master...jorisvandenbossche:i8revert?expand=1 (only changed the DatetimeIndex constructor without fixing any of the consequential failures) |
Back to the deprecation warning. What's the expected behavior of |
Assuming we eventually want the behavior for DTI that's currently on master, how's this for a deprecation warning?
|
One more question: should
will always be the same. Gotta take a break from this until this afternoon. WIP at https://github.com/TomAugspurger/pandas/tree/revert-i8 That's based off @jorisvandenbossche's start. Should be easy to swap in @jbrockmendel's changes to |
I think I would raise on |
Nice deprecation warning! (I would maybe only mention something regarding the reason for the change? integers seen as unix epoch which is defined to be UTC? (although its nanoseconds so not exactly unix epochs))
Since it stays the same, ideally no, I would say
Ideally the new behaviour IMO (although that might make the implementation more complicated?) |
Hmm, yes, that might be a good argument (although I think, as a user, I would have the reflex to complain about it: "I already specify UTC!") |
Fair enough. Yeah from a user perspective, it may feel like a false positive warning. My opinion about warning with |
FWIW, I'm ambivalent about warning in the UTC case. Either way has downsides. |
If long term we want option 5, do we need to get the deprecation for Timestamp behavior in for 0.24.0? |
Possibly, but not necessarily for the RC (which I'd like to do immediately
following #24708)
…On Thu, Jan 10, 2019 at 1:49 PM jbrockmendel ***@***.***> wrote:
If long term we want option 5, do we need to get the deprecation for
Timestamp behavior in for 0.24.0?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24559 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHInNJxv82m9bSM5-hzj9Z7SeCDCpSks5vB5lVgaJpZM4Zm9Ru>
.
|
anything more on this for 0.24.0? |
I think we're good for 0.24.0. We do need to settle on a desired long-term behavior though. |
How close are we to consensus on this? It looks like @mroeschke, @jreback, and I have expressed preference for options 4 or 5, with a slight preference towards 5. @jorisvandenbossche has expressed a preference for option 6. Anyone else want to weigh in? |
@TomAugspurger @jorisvandenbossche @mroeschke @jreback I'd like to settle on a desired long-term behavior and start the appropriate deprecation warnings before 0.25.0. How far are we from consensus? |
@jbrockmendel can u do. short summary of this and just confirm that 0.24.2 |
We discussed this on the call. IIRC someone volunteered / was volunteered to summarize, and maybe close this issue? (was I the one to volunteer?) |
IIUC, this can be closed since the changes have been made and the deprecation enforced. |
Master currently has an (undocumented) (maybe-) API-breaking change from 0.23.4 when passed integer values
0.23.4
Master
Attempt to explain the behavior: In 0.23.4, passing an
ndarray[i8]
was equivalent to passingdata.view("M8[ns]")
On master, integer values are treated as unix timestamps, while M8[ns] values are treated as wall-times in the given timezone.
Reason for the change
There are four cases of interest:
In 0.23.4 we have
a.equals(b)
andc.equals(d)
but no way to pass data in a way that was constructor-neutral. In master we now havea
matchc
andd
. At some point in the refactoring process we changed that, but off the top of my head I don't remember when or if this was the precise motivation or just a side-benefit.BTW _simple_new was also way too much:
Was this documented?
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DatetimeIndex.html mentions that it's "represented internally as int64".
The (imprecise) type on
data
is "Optional datetime-like data"I don't see anything in http://pandas.pydata.org/pandas-docs/stable/timeseries.html suggesting that integers can be passed to DatetimeIndex.
The text was updated successfully, but these errors were encountered: