Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Timestamp == date match stdlib #36131

Merged
merged 64 commits into from
Jan 1, 2021

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Sep 4, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

ATM we have one reasonable behavior Timestamp("2020-09-04") == date(2020, 9, 4) and two un-reasonable behaviors: ``Timestamp("2020-09-04").tz_localize("US/Pacific") == date(2020, 9, 4), Timestamp.now() == Timestamp.now().date()`. Since the stdlib datetime doesnt consider `datetime(2020, 9, 4) == date(2020, 9, 4)`, this follows the stdlib and considers them never equal.

I'm still getting one test failure locally.

cc @mroeschke

@mroeschke
Copy link
Member

mroeschke commented Sep 6, 2020

Seems reasonable. In general do you think we should reel back datetime.date support in ops with datetime64 dtypes?

@jbrockmendel
Copy link
Member Author

In general do you think we should reel back datetime.date support in ops with datetime64 dtypes?

Yes. I think indexing should behave like comparisons, in which case date objects would not be allowed. If we were to allow date objects in indexing, I think it would make more sense to treat date(2020, 9, 5) like "2020-09-05" than like datetime(2020, 9, 5), i.e. like a partial-slice. But i think thats more hassle than its worth.

@jorisvandenbossche
Copy link
Member

Shouldn't we rather see this as an API change than a bug fix?
I would say yes. I can imagine people relying on this to filter to all timestamps on a given date.

@jbrockmendel
Copy link
Member Author

I can imagine people relying on this to filter to all timestamps on a given date.

That's conceivable, but I think its much more likely that people are assuming that it casts to datetime-at-midnight and so are silently getting incorrect results.

@jorisvandenbossche
Copy link
Member

its much more likely that people are assuming that it casts to datetime-at-midnight and so are silently getting incorrect results.

Well, and they will then also start to see a warning to make them aware of this.


Now, it's not only about the "timestamp getting normalized to midnight when compared to date" behaviour, because as I understand it with this PR also Timestamps as midnight will no longer compare equal? I think we should first deprecate that.

@jbrockmendel
Copy link
Member Author

Updated to deprecate.

@@ -1065,7 +1065,9 @@ def test_apply_with_date_in_multiindex_does_not_convert_to_timestamp():
)

grp = df.groupby(["A", "B"])
result = grp.apply(lambda x: x.head(1))
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are exposing this to the user?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea why this happens?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when it tries to make a MultiIndex out of the A and B columns, it goes through Categorical, which passes convert_dates=True to maybe_infer_to_datetimelike,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be warnign at the top level

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could finagle it so that we dont pass convert_dates=True to maybe_infer_to_datetimelike in this case, but that gets pretty complicated, especially compared to just warning the user that date objects are finicky

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a while back you were in favor of telling people not to pass dates to the MultiIndex constructors.

@jreback jreback added Deprecate Functionality to remove in pandas Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves labels Sep 13, 2020
@jreback
Copy link
Contributor

jreback commented Dec 30, 2020

btw do we have an actual issue for this? (that this PR would close)

@jorisvandenbossche
Copy link
Member

There are still some warnings raised by the tests. For example in the github actions build I see:

pandas/tests/indexes/datetimes/test_indexing.py::TestGetIndexer::test_get_indexer_out_of_bounds_date[target0-positions0]
pandas/tests/indexes/datetimes/test_indexing.py::TestGetIndexer::test_get_indexer_out_of_bounds_date[target1-positions1]
pandas/tests/indexes/datetimes/test_indexing.py::TestGetIndexer::test_get_indexer_out_of_bounds_date[target2-positions2]
  /home/runner/work/pandas/pandas/pandas/core/indexes/base.py:3297: FutureWarning: Comparison of Timestamp with datetime.date is deprecated in order to match the standard library behavior.  In a future version these will be considered non-comparable.Use ts == pd.Timestamp(dt) instead.
    indexer = self._engine.get_indexer(target._get_engine_target())

@jreback
Copy link
Contributor

jreback commented Dec 30, 2020

cc @jorisvandenbossche

@jbrockmendel
Copy link
Member Author

There are still some warnings raised by the tests. For example in the github actions build I see:

Huh, why isnt that failing the npdev build?

jbrockmendel and others added 3 commits December 30, 2020 14:40
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jbrockmendel
Copy link
Member Author

Updated test_get_indexer_out_of_bounds_date to catch the warning. Weird thing is that locally every-other-run it fails saying it the expected warning was not emitted.

@jreback
Copy link
Contributor

jreback commented Dec 31, 2020

@jbrockmendel
Copy link
Member Author

now builds are failing because of warnings

harumph, yah that looks like the same thing where that was happening every-other-run locally

@jreback
Copy link
Contributor

jreback commented Dec 31, 2020

@jbrockmendel
Copy link
Member Author

suppressed the warning, green

@jorisvandenbossche
Copy link
Member

Why can the warning be suppressed in the tests? It's not a valid user case?

@jbrockmendel
Copy link
Member Author

Why can the warning be suppressed in the tests? It's not a valid user case?

its flaky and i haven't figured out why

@jreback jreback merged commit 6fd5cc3 into pandas-dev:master Jan 1, 2021
@jreback
Copy link
Contributor

jreback commented Jan 1, 2021

thanks @jbrockmendel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants