Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect DTI/TDI indexing; warn before dropping tzinfo #22549

Merged
merged 9 commits into from
Sep 8, 2018

Conversation

jbrockmendel
Copy link
Member

Slight hodge-podge.

#21333 is about to_period method on DatetimeIndex silently dropping timezone info. This PR makes DatetimeIndex.to_period and Timestamp.to_period issue a warning when losing timezone info.

#20464 is entirely unrelated, notes that pd.date_range('1970-01-01', period=2).get_loc(pd.Timedelta(0)) returns 0 instead of raising; ditto for indexing a TDI with a Timestamp.

@codecov
Copy link

codecov bot commented Aug 31, 2018

Codecov Report

Merging #22549 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22549      +/-   ##
==========================================
+ Coverage   92.05%   92.05%   +<.01%     
==========================================
  Files         169      169              
  Lines       50783    50788       +5     
==========================================
+ Hits        46749    46754       +5     
  Misses       4034     4034
Flag Coverage Δ
#multiple 90.46% <100%> (ø) ⬆️
#single 42.3% <37.5%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 95.68% <100%> (+0.02%) ⬆️
pandas/core/indexes/timedeltas.py 91.19% <100%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70c9003...acf7b6e. Read the comment docs.

@gfyoung gfyoung added Datetime Datetime data dtype Period Period data type Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Sep 1, 2018
if self.tz is not None:
# GH#21333
warnings.warn("Converting to Period representation will "
"drop timezone information.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning type and stack level?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This questions applies to everywhere where you placed warnings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea what warning type to use. Suggestions?

As to stack level, I tried a bunch to get that to work with tm.assert_produces_warning and eventually threw in the towel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you have it, UserWarning makes sense, but I think being explicit about it is good.

I know what you mean regarding stacklevel. We generally try to get one that makes sense, and if the tests don't cooperate, we can just use check_stacklevel=False.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the warnings are raised in direct user-api methods, normally putting a stacklevel=2 should do the correct thing.
What did you not get working in the tests? (which kind of code sample)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did you not get working in the tests?

IIRC the problem is that the affected code paths issues both the new FutureWarning and in some cases also a PerformanceWarning. tm.assert_produces_warning doesn't support multiple expected warnings, and my attempt to modify it led to tests failing the stacklevel checks.

@@ -1201,6 +1205,12 @@ def get_loc(self, key, method=None, tolerance=None):
key = Timestamp(key, tz=self.tz)
return Index.get_loc(self, key, method, tolerance)

if isinstance(key, timedelta):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if/elif here?

@@ -487,7 +488,8 @@ def get_loc(self, key, method=None, tolerance=None):
-------
loc : int
"""
if is_list_like(key):
if is_list_like(key) or (isinstance(key, datetime) and key is not NaT):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we use isna else for NaT checking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this context isna is less clear. Since there is a specific na-like object we are catching here, we should be explicit about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this comment looks good

@@ -487,7 +488,8 @@ def get_loc(self, key, method=None, tolerance=None):
-------
loc : int
"""
if is_list_like(key):
if is_list_like(key) or (isinstance(key, datetime) and key is not NaT):
# GH#20464 for datetime case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this more explicit, meaning datetime dtype for all-NaT

with tm.assert_produces_warning(UserWarning):
# warning that timezone info will be lost
result = ts.to_period()[0]
expected = ts[0].to_period()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in future PR these should be parameterized

@@ -41,6 +41,14 @@ def test_getitem(self):
tm.assert_index_equal(result, expected)
assert result.freq == expected.freq

def test_timestamp_invalid_key(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterize over a datetime as well

@@ -41,6 +41,14 @@ def test_getitem(self):
tm.assert_index_equal(result, expected)
assert result.freq == expected.freq

def test_timestamp_invalid_key(self):
# GH#20464
tdi = pd.timedelta_range(0, periods=10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a test that indexes with NaT? (both Timedelta and Datetime dtype)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have one for timedelta, just added one for datetime

@jreback jreback added this to the 0.24.0 milestone Sep 4, 2018
@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for updating the PR.

@jbrockmendel
Copy link
Member Author

Looks like circleCI is a hypothesis timeout

@jreback
Copy link
Contributor

jreback commented Sep 7, 2018

lgtm. rebase after @mroeschke CalendarDay change. ping on green.

@jbrockmendel
Copy link
Member Author

Ping

@jreback jreback merged commit 935c5d1 into pandas-dev:master Sep 8, 2018
@jreback
Copy link
Contributor

jreback commented Sep 8, 2018

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type
Projects
None yet
5 participants