BUG: DataFrame.drop unexpectedly drops frequency #58846
pandas/core/indexes/base.py
Outdated
from pandas import DatetimeIndex

if isinstance(self, DatetimeIndex):
    new_index.freq = self.freq
I’m pretty sure this will not be correct in the general case
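To illustrate the concern, here is a minimal sketch (the dates and variable names are made up for illustration, not taken from the PR). Dropping a single interior element from a daily index leaves irregularly spaced values, so the original freq no longer describes the result:

```python
import pandas as pd

# A regular daily index, as in the bug report scenario (dates are illustrative).
dti = pd.date_range("2020-01-01", periods=6, freq="D")

# Drop one interior timestamp: Jan 1, 2, 4, 5, 6 remain, which is not evenly spaced.
irregular = dti.drop(dti[[2]])

# Try to copy the original freq onto the result, as the first patch did.
try:
    irregular.freq = dti.freq
    copied_ok = True
except ValueError:
    # pandas validates a freq against the values when it is assigned
    copied_ok = False

print(irregular.inferred_freq, copied_ok)
```

Copying `self.freq` is only valid when the remaining values still happen to conform to that frequency, which is not guaranteed after an arbitrary drop.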
@jbrockmendel
Thank you so much for your quick review. Sorry that I totally misunderstood the case....
What about the code below, which infers freq from the newly created DatetimeIndex? (cf. #22561)
if isinstance(self, DatetimeIndex):
    new_index.freq = to_offset(new_index.inferred_freq)
The code above passed the unit tests except for test_delete_slice2 in test_delete.py. test_delete_slice2 expects df.drop to return freq=None, but after this fix the values are <2 * Days>, <2 * Years>, and so on. Should I revise test_delete_slice2 to use _with_freq('infer') instead?
# reset freq to None
result = ts.drop(ts.index[[1, 3, 5, 7, 9]]).index
expected = dti[::2]._with_freq(None) # should be 'infer' instead
tm.assert_index_equal(result, expected)
assert result.name == expected.name
assert result.freq == expected.freq
assert result.tz == expected.tz
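As a rough illustration of why the inferred values come out as multiples such as <2 * Days>, the sketch below (illustrative dates, public API only) drops every other element from a daily index and looks at what `inferred_freq` reports for the remainder:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

dti = pd.date_range("2020-01-01", periods=10, freq="D")

# Same drop pattern as in test_delete_slice2: remove positions 1, 3, 5, 7, 9.
result = dti.drop(dti[[1, 3, 5, 7, 9]])

# The surviving timestamps are two days apart, so inference sees a 2-day step.
inferred = result.inferred_freq
inferred_offset = to_offset(inferred)

print(inferred, inferred_offset)
```

The inferred offset here is twice the original daily offset, which is what the patched code would assign in place of None.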
pandas/core/indexes/base.py
Outdated
- return self.delete(indexer)
+ new_index = self.delete(indexer)
+
+ # check if we need to set the freq attribute
instead of doing this here, override DatetimeIndex.drop
  result = ts.drop(ts.index[[1, 3, 5, 7, 9]]).index
- expected = dti[::2]._with_freq(None)
+ expected = dti[::2]._with_freq("infer")
the "infer" here should be unnecessary if dti already has a freq
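A small sketch of the point being made, assuming a `dti` built with `date_range` and an explicit freq (the construction is illustrative; the test's actual fixture is not shown in this thread). A stepped slice of such an index already carries a scaled freq, so nothing needs to be inferred:

```python
import pandas as pd

# An index created with an explicit freq (start date borrowed from the test below).
dti = pd.date_range("1970-01-01", periods=10, freq="D")

# Slicing with a step keeps a freq, scaled by the step.
sliced = dti[::2]

print(sliced.freq)
```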
removed _with_freq("infer")
@pytest.mark.parametrize("freq", ["YE", "ME", "D"])
def test_drop_method_freq_preservation(self, freq):
    start = "1970-01-01"
can you add a "# GH#???" reference
Added # GH 58846
pandas/core/indexes/base.py
Outdated
# check if we need to set the freq attribute
from pandas import DatetimeIndex

if isinstance(self, DatetimeIndex) and isinstance(new_index, DatetimeIndex):
should only do this if self.freq is not None
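One way the suggested guard could look, written as a standalone function rather than as a change to pandas internals. The name `restore_freq_if_regular` is hypothetical and not part of pandas or of this PR; it only re-infers a freq when the original index had one:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def restore_freq_if_regular(original: pd.DatetimeIndex, new_index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Hypothetical helper: re-attach a freq only if the original index had one."""
    if original.freq is None:
        # The original carried no freq, so skip inference entirely.
        return new_index
    inferred = new_index.inferred_freq  # None when the values are not regularly spaced
    if inferred is None:
        return new_index
    return pd.DatetimeIndex(new_index, freq=to_offset(inferred))


with_freq = pd.date_range("2020-01-01", periods=10, freq="D")
restored = restore_freq_if_regular(with_freq, with_freq.drop(with_freq[[1, 3, 5, 7, 9]]))

# An index built from explicit values has no freq, so the guard leaves the result alone.
without_freq = pd.DatetimeIndex(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"])
shorter = pd.DatetimeIndex(["2020-01-01", "2020-01-02", "2020-01-03"])
untouched = restore_freq_if_regular(without_freq, shorter)
```

The guard avoids giving a result a freq when the input never had one, even if the remaining values happen to be evenly spaced.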
pandas/core/indexes/base.py
Outdated
from pandas import DatetimeIndex | ||
|
||
if isinstance(self, DatetimeIndex) and isinstance(new_index, DatetimeIndex): | ||
new_index.freq = to_offset(new_index.inferred_freq) |
inferred_freq can be expensive. might be cheaper to invert indexer and call take, which has another PR to preserve freq which we can optimize
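The "invert indexer and call take" idea can be sketched roughly as follows. `drop_via_take` is a hypothetical name, not pandas API, and whether `take` keeps a freq for a given set of positions depends on the pandas version, so the sketch only shows that the resulting values match:

```python
import numpy as np
import pandas as pd


def drop_via_take(index: pd.Index, labels) -> pd.Index:
    """Hypothetical helper: drop labels by taking the complement of their positions."""
    drop_positions = index.get_indexer_for(labels)
    if (drop_positions == -1).any():
        raise KeyError("some labels were not found in the index")
    keep_mask = np.ones(len(index), dtype=bool)
    keep_mask[drop_positions] = False
    # take() receives the positions to keep, i.e. the inverted indexer
    return index.take(np.flatnonzero(keep_mask))


dti = pd.date_range("2020-01-01", periods=10, freq="D")
kept = drop_via_take(dti, dti[[1, 3, 5, 7, 9]])

print(kept)
```

This avoids scanning the values to infer a frequency; any freq handling would then live in `take`, which is where the reviewer suggests optimizing.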
I'm not convinced this is something we should do. this seems like a case where users shouldn't be using
I'm not convinced, either. In any case, I don't think it's desirable for Thank you so much again for your review.
Yeah I'm not convinced this complexity is worth adding. Thanks for the PR but going to close for now.
doc/source/whatsnew/v3.0.0.rst file if fixing a bug or adding a new feature.