Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: DataFrame.drop unexpectedly drops frequency #58846

Conversation

mcmrc
Copy link

@mcmrc mcmrc commented May 27, 2024

@mcmrc mcmrc changed the title BUG: DataFrame.drop unexpectedly drop frequency BUG: DataFrame.drop unexpectedly drops frequency May 27, 2024
from pandas import DatetimeIndex

if isinstance(self, DatetimeIndex):
new_index.freq = self.freq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m pretty sure this will not be correct in the general case

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel
Thank you so much for your quick review. Sorry that I totally misunderstood the case....
What about the code below inferring freq from the newly created DatetimeIndex. (cf. #22561)

        if isinstance(self, DatetimeIndex):
            new_index.freq = to_offset(new_index.inferred_freq)

Copy link
Author

@mcmrc mcmrc May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code above passed unit tests except for test_delete_slice2 in test_delete.py

test_delete_slice2 expects df.drop returns freq=None. But after this fix, expected values are like <2 * Days>, <2 * Years>, and so on. Should I revise test_delete_slice2 to use _with_freq('infer') instead?

        # reset freq to None
        result = ts.drop(ts.index[[1, 3, 5, 7, 9]]).index
        expected = dti[::2]._with_freq(None) # should be 'infer' instead
        tm.assert_index_equal(result, expected)
        assert result.name == expected.name
        assert result.freq == expected.freq
        assert result.tz == expected.tz

@mcmrc mcmrc requested a review from jbrockmendel May 30, 2024 11:38
return self.delete(indexer)
new_index = self.delete(indexer)

# check if we need to set the freq attribute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of doing this here, override DatetimeIndex.drop

result = ts.drop(ts.index[[1, 3, 5, 7, 9]]).index
expected = dti[::2]._with_freq(None)
expected = dti[::2]._with_freq("infer")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "infer" here should be unnecessary if dti already has a freq

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed _with_freq("infer")


@pytest.mark.parametrize("freq", ["YE", "ME", "D"])
def test_drop_method_freq_preservation(self, freq):
start = "1970-01-01"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a "# GH#???" reference

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added # GH 58846

# check if we need to set the freq attribute
from pandas import DatetimeIndex

if isinstance(self, DatetimeIndex) and isinstance(new_index, DatetimeIndex):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should only do this if self.freq is not None

from pandas import DatetimeIndex

if isinstance(self, DatetimeIndex) and isinstance(new_index, DatetimeIndex):
new_index.freq = to_offset(new_index.inferred_freq)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inferred_freq can be expensive. might be cheaper to invert indexer and call take, which has another PR to preserve freq which we can optimize

@jbrockmendel
Copy link
Member

im not convinced this is something we should do. this seems like a case where users shouldn't be using drop in the first place

@mcmrc
Copy link
Author

mcmrc commented Jun 5, 2024

I'm not convinced, either. In any case, I don't think it's desirable for drop to implicitly set freq to None. If DatetimeIndex has a freq, it should be retained, and if it is explicitly None, it should remain None.

Thank you so much again for your review.

@mcmrc mcmrc requested a review from jbrockmendel June 5, 2024 05:29
@mroeschke
Copy link
Member

Yeah I'm not convinced this complexity is worth adding. Thanks for the PR but going to close for now

@mroeschke mroeschke closed this Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: 2-sided inplace drop loses freq in DatetimeIndex
3 participants