BUG: Fix wrong error in df drop with non unique datetime index and invalid keys #30446

Merged
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -714,6 +714,8 @@ Datetimelike
 - Bug in :func:`pandas.to_datetime` failing for `deques` when using ``cache=True`` (the default) (:issue:`29403`)
 - Bug in :meth:`Series.item` with ``datetime64`` or ``timedelta64`` dtype, :meth:`DatetimeIndex.item`, and :meth:`TimedeltaIndex.item` returning an integer instead of a :class:`Timestamp` or :class:`Timedelta` (:issue:`30175`)
 - Bug in :class:`DatetimeIndex` addition when adding a non-optimized :class:`DateOffset` incorrectly dropping timezone information (:issue:`30336`)
+- Bug in :meth:`pandas.core.indexes.base.Index.get_indexer_non_unique` missing a check on ``target.is_all_dates`` before converting ``target`` to ``asi8`` values, which resulted in a wrong error message when dropping with a non-unique datetime index (:issue:`30399`)
fujiaxiang marked this conversation as resolved.


Timedelta
^^^^^^^^^
2 changes: 1 addition & 1 deletion pandas/core/indexes/base.py
@@ -4551,7 +4551,7 @@ def get_indexer_non_unique(self, target):

         if is_categorical(target):
             tgt_values = np.asarray(target)
-        elif self.is_all_dates:
+        elif self.is_all_dates and target.is_all_dates:  # GH 30399
             tgt_values = target.asi8
         else:
             tgt_values = target._ndarray_values
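
To illustrate why the extra condition matters, here is a minimal standalone sketch of the same guard. It is not the pandas implementation: `target_values` is a hypothetical helper, and `isinstance(..., pd.DatetimeIndex)` is used as an assumed stand-in for `is_all_dates`, which may not be available in newer pandas releases. The idea is that the integer (`asi8`) view is only taken when both sides are datetime-like; otherwise the target's values are used unchanged.

```python
import numpy as np
import pandas as pd


def target_values(index: pd.Index, target: pd.Index) -> np.ndarray:
    """Hypothetical helper mirroring the guarded branch in the diff above."""
    # Assumption: isinstance checks stand in for the is_all_dates attribute.
    both_datetime = isinstance(index, pd.DatetimeIndex) and isinstance(
        target, pd.DatetimeIndex
    )
    if both_datetime:
        # Safe: both sides can be compared as int64 nanosecond values.
        return target.asi8
    # Otherwise keep the target's own values (e.g. plain string labels).
    return target.to_numpy()


# Non-unique datetime index, as in GH 30399.
dti = pd.DatetimeIndex(["2012-01-01", "2012-01-02", "2012-01-02"])
labels = pd.Index(["a", "b"])  # labels that are not dates at all

print(target_values(dti, labels))      # string labels are left as they are
print(target_values(dti, dti).dtype)   # datetime targets use the integer view
```

Without the second half of the condition, the string labels would be sent down the `asi8` path, which is where the misleading error reported in the issue came from.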
29 changes: 29 additions & 0 deletions pandas/tests/indexes/multi/test_drop.py
@@ -139,3 +139,32 @@ def test_drop_not_lexsorted():
     tm.assert_index_equal(lexsorted_mi, not_lexsorted_mi)
     with tm.assert_produces_warning(PerformanceWarning):
         tm.assert_index_equal(lexsorted_mi.drop("a"), not_lexsorted_mi.drop("a"))
+
+
+def test_drop_with_non_unique_datetime_index_and_invalid_keys():
+    # GH 30399
+
+    # define dataframe with unique datetime index
+    df_unique = pd.DataFrame(
+        np.random.randn(5, 3),
+        columns=["a", "b", "c"],
+        index=pd.date_range("2012", freq="H", periods=5),
+    )
+    # create dataframe with non-unique datetime index
+    df_nonunique = df_unique.copy().iloc[[0, 2, 2, 3]]
+
+    try:
Member

I think the point of this is to raise a KeyError for the example specified. Can you model the test to use pytest.raises instead? You should see other examples of this in the code base.

Member Author

I have updated the code as requested. The test now specifically expects a KeyError with the string "not found in axis" in the message.

+        df_nonunique.drop(["a", "b"])  # Dropping with labels not exist in the index
+    except Exception as e:
+        result = e
+    else:
+        result = "df_nonunique.drop(['a', 'b']) should raise error but it didn't"
+
+    try:
+        df_unique.drop(["a", "b"])  # Dropping with labels not exist in the index
+    except Exception as e:
+        expected = e
+    else:
+        expected = "df_unique.drop(['a', 'b']) should raise error but it didn't"
+
+    assert type(result) is type(expected) and result.args == expected.args
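
For reference, the `pytest.raises` form requested in the review could look roughly like the sketch below. This is an assumption about the shape of the updated test, not the committed code: the match string comes from the author's reply above, and a daily `date_range` is substituted for the hourly one purely to keep the example independent of frequency-alias changes between pandas versions.

```python
import numpy as np
import pandas as pd
import pytest


def test_drop_with_non_unique_datetime_index_and_invalid_keys():
    # GH 30399

    # DataFrame with a unique datetime index
    # (daily frequency is an assumption of this sketch)
    df_unique = pd.DataFrame(
        np.random.randn(5, 3),
        columns=["a", "b", "c"],
        index=pd.date_range("2012-01-01", periods=5, freq="D"),
    )
    # Repeating a row position makes the datetime index non-unique
    df_nonunique = df_unique.iloc[[0, 2, 2, 3]]
    assert not df_nonunique.index.is_unique

    # "a" and "b" are column names, not index labels, so drop must fail
    with pytest.raises(KeyError, match="not found in axis"):
        df_nonunique.drop(["a", "b"])
```

Compared with the try/except version, this pins the expected exception type and message directly instead of comparing against whatever the unique-index case happens to raise.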