Bug issue 16819: Index.get_indexer_non_unique inconsistent return types vs get_indexer #16826

Merged
merged 8 commits into from
Jul 6, 2017
doc/source/whatsnew/v0.21.0.txt (1 change: 1 addition, 0 deletions)
@@ -91,6 +91,7 @@ Performance Improvements

Bug Fixes
~~~~~~~~~
+- Bug in get_indexer_non_unique inconsistent return type with get_indexer (:issue:`16819`)
Contributor

Index.get_indexer_non_unique() now returns an ndarray indexer rather than an Index; this is consistent with Index.get_indexer().

Put this in the API breaking changes section.


Conversion
^^^^^^^^^^
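A minimal sketch of the behavior described in the suggested whatsnew wording, assuming a recent pandas release (the sample data are taken from the pandas docstrings for these two methods): both methods hand back plain NumPy arrays of dtype intp rather than Index objects.

```python
import numpy as np
import pandas as pd

# get_indexer: one position per target label, -1 when the label is absent.
unique_idx = pd.Index(["c", "a", "b"])
indexer = unique_idx.get_indexer(["a", "b", "x"])

# get_indexer_non_unique: every matching position for each target label,
# plus a second array of target positions that had no match.
dup_idx = pd.Index(["c", "b", "a", "b", "b"])
nu_indexer, missing = dup_idx.get_indexer_non_unique(["b", "b"])

for arr in (indexer, nu_indexer, missing):
    # Plain ndarrays with platform-int (intp) dtype, not pandas Index objects.
    print(type(arr).__name__, arr.dtype == np.intp, arr.tolist())
```

Because pandas Index is not an ndarray subclass, the `type(arr).__name__` check distinguishes the two return types directly.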
pandas/core/indexes/base.py (2 changes: 1 addition, 1 deletion)
@@ -2704,7 +2704,7 @@ def get_indexer_non_unique(self, target):
         tgt_values = target._values

         indexer, missing = self._engine.get_indexer_non_unique(tgt_values)
-        return Index(indexer), missing
+        return indexer, missing

     def get_indexer_for(self, target, **kwargs):
         """
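The fix stops wrapping the engine's result in an Index, so callers get the raw positional array. As a hedged illustration (recent pandas assumed; data from the get_indexer_non_unique docstring), this is how the returned pair can be read directly: the first array holds index positions with -1 for misses, and the second holds positions in the target that were not found.

```python
import pandas as pd

idx = pd.Index(["c", "b", "a", "b", "b"])
target = ["f", "b", "s"]

# indexer: all positions of each matched label (every duplicate of "b"),
# with -1 for each target label that is absent.
# missing: positions *in target* whose label was not found.
indexer, missing = idx.get_indexer_non_unique(target)

found_positions = indexer[indexer != -1]
matched_labels = idx.take(found_positions).tolist()
missing_labels = [target[i] for i in missing]
print(matched_labels, missing_labels)
```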
pandas/tests/indexes/test_base.py (11 changes: 11 additions, 0 deletions)
@@ -1131,6 +1131,17 @@ def test_get_indexer_strings(self):
         with pytest.raises(TypeError):
             idx.get_indexer(['a', 'b', 'c', 'd'], method='pad', tolerance=2)

+    def test_get_indexer_consistency(self):
+        # See GH 16819
+        for name, index in self.indices.items():
+            indexer = index.get_indexer(index[0:2])
+            assert isinstance(indexer, np.ndarray)
Contributor

Use:

    expected = np.array([0, 1, 2], dtype=np.intp)
    tm.assert_numpy_array_equal(indexer, expected)

Contributor

Please update the test to use this.

Contributor Author

The issue with that is that some of the indexes are empty, or are categorical indexes without unique positions, so we can't assume that [0, 1, 2] is returned.

Contributor

ok
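The author's point can be checked with a short sketch. The three sample indexes below are hypothetical stand-ins for fixture entries (an object Index with a repeated label substitutes for a non-unique CategoricalIndex), and a recent pandas is assumed. The dtype is intp in every case, but the values of the indexer for `index[0:2]` depend on the index contents.

```python
import numpy as np
import pandas as pd

# Hypothetical sample indexes standing in for entries of the test fixture.
cases = {
    "unique": pd.Index([10, 20, 30]),
    "empty": pd.Index([], dtype="int64"),
    "duplicated": pd.Index(["a", "a", "b"]),  # stand-in for a non-unique index
}

results = {}
for name, index in cases.items():
    indexer, _ = index.get_indexer_non_unique(index[0:2])
    # The dtype is stable across cases; the values depend on the index contents.
    results[name] = (indexer.dtype == np.intp, indexer.tolist())
    print(name, *results[name])
```

For the duplicated index, each of the two "a" labels in the target matches both positions 0 and 1, so the indexer has four entries rather than two.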

+            assert indexer.dtype == np.intp
+
+            indexer, _ = index.get_indexer_non_unique(index[0:2])
+            assert isinstance(indexer, np.ndarray)
+            assert indexer.dtype == np.intp
+
     def test_get_loc(self):
         idx = pd.Index([0, 1, 2])
         all_methods = [None, 'pad', 'backfill', 'nearest']
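To run the consistency check outside the test class, it can be sketched as a plain function. `SAMPLE_INDICES` and `check_get_indexer_consistency` are hypothetical names standing in for the `self.indices` fixture and the test method, the chosen index types are an assumption, and a recent pandas is assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the test class's self.indices fixture.
SAMPLE_INDICES = {
    "int": pd.Index([1, 2, 3]),
    "string": pd.Index(["a", "b", "c"]),
    "datetime": pd.date_range("2017-01-01", periods=3),
    "categorical": pd.CategoricalIndex(["a", "b", "c"]),
    "empty": pd.Index([], dtype=object),
}


def check_get_indexer_consistency(indices):
    """Assert both indexer methods return intp ndarrays; return checked names."""
    checked = []
    for name, index in indices.items():
        indexer = index.get_indexer(index[0:2])
        assert isinstance(indexer, np.ndarray), name
        assert indexer.dtype == np.intp, name

        indexer, _ = index.get_indexer_non_unique(index[0:2])
        assert isinstance(indexer, np.ndarray), name
        assert indexer.dtype == np.intp, name
        checked.append(name)
    return checked


print(check_get_indexer_consistency(SAMPLE_INDICES))
```

Like the test in the diff, this checks only type and dtype, not values, because values differ for empty or non-unique indexes.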
pandas/tests/indexes/test_category.py (3 changes: 1 addition, 2 deletions)
@@ -386,8 +386,7 @@ def test_reindexing(self):
             expected = oidx.get_indexer_non_unique(finder)[0]

             actual = ci.get_indexer(finder)
-            tm.assert_numpy_array_equal(
-                expected.values, actual, check_dtype=False)
+            tm.assert_numpy_array_equal(expected, actual, check_dtype=True)
Contributor

check_dtype is True by default, so it doesn't need to be passed explicitly.


     def test_reindex_dtype(self):
         c = CategoricalIndex(['a', 'b', 'c', 'a'])
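A hedged sketch of what the updated test_category comparison now allows. It uses a unique plain Index as a stand-in for the test's CategoricalIndex, an assumption made because newer pandas releases reject get_indexer on a non-unique index. Since `get_indexer_non_unique()[0]` is already an ndarray, no `.values` is needed, and the dtypes match exactly.

```python
import numpy as np
import pandas as pd

# A unique Index stands in for the test's CategoricalIndex (an assumption).
idx = pd.Index(list("abcde"))
finder = idx[[3, 0, 3]]

# get_indexer_non_unique()[0] is an ndarray, so it can be compared directly.
expected = idx.get_indexer_non_unique(finder)[0]
actual = idx.get_indexer(finder)

# np.testing.assert_array_equal does not compare dtypes by default, so check
# the dtype explicitly, mirroring check_dtype=True in tm.assert_numpy_array_equal.
assert expected.dtype == actual.dtype
np.testing.assert_array_equal(expected, actual)
print(actual.tolist())
```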