BUG: Don't error with empty Series for .isin #17006

Merged: 1 commit, Jul 19, 2017
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.txt
@@ -204,3 +204,4 @@ Categorical
 Other
 ^^^^^
 - Bug in :func:`eval` where the ``inplace`` parameter was being incorrectly handled (:issue:`16732`)
+- Bug in ``.isin()`` in which checking membership in empty ``Series`` objects raised an error (:issue:`16991`)
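
A minimal sketch of the behavior this entry describes, using only public pandas API. It assumes a pandas version that includes the fix. The explicit `dtype=object` on the empty `Series` is an assumption made here so the example does not depend on the default dtype of `Series()`; the PR's own tests pass a bare `Series()`.

```python
import numpy as np
import pandas as pd

# Three kinds of empty "values" arguments, mirroring the cases in gh-16991.
# The explicit dtype on the empty Series is an assumption for illustration.
empties = [[], pd.Series([], dtype=object), np.array([])]

s = pd.Series(["a", "b"])
idx = pd.Index(["a", "b"])
df = pd.DataFrame({"A": ["a", "b", "c"], "B": ["a", "e", "f"]})

for empty in empties:
    series_mask = s.isin(empty)   # boolean Series
    index_mask = idx.isin(empty)  # boolean NumPy array
    frame_mask = df.isin(empty)   # boolean DataFrame

    # Nothing can be a member of an empty collection,
    # so every entry of every mask should be False.
    assert not series_mask.any()
    assert not index_mask.any()
    assert not frame_mask.any().any()
```

Each mask keeps the shape and labels of the object `isin` was called on; only the membership result is uniformly `False`.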
2 changes: 2 additions & 0 deletions pandas/core/algorithms.py
@@ -65,6 +65,8 @@ def _ensure_data(values, dtype=None):

     # we check some simple dtypes first
     try:
+        if is_object_dtype(dtype):
Contributor

Check perf on some of the algos, especially `isin`.

Member Author

I didn't see any noticeable perf degradations on my machine.

Contributor

Did you run the asv benchmarks? This can degrade performance, as it's in a critical path. Notice there is already a check for object dtype later on. This function receives LOTS of input.

Member Author

@gfyoung, Jul 19, 2017

Not quite: the existing check is for the values. This is for the dtype specified. This is essentially an O(1) operation.

And yes, I did check performance (see my comment above), and I didn't see any issues on my machine.
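
To illustrate the distinction drawn in this reply, here is a small sketch using the public `pandas.api.types.is_object_dtype` helper. It is not the body of `_ensure_data`; the variable names `values` and `requested` are hypothetical and chosen only for this example.

```python
import numpy as np
from pandas.api.types import is_object_dtype

values = np.array([])         # empty array; NumPy gives it float64 dtype
requested = np.dtype(object)  # hypothetical dtype passed by the caller

# Checking the values looks at the dtype the array already has.
values_are_object = is_object_dtype(values)

# Checking the requested dtype inspects only the dtype object itself,
# so its cost does not depend on how many elements `values` holds.
dtype_is_object = is_object_dtype(requested)

# When no dtype is requested (dtype=None), the check is simply False.
none_is_object = is_object_dtype(None)

print(values_are_object, dtype_is_object, none_is_object)
```

Under these assumptions the two checks can disagree: the empty array is not object dtype, while the requested dtype is.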

+            return _ensure_object(np.asarray(values)), 'object', 'object'
         if is_bool_dtype(values) or is_bool_dtype(dtype):
             # we are actually coercing to uint64
             # until our algos support uint8 directly (see TODO)
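
The performance question raised in the review thread above can be probed roughly with the standard library's `timeit`. This is only a local micro-benchmark sketch, not the project's asv suite and not a measurement reported in this PR; the helper `time_dtype_check` and the call count are hypothetical choices for illustration.

```python
import timeit

import numpy as np
from pandas.api.types import is_object_dtype


def time_dtype_check(dtype, number=10_000):
    """Return the average seconds per call of is_object_dtype(dtype)."""
    total = timeit.timeit(lambda: is_object_dtype(dtype), number=number)
    return total / number


# The two situations the added branch sees: no dtype given, or object dtype.
per_call_none = time_dtype_check(None)
per_call_object = time_dtype_check(np.dtype(object))

print(f"dtype=None:   {per_call_none:.2e} s per call")
print(f"dtype=object: {per_call_object:.2e} s per call")
```

Timings vary by machine and pandas version, so the numbers are only useful for comparing the same call before and after a change on one setup.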
9 changes: 6 additions & 3 deletions pandas/tests/frame/test_analytics.py
@@ -1151,10 +1151,13 @@ def test_isin(self):
         expected = DataFrame([df.loc[s].isin(other) for s in df.index])
         tm.assert_frame_equal(result, expected)

-    def test_isin_empty(self):
+    @pytest.mark.parametrize("empty", [[], Series(), np.array([])])
+    def test_isin_empty(self, empty):
+        # see gh-16991
         df = DataFrame({'A': ['a', 'b', 'c'], 'B': ['a', 'e', 'f']})
-        result = df.isin([])
-        expected = pd.DataFrame(False, df.index, df.columns)
+        expected = DataFrame(False, df.index, df.columns)
+
+        result = df.isin(empty)
         tm.assert_frame_equal(result, expected)

     def test_isin_dict(self):
9 changes: 9 additions & 0 deletions pandas/tests/indexes/test_base.py
@@ -1407,6 +1407,15 @@ def check_idx(idx):
         # Float64Index overrides isin, so must be checked separately
         check_idx(Float64Index([1.0, 2.0, 3.0, 4.0]))

+    @pytest.mark.parametrize("empty", [[], Series(), np.array([])])
+    def test_isin_empty(self, empty):
+        # see gh-16991
+        idx = Index(["a", "b"])
+        expected = np.array([False, False])
+
+        result = idx.isin(empty)
+        tm.assert_numpy_array_equal(expected, result)
+
     def test_boolean_cmp(self):
         values = [1, 2, 3, 4]

9 changes: 9 additions & 0 deletions pandas/tests/series/test_analytics.py
@@ -1135,6 +1135,15 @@ def test_isin_with_i8(self):
         result = s.isin(s[0:2])
         assert_series_equal(result, expected)

+    @pytest.mark.parametrize("empty", [[], Series(), np.array([])])
+    def test_isin_empty(self, empty):
+        # see gh-16991
+        s = Series(["a", "b"])
+        expected = Series([False, False])
+
+        result = s.isin(empty)
+        tm.assert_series_equal(expected, result)
+
     def test_timedelta64_analytics(self):
         from pandas import date_range

9 changes: 9 additions & 0 deletions pandas/tests/test_algos.py
@@ -597,6 +597,15 @@ def test_categorical_from_codes(self):
         result = algos.isin(Sd, St)
         tm.assert_numpy_array_equal(expected, result)

+    @pytest.mark.parametrize("empty", [[], pd.Series(), np.array([])])
+    def test_empty(self, empty):
+        # see gh-16991
+        vals = pd.Index(["a", "b"])
+        expected = np.array([False, False])
+
+        result = algos.isin(vals, empty)
+        tm.assert_numpy_array_equal(expected, result)
+

 class TestValueCounts(object):
