[BUG] loc and iloc indexing with boolean mask does not handle invalid length correctly #13015

Closed
Tracked by #12793
wence- opened this issue Mar 27, 2023 · 0 comments · Fixed by #13402
Labels: bug (Something isn't working), improvement (Improvement / enhancement to an existing function)

Comments

Contributor

wence- commented Mar 27, 2023

Describe the bug

When the indexer passed to iloc (or loc, which I don't like here because the lookup isn't label-based in this case) is a boolean array, it should select those rows for which the mask is true. [NA values are treated as false, which AFAICT is an unnecessary stipulation because there's no way to construct a pandas boolean array with NA values in it.]

If the length of the mask array does not match the length of the frame, we should get an IndexError. However, cudf produces an empty frame in this case. EDIT: I might have had an old cudf, because I can't reproduce the empty frame; cudf raises RuntimeError rather than IndexError here.

Steps/Code to reproduce bug

import pandas as pd
import cudf
import numpy as np

s = pd.Series([1, 2, 3])
s.iloc[np.asarray([0, 1], dtype="bool")] # => IndexError
c = cudf.from_pandas(s)
c.iloc[np.asarray([0, 1], dtype="bool")] # => RuntimeError

Expected behavior

Should probably match pandas.
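
To make the expected behaviour concrete, here is a minimal pure-Python sketch of the length check described above. The helper name `take_by_mask` and the error wording are assumptions for illustration only. This is not the code from pandas or cudf.

```python
def take_by_mask(values, mask):
    """Select the entries of `values` whose mask entry is true.

    Hypothetical helper: it only illustrates the check that a boolean
    mask must be exactly as long as the object being indexed.
    """
    if len(mask) != len(values):
        # A mismatched mask is an indexing error, so raise IndexError
        # rather than returning an empty or truncated result.
        raise IndexError(
            f"boolean mask has length {len(mask)}, expected {len(values)}"
        )
    return [v for v, keep in zip(values, mask) if keep]


selected = take_by_mask([1, 2, 3], [True, False, True])
```

With a two-element mask on a three-element input, this sketch raises IndexError, which is the exception type pandas raises in the reproducer above.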

wence- added the bug and Needs Triage labels on Mar 27, 2023
wence- self-assigned this on Mar 27, 2023
wence- added the improvement label and removed the Needs Triage label on May 2, 2023
wence- added a commit to wence-/cudf that referenced this issue May 10, 2023
This is now purely location based, with no handling of multiindices in
the case of dataframes, and correct treatment of tuple arguments.

- Closes rapidsai#13015
- Closes rapidsai#13013
wence- added a commit to wence-/cudf that referenced this issue May 24, 2023
These must be treated specially and not accidentally converted to
integers before indexing.

- Closes rapidsai#13015
- Closes rapidsai#13265
- Closes rapidsai#13270
rapids-bot bot pushed a commit that referenced this issue May 25, 2023
These must be treated specially and not accidentally converted to integers before indexing.

- Closes #13015
- Closes #13265
- Closes #13270

Introduces tighter guards for loc-based indexing with Series where in some circumstances one must align the indices of the indexed object and the indexer. These now raise NotImplementedError rather than returning incorrect answers.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: #13402
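
The commit message's remark that boolean masks must not be converted to integers before indexing can be illustrated with a small pure-Python sketch. The helpers `take_mask` and `take_positions` are hypothetical and stand in for mask-based and position-based selection. They are not functions from cudf.

```python
def take_mask(values, mask):
    """Mask-based selection: keep entries whose mask entry is true."""
    return [v for v, keep in zip(values, mask) if keep]


def take_positions(values, positions):
    """Position-based selection: gather entries at integer positions."""
    return [values[i] for i in positions]


values = ["a", "b", "c"]
mask = [True, False, True]

# Treated as a mask: rows 0 and 2 are selected.
as_mask = take_mask(values, mask)

# If the mask is accidentally cast to integers, it becomes the
# positions [1, 0, 1], which selects entirely different rows.
as_positions = take_positions(values, [int(m) for m in mask])
```

Here `as_mask` is `['a', 'c']` while `as_positions` is `['b', 'a', 'b']`, so an accidental cast silently changes which rows are selected instead of failing.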