-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG-20629 allow .at accessor with CategoricalIndex #26298
Conversation
Codecov Report
@@ Coverage Diff @@
## master #26298 +/- ##
==========================================
- Coverage 91.98% 91.98% -0.01%
==========================================
Files 175 175
Lines 52374 52381 +7
==========================================
+ Hits 48178 48181 +3
- Misses 4196 4200 +4
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26298 +/- ##
==========================================
- Coverage 91.74% 91.74% -0.01%
==========================================
Files 174 174
Lines 50748 50755 +7
==========================================
+ Hits 46560 46563 +3
- Misses 4188 4192 +4
Continue to review full report at Codecov.
|
pandas/core/frame.py
Outdated
except KeyError as e: | ||
if self.index.nlevels > 1: | ||
raise e # partial indexing forbidden | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this else surperflous?
pandas/core/frame.py
Outdated
@@ -2710,13 +2710,19 @@ def _get_value(self, index, col, takeable=False): | |||
|
|||
try: | |||
return engine.get_value(series._values, index) | |||
except KeyError as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually don't need the e here (just do raise
) is enough
pandas/core/frame.py
Outdated
@@ -2710,13 +2710,19 @@ def _get_value(self, index, col, takeable=False): | |||
|
|||
try: | |||
return engine.get_value(series._values, index) | |||
except KeyError as e: | |||
if self.index.nlevels > 1: | |||
raise e # partial indexing forbidden |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put the comment on a separate line before the raise
pandas/core/indexes/category.py
Outdated
@@ -498,6 +498,8 @@ def get_value(self, series, key): | |||
k = self._convert_scalar_indexer(k, kind='getitem') | |||
indexer = self.get_loc(k) | |||
return series.iloc[indexer] | |||
except AttributeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, what is the attribute error you are catching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above line, return series.iloc[indexer]
, assumes the series
argument has implemented iloc
. The regular index version does not make this assumption. In addition, the doc for this function reads "Fast lookup of value from 1-dimensional ndarray" so this assumption should be relaxed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok,, I would then put the AttributeError catching with the Key and Type error below then.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key and type errors catch exceptions from self.get_loc
, which usually happens when using indexing like s[s>5]
leading to a key which is like a list of booleans.
The exceptions cannot use super.get_value(series, key)
because its implementation does not work for CategoricalIndex
when the key is a scalar value in the index (although it does work when the key is a positional index).
They also cannot all use super.get_value(series, indexer)
because the creation of indexer
is contingent on self.get_loc
not failing.
Therefore, two cases are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above line, return series.iloc[indexer], assumes the series argument has implemented iloc. The regular index version does not make this assumption.
ok so IIUC, series
can actually be an Index
or a Series
? if so, then just change to using .take()
I think should work. Can you try and type annotate things (and update the doc-strings).
I don't think we would need the AttributeError catching then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I've applied your recommended changes and added type information (the case where series
is an ndarray
was actually the case in the original issue)
@@ -719,3 +719,11 @@ def test_map_with_dict_or_series(self): | |||
output = cur_index.map(mapper) | |||
# Order of categories in output can be different | |||
tm.assert_index_equal(expected, output) | |||
|
|||
def test_at_with_categorical_index(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test both loc and at with here
|
||
def test_at_with_categorical_index(self): | ||
# GH 20629 | ||
s = Series([1, 2, 3], index=pd.CategoricalIndex(["A", "B", "C"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this after the loc tests
this might be solved as well: #14865 |
This PR does not solve #14865 (although I'd be happy to look into it). There is still an issue with |
@jreback I have made the requested changes |
pandas/core/indexes/category.py
Outdated
@@ -488,16 +490,31 @@ def get_loc(self, key, method=None): | |||
except KeyError: | |||
raise KeyError(key) | |||
|
|||
def get_value(self, series, key): | |||
def get_value(self, | |||
series: Union[ABCSeries, ExtensionArray, Index, np.ndarray], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add this type to pandas._typing, call it AnyArrayLike
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor typing comment, ping on green.
@jreback green |
lgtm. can you merge master and ping on green. |
thanks @JustinZhengBC keep em coming! |
git diff upstream/master -u -- "*.py" | flake8 --diff
This PR alters exception handling in various
get_value
functions to make calls to.at
on non-integerCategoricalIndex
fall back on positional indexing methods instead of raising exceptions, allowing queries like those mentioned in #20629.