Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG-20629 allow .at accessor with CategoricalIndex #26298

Merged
merged 10 commits into from
May 20, 2019

Conversation

JustinZhengBC
Copy link
Contributor

This PR alters exception handling in various get_value functions to make calls to .at on non-integer CategoricalIndex fall back on positional indexing methods instead of raising exceptions, allowing queries like those mentioned in #20629.

@codecov
Copy link

codecov bot commented May 6, 2019

Codecov Report

Merging #26298 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26298      +/-   ##
==========================================
- Coverage   91.98%   91.98%   -0.01%     
==========================================
  Files         175      175              
  Lines       52374    52381       +7     
==========================================
+ Hits        48178    48181       +3     
- Misses       4196     4200       +4
Flag Coverage Δ
#multiple 90.53% <100%> (ø) ⬆️
#single 40.71% <0%> (-0.16%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/category.py 98.62% <100%> (ø) ⬆️
pandas/core/frame.py 96.91% <100%> (-0.11%) ⬇️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94c8c94...1843a04. Read the comment docs.

@codecov
Copy link

codecov bot commented May 6, 2019

Codecov Report

Merging #26298 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26298      +/-   ##
==========================================
- Coverage   91.74%   91.74%   -0.01%     
==========================================
  Files         174      174              
  Lines       50748    50755       +7     
==========================================
+ Hits        46560    46563       +3     
- Misses       4188     4192       +4
Flag Coverage Δ
#multiple 90.25% <100%> (ø) ⬆️
#single 41.69% <38.46%> (-0.09%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/category.py 98.62% <100%> (ø) ⬆️
pandas/core/frame.py 97.02% <100%> (-0.11%) ⬇️
pandas/_typing.py 100% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b99e84f...7b598c6. Read the comment docs.

except KeyError as e:
if self.index.nlevels > 1:
raise e # partial indexing forbidden
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this else surperflous?

@@ -2710,13 +2710,19 @@ def _get_value(self, index, col, takeable=False):

try:
return engine.get_value(series._values, index)
except KeyError as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually don't need the e here (just do raise) is enough

@@ -2710,13 +2710,19 @@ def _get_value(self, index, col, takeable=False):

try:
return engine.get_value(series._values, index)
except KeyError as e:
if self.index.nlevels > 1:
raise e # partial indexing forbidden
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put the comment on a separate line before the raise

@@ -498,6 +498,8 @@ def get_value(self, series, key):
k = self._convert_scalar_indexer(k, kind='getitem')
indexer = self.get_loc(k)
return series.iloc[indexer]
except AttributeError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, what is the attribute error you are catching?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above line, return series.iloc[indexer], assumes the series argument has implemented iloc. The regular index version does not make this assumption. In addition, the doc for this function reads "Fast lookup of value from 1-dimensional ndarray" so this assumption should be relaxed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,, I would then put the AttributeError catching with the Key and Type error below then.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key and type errors catch exceptions from self.get_loc, which usually happens when using indexing like s[s>5] leading to a key which is like a list of booleans.

The exceptions cannot use super.get_value(series, key) because its implementation does not work for CategoricalIndex when the key is a scalar value in the index (although it does work when the key is a positional index).

They also cannot all use super.get_value(series, indexer) because the creation of indexer is contingent on self.get_loc not failing.

Therefore, two cases are needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above line, return series.iloc[indexer], assumes the series argument has implemented iloc. The regular index version does not make this assumption.

ok so IIUC, series can actually be an Index or a Series? if so, then just change to using .take() I think should work. Can you try and type annotate things (and update the doc-strings).

I don't think we would need the AttributeError catching then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I've applied your recommended changes and added type information (the case where series is an ndarray was actually the case in the original issue)

@@ -719,3 +719,11 @@ def test_map_with_dict_or_series(self):
output = cur_index.map(mapper)
# Order of categories in output can be different
tm.assert_index_equal(expected, output)

def test_at_with_categorical_index(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test both loc and at with here


def test_at_with_categorical_index(self):
# GH 20629
s = Series([1, 2, 3], index=pd.CategoricalIndex(["A", "B", "C"]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this after the loc tests

@jreback jreback added Bug Categorical Categorical Data Type Indexing Related to indexing on series/frames, not to indexes themselves labels May 7, 2019
@jreback
Copy link
Contributor

jreback commented May 7, 2019

this might be solved as well: #14865

@JustinZhengBC
Copy link
Contributor Author

This PR does not solve #14865 (although I'd be happy to look into it). There is still an issue with Series.get where if the key is not found in the index but can be interpreted to be a positional indexer, then it will be (for example, s.get(0) returns the first element when s has a CategoricalIndex that does not contain a 0) which is inconsistent which regular indexes.

@JustinZhengBC
Copy link
Contributor Author

@jreback I have made the requested changes

@jreback jreback added this to the 0.25.0 milestone May 16, 2019
@@ -488,16 +490,31 @@ def get_loc(self, key, method=None):
except KeyError:
raise KeyError(key)

def get_value(self, series, key):
def get_value(self,
series: Union[ABCSeries, ExtensionArray, Index, np.ndarray],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add this type to pandas._typing, call it AnyArrayLike

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor typing comment, ping on green.

@JustinZhengBC
Copy link
Contributor Author

@jreback green

@jreback
Copy link
Contributor

jreback commented May 19, 2019

lgtm. can you merge master and ping on green.

@jreback jreback merged commit 2b32e41 into pandas-dev:master May 20, 2019
@jreback
Copy link
Contributor

jreback commented May 20, 2019

thanks @JustinZhengBC keep em coming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Categorical Categorical Data Type Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series.at and DataFrame.at crash with CategoricalIndex
2 participants