-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix regression for groupby.indices in case of unused categories #38649
Conversation
@@ -241,6 +242,11 @@ def apply(self, f: F, data: FrameOrSeries, axis: int = 0): | |||
@cache_readonly | |||
def indices(self): | |||
""" dict {group name -> group indices} """ | |||
if len(self.groupings) == 1 and isinstance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woa, this is so special casing here. why is this added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the explanation in the top post
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read it and don't buy it. Why are we supporting this case at all? I would rather simply fix this breaking or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean not showing unused categories for indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i c we did thave this special case before, but onl was for len(self.groupins) == 1. its the special isinstance i object.
@phofl Categorical does actually support NaNs, but maybe this is not covered by the tests? |
It seems this might actually also be broken for Categorical (or not fixed by #36842). This is what I see on master:
|
Hm, I misunderstood #35646 (comment) this then. Have to look into this then |
Ah, no, I was not aware of that categorical with BTW, if this is the case, the docs of the |
@@ -241,6 +242,11 @@ def apply(self, f: F, data: FrameOrSeries, axis: int = 0): | |||
@cache_readonly | |||
def indices(self): | |||
""" dict {group name -> group indices} """ | |||
if len(self.groupings) == 1 and isinstance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i c we did thave this special case before, but onl was for len(self.groupins) == 1. its the special isinstance i object.
My comment was assuming that that adding a new |
not happening, we already removed this case thru a deprecation cycle and no reason to add it back |
Categorical indeed does not support NaN as a category, but does that mean it could not be supported in |
I imagine it should be supported in At the time I was probably focusing too much on |
@phofl can you merge master and add a note for 1.2.1 |
Done |
thanks @phofl very nice |
@meeseeksdev backport 1.2.x |
… in case of unused categories
…f unused categories (#38790) Co-authored-by: patrick <61934744+phofl@users.noreply.github.com>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
To show the unused we categories, we can dispatch back as we did before #36842. Since Categorical does not support nan, the reason for the switch does not exist in this case. I think handling unused categories in
get_indexer_dict
is not desirable, because it would introduce a lot of convolution for one corner case we could handle here pretty easily.Since we are only using
indices
in grouper only for categoricals, we do not have to worry about sorting as the test shows.cc @jorisvandenbossche @mroeschke