REGR: GroupBy.indices no longer includes unobserved categories #38642
Comments
So for example, those two APIs still return all values (both for pandas 1.0 and master):
So it seems |
This may have been unintentionally changed by me in https://github.com/pandas-dev/pandas/pull/36911/files |
I looked into this, I think the new case is more consistent maybe?
returned
While a one dimensional group key returned what you showed above. The missing categories case would be tricky to handle with multidimensional keys. Maybe it would be better to remove unused categories from groups too? Or should the one-dimensional case be special here? |
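A minimal sketch for checking the multi-key case on a given pandas version. The frame and the column names `k1`, `k2` are made up for illustration, and no particular output is claimed here since the result depends on the installed version:

```python
import pandas as pd

# Hypothetical frame: two categorical keys, each with unobserved categories.
df_multi = pd.DataFrame(
    {
        "k1": pd.Categorical(["b"] * 4, categories=["a", "b", "c"]),
        "k2": pd.Categorical(["x"] * 4, categories=["x", "y"]),
        "col": range(4),
    }
)

gb_multi = df_multi.groupby(["k1", "k2"], observed=False)

# With several keys, the group keys are tuples.
multi_indices = list(gb_multi.indices)
multi_groups = list(gb_multi.groups)

# Compare how many keys each view reports on the installed version.
print("indices:", len(multi_indices), multi_indices)
print("groups: ", len(multi_groups), multi_groups)
```

If the two lists differ in length, unobserved category combinations are present in one view but not in the other.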
@mroeschke no that was not the reason. I think this was caused by c4226d4 |
Thanks for confirming @phofl |
Addition: We are no longer running through there since #36842 |
@phofl thanks for looking at it!
Indeed for multiple keys, we seem to not include unobserved categories. But, here both So to fully make it consistent, then for example also |
Passing
>>> df = pd.DataFrame({"key": pd.Categorical(["b"]*5, categories=["a", "b", "c", "d"]), "col": range(5)})
>>> gb = df.groupby("key", observed=True)
>>> list(gb.indices)
['b']
>>> gb = df.groupby("key", observed=False)
>>> list(gb.indices)
['a', 'b', 'c', 'd'] |
I have to correct myself: this was changed by #36842. @jorisvandenbossche When testing this on 1.1.0 and 1.1.5, I get
both times. Edit: Changed the example a bit.
|
I think the pointer of @mroeschke to #36911 might be more correct, since that was a PR for 1.1.4, while #36842 only for 1.2.0. And unlike what I said earlier (I thought it was only working on 1.0, and not in 1.1.x), this actually only changed from 1.1.3 to 1.1.4. |
can confirm, first bad commit: [345efdd] BUG: RollingGroupby not respecting sort=False (#36911) |
Hm, I just looked at the PR numbers, not at when they were merged. Nevertheless, we have to change both commits to get the original result, because the code path from #36911 is currently not used on master. |
Does anybody know if this was an intentional change? (I don't directly find something about it in the whatsnew)
vs
This already changed in pandas 1.1, so not a recent change.
The consequence of this is that iterating over gb vs iterating over gb.indices is not consistent anymore.
cc @mroeschke @rhshadrach
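A small sketch to reproduce the comparison on whatever pandas version is installed. It reuses the single-key frame from the thread; the helper name `compare_group_keys` is made up for illustration, and the printed result is intentionally not stated because it differs between versions:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key": pd.Categorical(["b"] * 5, categories=["a", "b", "c", "d"]),
        "col": range(5),
    }
)
gb = df.groupby("key", observed=False)


def compare_group_keys(gb):
    """Collect the keys seen through three views of the same GroupBy."""
    return {
        "iteration": [key for key, _ in gb],
        "groups": list(gb.groups),
        "indices": list(gb.indices),
    }


keys = compare_group_keys(gb)
for view, found in keys.items():
    print(f"{view:>9}: {found}")
```

If the `indices` line lists fewer keys than the `iteration` line, the installed version shows the inconsistency described above.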