groupby with index = False returns NANs when column is categorical. #13204
Comments
Please post an example and the output of show_versions. SO links are nice, but an in-line example is much better.
FWIW my example would be something like
Yeah, I think this is reindexing somewhere inside and is probably not setting it up right. Pull requests welcome.
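Since the in-line examples posted in this thread did not survive, here is a minimal sketch of the kind of call the issue title describes: grouping on a categorical column with `as_index=False` and checking whether the grouping column comes back as NaN. The frame and its column names are made up for illustration and are not the reporter's data, and `observed=True` is passed only to keep the result shape the same across pandas versions. On a release that includes the fix, the check should report no NaNs.

```python
import pandas as pd

# Hypothetical data, not taken from the original report.
df = pd.DataFrame({
    "cat": pd.Categorical(["a", "b", "b", "c"]),
    "val": [1, 2, 3, 4],
})

# Group on the categorical column but keep it as a regular column.
result = df.groupby("cat", as_index=False, observed=True)["val"].sum()

# If the bug from the title is present, the grouping column holds NaN.
nan_count = int(result["cat"].isna().sum())
print(result)
print("NaNs in grouping column:", nan_count)
```

Running the same snippet on the pandas version from `show_versions` is a quick way to tell whether a given installation is affected.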
Looks quite easy to fix. Function
Another issue in the same function. The MultiIndex loses information about dtypes. For example:

    import pandas as pd
    df = pd.DataFrame({'cat': pd.Categorical([5, 6, 6, 7, 7], [5, 6, 7, 8]),
                       'i1': [10, 11, 11, 10, 11],
                       'i2': [101, 102, 102, 102, 103]})
    df.groupby(['cat', 'i1']).sum().reset_index().dtypes
    Out[12]:
    cat      int64
    i1       int64
    i2     float64
    dtype: object

While for a usual one-level index:

    df.groupby(['cat']).sum().reset_index().dtypes
    Out[13]:
    cat    category
    i1      float64
    i2      float64
    dtype: object

And I guess

Edit: On second thought, I'd rather leave the index as it is. If a change is needed, it would be better done in the MultiIndex constructor, I suppose. I'll prepare a PR for it later.

BTW, I couldn't find any info on whether the following behaviour of categoricals in a DataFrame is by design or just a side effect:

    # df - same as above
    df.sum()
    Out[14]:
    cat     31.0
    i1      53.0
    i2     510.0
    dtype: float64
    df[['cat']].sum()
    Out[15]:
    cat    31
    dtype: int64
    # while for Series:
    df['cat'].sum()
    ...
    TypeError: Categorical cannot perform the operation sum

Shouldn't categoricals rather be excluded when aggregating, as is done with datetime columns?
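One way to get the behaviour the last question asks for, without relying on how `DataFrame.sum` treats categoricals, is to drop categorical columns explicitly before aggregating. The sketch below rebuilds the frame from the comment above and uses `select_dtypes(exclude="category")`. It is a user-side workaround under that assumption, not a description of what pandas does internally or of the eventual fix; the variable names `numeric_totals` and `series_sum_failed` are introduced here for illustration.

```python
import pandas as pd

# Same frame as in the comment above.
df = pd.DataFrame({
    "cat": pd.Categorical([5, 6, 6, 7, 7], [5, 6, 7, 8]),
    "i1": [10, 11, 11, 10, 11],
    "i2": [101, 102, 102, 102, 103],
})

# Keep only the non-categorical columns, then aggregate.
numeric_totals = df.select_dtypes(exclude="category").sum()
print(numeric_totals)

# Summing the categorical Series directly is rejected with a TypeError.
try:
    df["cat"].sum()
    series_sum_failed = False
except TypeError:
    series_sum_failed = True
print("Series sum raised TypeError:", series_sum_failed)
```

Filtering by dtype first makes the result independent of whichever default the installed pandas version applies to categorical columns in frame-level reductions.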
…dev#13204 BUG: Fix string repr of Grouping
Please see Stack Overflow for an example of the issue:
http://stackoverflow.com/questions/37279260/why-doesnt-pandas-allow-a-categorical-column-to-be-used-in-groupby?noredirect=1#comment62084780_37279260