Make describe work for a grouped dataframe #1443
Comments
I don't get how broadcasting works with grouped data frames. I was under the impression that
but
Using
That's because
R
The only solution I could find for R is to use a … Simply calling
Man, R loves lists.
Pandas
Pandas is weird.
But when you group it, the output becomes unreadable: it prints each variable horizontally.
Thoughts
Returning a grouped dataframe:
This is weird because the … We could just print 3 DataFrames with the annotation
Thanks for checking! dplyr is indeed a bit different from us, since the grouping information can be present without necessarily changing the behavior of many operations. Pandas' behavior makes sense, but the printing is terrible. How about returning a
This is a good idea, and something I will work to implement. There are some changes I want to make first, since grouped operations can be expensive; see my comment in #1256. Without trying to re-optimize grouped operations just for this, it's worth changing summarize so that it only computes the statistics it needs, rather than calculating everything and then keeping only the statistics specified in
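The idea of computing only the requested statistics can be sketched in plain Python (this is not DataFrames.jl code, and `STATS`, `summarize_column`, and `summarize_groups` are hypothetical names). Each statistic is registered as a function and evaluated only if it was asked for, so a grouped call pays only for the requested statistics in every group.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# Hypothetical registry: statistic name -> function computing it from one column.
STATS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.mean,
    "min": min,
    "median": statistics.median,
    "max": max,
    "std": statistics.stdev,
}


def summarize_column(values: Sequence[float], requested: List[str]) -> Dict[str, float]:
    """Compute only the requested statistics instead of computing all and filtering afterwards."""
    unknown = [name for name in requested if name not in STATS]
    if unknown:
        raise ValueError(f"unknown statistics: {unknown}")
    # Each statistic function is called only if it was requested.
    return {name: STATS[name](values) for name in requested}


def summarize_groups(
    groups: Dict[str, Sequence[float]], requested: List[str]
) -> Dict[str, Dict[str, float]]:
    """Apply summarize_column to each group separately."""
    return {key: summarize_column(vals, requested) for key, vals in groups.items()}


groups = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0]}
print(summarize_groups(groups, ["mean", "max"]))
# {'a': {'mean': 2.0, 'max': 3.0}, 'b': {'mean': 15.0, 'max': 20.0}}
```

Keeping the statistics in a lookup table means adding a new statistic does not change the per-group loop, and unknown names fail early, before any group is processed.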
I'm going to implement this:
Though it isn't much use if grouped DataFrames don't print more information; they currently cut off pretty quickly. However, I am under the impression that we will be changing how grouped DataFrames are printed in the future. This behavior might also change with #1520.
Thanks. Actually, I wonder whether we shouldn't make this the standard behavior of
I think that's the right move for sure. Two things.
as the default behavior for any
Regarding this issue, I'll update #1520 to keep
This is the current output of
I think it could omit the printing of … I think that printing info about the group might be tough, though. The only info that a
Yes, but you know that all values for grouping columns are by definition equal for a given group, so you can just take the first one. The result of
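Because every row in a group has the same grouping-column values, a group's header can be built from its first row alone. The sketch below shows this in plain Python over lists of dict rows. It does not use DataFrames.jl, and `group_rows`, `group_label`, and the `Group 1 (g = a)` header format are hypothetical choices for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def group_rows(rows: Sequence[Row], keys: Sequence[str]) -> List[List[Row]]:
    """Split rows into groups by their grouping-column values, keeping first-seen order."""
    groups: Dict[Tuple[object, ...], List[Row]] = {}
    for row in rows:
        groups.setdefault(tuple(row[k] for k in keys), []).append(row)
    return list(groups.values())


def group_label(index: int, group: Sequence[Row], keys: Sequence[str]) -> str:
    """Build a header such as 'Group 1 (g = a)' from the group's first row.

    All rows in a group share the grouping-column values, so the first
    row is enough to recover them.
    """
    first = group[0]
    values = ", ".join(f"{k} = {first[k]}" for k in keys)
    return f"Group {index} ({values})"


rows = [{"g": "a", "x": 1}, {"g": "b", "x": 2}, {"g": "a", "x": 3}]
for i, grp in enumerate(group_rows(rows, ["g"]), start=1):
    print(group_label(i, grp, ["g"]))
```

The printed labels are `Group 1 (g = a)` and `Group 2 (g = b)`. Reading only the first row keeps the cost of the header constant per group, regardless of group size.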
Cool. Do you think you could bundle the printing PR with #1520, or should we wait for that to be merged and then see what to do? FWIW,
I'd rather keep #1520 self-contained; printing is totally orthogonal to it.
Sounds good. When that is merged, I will make a PR for printing of
After #1520 and others,
This is pretty good. Having the result be a grouped dataframe feels intuitive. However
We could probably print the values of the grouping columns next to the group number. We could also try to print as many groups as possible on screen, but I guess that wouldn't be enough in most cases. Note that you can also do
#1632 addresses the general grouped printing function.
Now that #1632 has been merged, I wonder if we should change the way grouped DataFrames are printed in general. Should we default to showing 10 rows for as many groups as we can? I'm not sure I see the benefit of showing only the first and last groups.
Ideally, I think we should print the header only once, using a fixed width across groups for each column. That would free a lot of space, and then it would make sense to print as many groups as possible.
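Printing the header only once with widths shared across groups can be sketched in plain Python. This is not how DataFrames.jl or TexTables.jl render tables. `render_groups` and the layout (a single header, a separator, then a `Group i` line before each group's rows) are hypothetical. The key step is computing each column's width over the header and every cell of every group, so that all groups line up under one header.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def render_groups(groups: Sequence[Sequence[Row]], columns: Sequence[str]) -> str:
    """Render several groups under a single header, with column widths shared across groups."""
    # One width per column, computed over the header and every cell of every group.
    widths = {
        col: max([len(col)] + [len(str(row[col])) for grp in groups for row in grp])
        for col in columns
    }

    def fmt(cells: Sequence[str]) -> str:
        # Right-align each cell to its column's shared width.
        return " | ".join(cell.rjust(widths[col]) for col, cell in zip(columns, cells))

    lines: List[str] = [fmt(columns), "-+-".join("-" * widths[col] for col in columns)]
    for i, grp in enumerate(groups, start=1):
        lines.append(f"Group {i}")
        for row in grp:
            lines.append(fmt([str(row[col]) for col in columns]))
    return "\n".join(lines)


print(render_groups([[{"g": "a", "x": 1}], [{"g": "bb", "x": 100}]], ["g", "x"]))
```

Computing the widths requires one pass over every cell before anything is printed. For large grouped data, a real implementation would probably compute widths only over the rows that will actually be shown.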
That's what TexTables.jl does here with
I've been playing around with … It's still not pretty; maybe when metadata is added we can have something that controls printing to make this better?
We could improve
Reporting this to remind myself to make this PR.
Since GroupedDataFrames seem to support broadcasting, this should be easy.