
Show group values when printing grouped dataframe #1632

Merged (9 commits) on Dec 26, 2018


@pdeffebach (Contributor):

In reference to #1539, this PR prints the values of the grouping columns whenever a grouped data frame is printed.

@pdeffebach (Contributor Author):

Now the printing looks like this (note the `:d => 1` and `:d => 4` lines):

```julia
df = DataFrame(a = repeat(1:4, outer = 5), b = randn(20), c = randn(20) .+ 1)
df[:d] = df[:a]  # assumed: the output below shows a column d that copies a
g = groupby(df, :d)
julia> g
GroupedDataFrame{DataFrame} with 4 groups based on key: :d
First Group: 5 rows
:d => 1
│ Row │ a     │ b        │ c         │ d     │
│     │ Int64 │ Float64  │ Float64   │ Int64 │
├─────┼───────┼──────────┼───────────┼───────┤
│ 1   │ 1     │ 0.365603 │ -0.164928 │ 1     │
│ 2   │ 1     │ 1.78441  │ 1.67254   │ 1     │
│ 3   │ 1     │ -1.4079  │ 0.0170639 │ 1     │
│ 4   │ 1     │ 0.421552 │ 1.20265   │ 1     │
│ 5   │ 1     │ -1.12468 │ 0.783903  │ 1     │
Last Group: 5 rows
:d => 4
│ Row │ a     │ b         │ c        │ d     │
│     │ Int64 │ Float64   │ Float64  │ Int64 │
├─────┼───────┼───────────┼──────────┼───────┤
│ 1   │ 4     │ 0.244776  │ 1.30593  │ 4     │
│ 2   │ 4     │ 1.49806   │ 0.43346  │ 4     │
│ 3   │ 4     │ -0.372229 │ 1.52525  │ 4     │
│ 4   │ 4     │ 0.845078  │ 1.19766  │ 4     │
│ 5   │ 4     │ -0.668419 │ 0.286902 │ 4     │
```

@nalimilan (Member):

I'd print this on the same line as the number of rows, maybe like this: `First Group (5 rows): col=value` (with a comma between multiple keys).

You'll also need to update tests.
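A minimal sketch of the header format suggested above, in plain Julia. `group_header` is a hypothetical helper, not part of the DataFrames.jl API, and the key/value pairs are passed in directly instead of being read from a `GroupedDataFrame`; it only shows how the row count and the comma-separated `col=value` entries fit on one line.

```julia
# Hypothetical helper (not the DataFrames.jl API): format one group's header
# line as suggested in the review, e.g. "First Group (5 rows): d=1".
function group_header(label::AbstractString, nrows::Integer,
                      keyvals::AbstractVector{<:Pair})
    rows = nrows == 1 ? "row" : "rows"
    # one "col=value" entry per grouping column, separated by commas
    keystr = join((string(k, "=", v) for (k, v) in keyvals), ", ")
    return "$label ($nrows $rows): $keystr"
end

println(group_header("First Group", 5, [:d => 1]))          # First Group (5 rows): d=1
println(group_header("Last Group", 1, [:a => 4, :b => 2]))  # Last Group (1 row): a=4, b=2
```

Because the helper returns a plain string, checks like the ones the reviewer asks for can compare it directly against the expected header.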

@pdeffebach (Contributor Author):

Let me know if you like this behavior (the code can obviously still change), and I will add tests.

@@ -6,31 +6,55 @@ function Base.show(io::IO, gd::GroupedDataFrame;
```julia
rowlabel::Symbol = :Row,
summary::Bool = true)
N = length(gd)
keys = join(':' .* string.(names(gd.parent)[gd.cols]), ", ")
grouped_names = names(gd.parent)[gd.cols]
```
Member:

Suggested change:
```diff
-grouped_names = names(gd.parent)[gd.cols]
+keynames = names(gd.parent)[gd.cols]
```

```julia
print(io, "\nGroup $i: $nrows $rows")

identified_groups = [':' * string(parent_names[col], " = ",
first(gd[i][col])) for col in gd.cols]
```
Member:

Wrong indentation. Please also remove trailing spaces (here and elsewhere).

Contributor Author:

I couldn't find any examples of the right indentation for constructors, so I added more line breaks. Let me know how to proceed.

Member:

I'm not sure there's a clear rule for comprehensions, but here `first` is inside `string`, so it should definitely not be aligned with the bracket. I'd do this:

Suggested change:
```diff
-first(gd[i][col])) for col in gd.cols]
+identified_groups = [':' * string(parent_names[col], " = ", first(gd[i][col]))
+                     for col in gd.cols]
```

```julia
print(io, "\nGroup $i ($nrows $rows): ")
join(io, identified_groups, ", ", " and ")

show(io, gd[i], summary=false,
```
Member:

Maybe we shouldn't print the grouping columns, since they're listed above?

Contributor Author:

I think it reads well.

Member:

As you like. But just to make sure we're talking about the same thing: I was suggesting we could drop the grouping columns from the data frame we print below, since all values are equal within a given group.


````diff
@@ -1,54 +0,0 @@
-```@meta
````

Member:

This shouldn't be removed.

@pdeffebach (Contributor Author):

Sorry about deleting that file; I did it by accident and thought my `git reset ... HEAD` had worked.

We should keep showing the grouping columns, so that `gd[i][1]` matches what the user sees in the printed output.
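The trade-off can be sketched with the DataFrames.jl 1.x API. This is an assumption: the thread predates that release, and what was written `gd[i][1]` in 2018 is written `sdf[!, 1]` on a group `sdf` there. `groupby`, `groupcols`, `select` and `Not` are DataFrames.jl exports; the data are arbitrary placeholders. With the key column printed, the first displayed column is the one positional indexing returns; hiding it would shift every position by one.

```julia
using DataFrames

# Placeholder data in the spirit of the example at the top of the thread.
df = DataFrame(a = repeat(1:4, outer = 5), b = collect(1.0:20.0), c = collect(21.0:40.0))
gd = groupby(df, :a)

sdf = gd[1]              # SubDataFrame for the first group
keycols = groupcols(gd)  # grouping column names, here [:a]

# Grouping columns printed: the first displayed column is `a`,
# and positional indexing agrees with the display.
@show sdf[!, 1] == sdf.a

# Grouping columns hidden (the alternative raised in review): the first
# displayed column would be `b`, while sdf[!, 1] still returns `a`.
shown = select(sdf, Not(keycols))
@show propertynames(shown)
@show shown[!, 1] == sdf.b
```

Dropping the key columns would save horizontal space, but at the cost of the correspondence between what is printed and what positional indexing returns.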

@pdeffebach (Contributor Author):

Okay, I think this is ready to be merged.

@nalimilan (Member) left a comment:

Thanks!

@nalimilan merged commit adc1043 into JuliaData:master on Dec 26, 2018.
```julia
keystr = length(gd.cols) > 1 ? "keys" : "key"
groupstr = N > 1 ? "groups" : "group"
summary && print(io, "$(typeof(gd)) with $N $groupstr based on $keystr: $keys")
if allgroups
    for i = 1:N
        nrows = size(gd[i], 1)
        rows = nrows > 1 ? "rows" : "row"
        print(io, "\nGroup $i: $nrows $rows")

        identified_groups = [':' * string(parent_names[col], " = ", first(gd[i][col]))
```
Member:

I've just realized that when the grouping column holds strings or symbols, its values are printed as e.g. `:x = a` rather than `:x = "a"` or `:x = :a`. I guess we should change this (using `repr` or `ourshowcompact`)?

Member:

It should be `repr`. I will make a quick PR.
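A small illustration of the difference, in plain Julia. `old_keylabel` mimics the comprehension in the merged diff (taking the column name directly instead of `parent_names[col]`), and `keylabel` is a hypothetical variant that formats the value with `repr`; neither is the code of the follow-up PR.

```julia
# Formatting as in the merged diff: string() drops the quotes around a String
# and the colon of a Symbol, so "a" and :a both print as `a`.
old_keylabel(col::Symbol, value) = ':' * string(col, " = ", value)

# Hypothetical fix: repr() shows the value as Julia source would.
keylabel(col::Symbol, value) = string(':', col, " = ", repr(value))

for v in (1, "a", :a)
    println(old_keylabel(:x, v), "    vs    ", keylabel(:x, v))
end
# prints:
# :x = 1    vs    :x = 1
# :x = a    vs    :x = "a"
# :x = a    vs    :x = :a
```

Numbers look the same either way; only strings and symbols gain the quoting that makes their type visible in the header.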

3 participants