Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved REPL printing for GroupedDataFrames #3107

Merged
merged 9 commits into from
Sep 29, 2022
Merged

Conversation

Jollywatt
Copy link
Contributor

@Jollywatt Jollywatt commented Jul 28, 2022

This is a small enhancement to make GroupedDataFrames display so that they fit in one screen, like regular DataFrames with many rows.

Before, the first and last groups of a GroupedDataFrame would be displayed at full size; now, the two groups are made smaller (using the :displaysize parameter in IOContext) so that together they fill the required line height.

If the first group is smaller than half the available REPL height, and the last group is larger, or vice versa, then the smaller group is shown in full and the larger one is squashed (displayed with missing rows). This means that the group with the most rows takes up the most room visually, even after some of its rows are skipped.

See the "GroupedDataFrame displaysize test" test set for example output.

@bkamins bkamins added this to the 1.4 milestone Jul 28, 2022
@bkamins
Copy link
Member

bkamins commented Jul 28, 2022

Makes sense. I will wait for @ronisbr to review first as he is maintaining all display stuff 😄.

@bkamins bkamins requested a review from ronisbr July 28, 2022 12:36
@bkamins
Copy link
Member

bkamins commented Sep 16, 2022

@ronisbr - could you please review this PR (especially in terms of how it interacts with PrettyTables.jl kwargs) while we are waiting for HTML PR to be merged. Then I will review it. Thank you!

@ronisbr
Copy link
Member

ronisbr commented Sep 16, 2022

Sorry! I completely missed this one.

Copy link
Member

@ronisbr ronisbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me! I would just add tests for corner cases in displaysize. For example, if we are printing to a file, dislaysize will take the values from LINES and COLUMNS:

julia> io = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> ENV["LINES"] = -1
-1

julia> displaysize(io)
(-1, 80)

In this case, I suppose the data frames will be printed without limits given how show works. However, it will important to have tests for those cases with invalid options.

Another corner case that I always have problems when dealing with limited display is when the display is just too small. What will happen if the display here has only 3 lines? Will the algorithm break at some point? I think h = div(v, 2) can return a number equal or lower than 0, which will print the entire DataFrame. Can you check this please?

Jollywatt added a commit to Jollywatt/DataFrames.jl that referenced this pull request Sep 16, 2022
@Jollywatt
Copy link
Contributor Author

Jollywatt commented Sep 16, 2022

Hi @ronisbr, thanks for the comment!
I made sure the printing obeys “print squashed if h is small but positive, but print everything if h is invalid (zero or negative)” and added tests. (This seems to be how ungrouped data frames display.) Previously, h = 1 and h = 2 were edge cases where one group would be printed in full but not the other — good catch!

@bkamins
Copy link
Member

bkamins commented Sep 18, 2022

@Jollywatt - unfortunately due to Tables.jl 1.8 release we needed to make some changes in the internals of DataFrames.jl. Can you please rebase (or merge if rebasing is problematic) this PR against main branch as otherwise CI will fail for this PR.

print(io, "\nFirst Group ($nrows $rows): ")
join(io, identified_groups, ", ")
println(io)
(h, w) = get(io, :displaysize, displaysize(io))
Copy link
Member

@bkamins bkamins Sep 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ronisbr - do you use the same pattern in PrettyTables.jl? (to ensure consistency)

@Jollywatt - should not we use the get(io, :limit, false) test first and only implement the logic you propose if the :limit property is set? And add tests for both cases.

Note the following for standard printing of data frames:

julia> df = DataFrame(rand(20, 5), :auto) # display sets limit
20×5 DataFrame
 Row │ x1        x2         x3          x4         x5
     │ Float64   Float64    Float64     Float64    Float64
─────┼───────────────────────────────────────────────────────
   1 │ 0.799571  0.711126   0.0829037   0.153365   0.65724
   2 │ 0.013505  0.516158   0.289836    0.772033   0.236464
   3 │ 0.539347  0.866731   0.527391    0.763274   0.445696
  ⋮  │    ⋮          ⋮          ⋮           ⋮          ⋮
  18 │ 0.285917  0.83837    0.412981    0.283734   0.734432
  19 │ 0.539428  0.281714   0.503087    0.236329   0.980165
  20 │ 0.19463   0.818894   0.080396    0.274849   0.678159
                                              14 rows omitted

julia> show(stdout, df) # stdout does not have limit
20×5 DataFrame
 Row │ x1        x2         x3          x4         x5
     │ Float64   Float64    Float64     Float64    Float64
─────┼────────────────────────────────────────────────────────
   1 │ 0.799571  0.711126   0.0829037   0.153365   0.65724
   2 │ 0.013505  0.516158   0.289836    0.772033   0.236464
   3 │ 0.539347  0.866731   0.527391    0.763274   0.445696
   4 │ 0.220234  0.94367    0.740741    0.141462   0.121854
   5 │ 0.32305   0.267344   0.519681    0.331716   0.396574
   6 │ 0.127974  0.282572   0.640234    0.841006   0.526737
   7 │ 0.322138  0.925825   0.00683361  0.0276174  0.140838
   8 │ 0.832967  0.744335   0.808838    0.516439   0.219536
   9 │ 0.895046  0.853179   0.487567    0.81502    0.721893
  10 │ 0.236999  0.481331   0.310607    0.0200533  0.00837921
  11 │ 0.142544  0.310369   0.121286    0.781769   0.0948759
  12 │ 0.923275  0.205102   0.097269    0.806569   0.273017
  13 │ 0.177222  0.266212   0.965611    0.345097   0.745195
  14 │ 0.157075  0.851299   0.121473    0.281857   0.1338
  15 │ 0.938703  0.183045   0.769088    0.562905   0.0955595
  16 │ 0.592258  0.387368   0.912301    0.292013   0.401606
  17 │ 0.949321  0.0872963  0.421829    0.0586528  0.39613
  18 │ 0.285917  0.83837    0.412981    0.283734   0.734432
  19 │ 0.539428  0.281714   0.503087    0.236329   0.980165
  20 │ 0.19463   0.818894   0.080396    0.274849   0.678159

julia> show(IOContext(stdout, :limit => true), df) # but you can manually enable limit
20×5 DataFrame
 Row │ x1        x2         x3          x4         x5
     │ Float64   Float64    Float64     Float64    Float64
─────┼───────────────────────────────────────────────────────
   1 │ 0.799571  0.711126   0.0829037   0.153365   0.65724
   2 │ 0.013505  0.516158   0.289836    0.772033   0.236464
   3 │ 0.539347  0.866731   0.527391    0.763274   0.445696
  ⋮  │    ⋮          ⋮          ⋮           ⋮          ⋮
  18 │ 0.285917  0.83837    0.412981    0.283734   0.734432
  19 │ 0.539428  0.281714   0.503087    0.236329   0.980165
  20 │ 0.19463   0.818894   0.080396    0.274849   0.678159
                                              14 rows omitted

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ronisbr - do you use the same pattern in PrettyTables.jl? (to ensure consistency)

Yes, kind of. The idea is the same, but I check for :limit if the user did not pass a specific keyword to crop the output table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this redundant? displaysize(io) already extracts the :displaysize property of io if it's set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does not have to, see e.g. https://github.com/JuliaLang/julia/blob/d5cde865f24d2c3e7041aee8ea464eb6b6045a2a/base/stream.jl#L564 (and custom targets could define similar rules).

OTOH, I do not see a situation when what is proposed now would be incorrect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither DataFrames nor PrettyTables use that pattern AFAICT. Julia Base doesn't. So better be consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataFrames.jl uses this pattern in

tty_rows, tty_cols = get(io, :displaysize, displaysize(io))

Blame shows that I added it 3 years ago in #1761. The reason why it was added is #1761 (comment) (you had the same comment there as here). @nalimilan - so what do we do?

@Jollywatt
Copy link
Contributor Author

@bkamins I hope I did that right — I synced my fork using the Github UI, which made a merge, not a rebase.

@bkamins
Copy link
Member

bkamins commented Sep 18, 2022

merge should be OK.

test/show.jl Show resolved Hide resolved
@@ -44,7 +44,7 @@ function Base.show(io::IO, gd::GroupedDataFrame;
h -= 2 # two lines are already used for header and gap between groups

h1 = h2 = h # display heights available for first and last groups
if N > 1
if !allrows && N > 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ronisbr - there is also an additional kwarg in PrettyTables.jl that governs number of rows to be printed in a table. Can you please remind it its name? We should check how passing it interacts with this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One for sure I now see is display_size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should point out that change to line 47 doesn't change any behaviour: arguments like allrows get forwarded to show methods later on. (All !allrows && … does is save a few arithmetic operations in the case where we know that we don't need to calculate heights.) So if there are other kwargs that override allrows, then it might be better to always set the default display_size like before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I did not suggest to make a change to check allrows, but rather to check get(io, :limit, false). However, looking at the code - maybe it is OK, as if :limit is false then display size will be ignored in show for groups. Just please confirm that we have a test for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, does :limit take precedence over allrows? I have added a single test for whether allrows=true works even when display size is small, but don't have tests where I set :limit directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does :limit take precedence over allrows?

It depends how you define precedence. allrows will override :limit if allrows is set to true.
However, you can have allrows=false and :limit=false. We should make sure that even if we pass custom display_size to show internally all works OK (I think it will, but let us make sure).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ronisbr - there is also an additional kwarg in PrettyTables.jl that governs number of rows to be printed in a table. Can you please remind it its name? We should check how passing it interacts with this PR.

You can select the number of rows that will be printed if crop is set to :both or :vertical. In this case, the number of printed rows (including header, table dividers, etc) are selected based on the display size.

@bkamins
Copy link
Member

bkamins commented Sep 18, 2022

In summary - the PR mostly looks good (it works OK under standard settings). However, please carefully consider :limit setting of IOContext and possible kwargs that can be passed to PrettyTables.jl so make sure we do not hit some problem in such cases.

@bkamins
Copy link
Member

bkamins commented Sep 20, 2022

Also documentation needs to be updated to reflect the changes rules of printing of GroupedDataFrame.

@ronisbr
Copy link
Member

ronisbr commented Sep 20, 2022

From my side, I do not see problems related to kwargs in PrettyTables. The arguments allrows and allcols are converted correctly to the one in PrettyTables that limit the number of rows and columns to be printed. Of course, the user can break it badly by passing strange arguments to show, but this is the side effect of the power to customize the output.

@bkamins
Copy link
Member

bkamins commented Sep 20, 2022

OK - so we can drop !allrows check then. @Jollywatt - can you then please remove it + update documentation examples so that documenter passes correctly?

@bkamins
Copy link
Member

bkamins commented Sep 21, 2022

@nalimilan - do you have time to have a look at this PR (and as usual spot the tiny details that I usually miss; thank you!)

src/groupeddataframe/show.jl Outdated Show resolved Hide resolved
src/groupeddataframe/show.jl Outdated Show resolved Hide resolved
test/show.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member

bkamins commented Sep 24, 2022

@Jollywatt - do you have time to have a look at this PR and finalize it. Thank you!

Soon we will make a feature freeze for DataFrames.jl 1.4 release (after this - the PR would still be merged but would be released later).

@Jollywatt
Copy link
Contributor Author

@bkamins Ok, done (I think)! Let me know if there’s anything else to do:)

Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests are failing

test/show.jl Outdated Show resolved Hide resolved
@Jollywatt
Copy link
Contributor Author

tests are failing

I’ve been trying to debug those failures but haven’t gotten to the bottom of them. It seems like PrettyTables got updated?

@bkamins
Copy link
Member

bkamins commented Sep 26, 2022

docs build is failing.

@bkamins
Copy link
Member

bkamins commented Sep 26, 2022

Regarding failing CI:

@ronisbr - this is a bug in PrettyTables.jl. You use @inbounds in several places in the code, so this does not get detected in normal usage, but tests catch it (as they force bounds checking).

The offending operation is:

num_lines_in_row[1:num_header_rows]

where in the test num_header_rows is 2, while num_lines_in_row is 1.

This is in _count_header_lines function.

The issue is most likely some non standard table printing heights that get passed in this PR to PrettyTables.jl.

So the fix should have two parts:

  • personally - I would remove @inbounds here as it seems your code does not ensure that @inbounds is safe (so it should not be used; also - most likely it does not save much time - but you will know better); (in general please consider in whole PrettyTables.jl if you indeed need to use @inbounds in places where you use them)
  • probably some more checking of passed arguments needs to be done in PrettyTables.jl and they should be corrected if they are e.g. too restrictive

@ronisbr
Copy link
Member

ronisbr commented Sep 26, 2022

Oops, sorry about that! I always assumed that the header is always rendered. However, given the huge rewriting of PrettyTables in v2, I forgot to add some checks. In this case, the number of rendered lines was set to the number of display lines (1), which is lower than the number of header lines (2). This must never happen. I fixed this bug in master and will release a new minor version as soon as the test finishes.

@ronisbr
Copy link
Member

ronisbr commented Sep 26, 2022

@Jollywatt by the way, the output will change. Notice that the header will always be printed, no matter how small the display is.

@bkamins
Copy link
Member

bkamins commented Sep 26, 2022

I think it is OK that the header is always printed.

@ronisbr
Copy link
Member

ronisbr commented Sep 26, 2022

I think it is OK that the header is always printed.

It is the solution that leads to the least amount of problems. Otherwise, the omitted cell summary would be "wrong" since we were also omitting header cells. Anyway, it only affects when you are using a very small terminal, which should be a very corner case.

@ronisbr
Copy link
Member

ronisbr commented Sep 26, 2022

Done! PrettyTables v2.1.1 is released. Can you please test it again?

@bkamins
Copy link
Member

bkamins commented Sep 26, 2022

Testing now (probably docstrings still need to be fixed, but at least we will see if the problem you addressed is fixed)

@bkamins
Copy link
Member

bkamins commented Sep 26, 2022

@Jollywatt - after PrettyTables.jl changes docstrings + testset outputs need to be updated. Thank you!
After this I think all is good to merge.

Jollywatt and others added 9 commits September 29, 2022 18:27
… GroupedDataFrames, not only when `allrows` is false.
- Make grouped data frames fit in small text displays as best as possible
- To break ties, display the first group with one more row than the second
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>
(A bug was fixed in PrettyTables which simplified the printing logic a bit.)
Update doctests with grouped data frames.
@Jollywatt
Copy link
Contributor Author

@bkamins I’ve fixed the display logic and tests to work for PrettyTables 2.1.1, and fixed the doctests. Hopefully I haven’t missed anything:) (Apologies for the highly nonlinear git history…!)

@bkamins
Copy link
Member

bkamins commented Sep 29, 2022

Thank you! I need to review the PR before merging from scratch anyway so it is not a problem that history is complex.

@bkamins bkamins merged commit c43c8da into JuliaData:main Sep 29, 2022
@bkamins
Copy link
Member

bkamins commented Sep 29, 2022

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants