Add PrettyTables.jl as an alternative backend for display in DataFrames.jl #2337
Just to report the progress, I am trying to, at least, print the common structures just as DataFrames currently prints them. Right now, I am having a problem with undefined references when accessing DataFrames using https://discourse.julialang.org/t/help-to-check-if-an-element-in-a-tables-jl-is-defined/44110 After that, I think I just need to add an option to adjust the vertical cropping. AFAIK, this would be the last feature I really need to implement. The rest will be only configurations and tweaks using the existing interface (I hope!). |
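For readers following along, here is a minimal sketch of the undefined-reference problem in plain Julia. It assumes direct access to the column vector, which is not necessarily what is available through the Tables.jl interface discussed in the linked thread. The helper name `safe_cell` and the `"#undef"` placeholder are hypothetical and not part of DataFrames.jl or PrettyTables.jl.

```julia
# Hypothetical helper (not part of DataFrames.jl or PrettyTables.jl):
# return the cell value, or a placeholder when the slot was never assigned.
function safe_cell(col::AbstractVector, i::Integer)
    # `isassigned` is false for slots of a `Vector{Any}(undef, n)` that were
    # never written, so we avoid triggering an `UndefRefError`.
    return isassigned(col, i) ? col[i] : "#undef"
end

col = Vector{Any}(undef, 3)
col[1] = 1
col[2] = missing
# col[3] is intentionally left unassigned

cells = [safe_cell(col, i) for i in 1:3]
```

Checking with `isassigned` up front avoids the `try`/`catch` on `UndefRefError`, but it only works when the underlying storage is an array that supports it.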
I have commented on the issue there and for completeness I repost here:
|
I don't really see why data frames needs to implement printing at all. Unlike joins etc. where the storage of the columns matters, i think all of this could be relegated to a Tables.jl-type package. Would be great to excise this huge chunk of code! |
This is the ultimate objective and this package is PrettyTables.jl. The only problem is that currently DataFrames.jl has some functionality that PrettyTables.jl has to catch up with (but it seems to be very close towards this goal). Then we will have a choice:
I would prefer to do 2, but there is a risk that users' code depends on how printing is done currently, so the change would be very breaking. So probably the way we will approach it will be in three stages:
Anyway - we will have time to discuss this when @ronisbr tells us that he is ready for the stress tests of PrettyTables.jl against:
|
If I can provide an opinion, this is the perfect approach for me! I have been studying the printing system of DataFrames and it is very complex. There are a lot of cases where we can easily break things in the first attempt to move to PrettyTables.jl :)
I am getting there! I found that problem with |
Thanks for the update to make PrettyTables performant with tables! That's a lot of progress right there as well. |
@ronisbr - thank you for all your work on this! I will open an issue for |
Ok! Now we can start to do some tests and I need help to tweak everything. I propose to start with the Text backend before trying to make HTML and LaTeX work. Using:

using DataFrames, PrettyTables, Tables

# Highlight cells that are missing, nothing, undefined, or nested DataFrames objects.
function h1_f(data, i, j)
    try
        return ismissing(data[i, j]) ||
               data[i, j] === nothing ||
               typeof(data[i, j]) <: Union{AbstractDataFrame, GroupedDataFrame,
                                           DataFrames.DataFrameRow,
                                           DataFrames.DataFrameRows,
                                           DataFrames.DataFrameColumns}
    catch e
        # Accessing an unassigned cell throws UndefRefError; highlight it too.
        if isa(e, UndefRefError)
            return true
        else
            rethrow(e)
        end
    end
end

h1 = Highlighter(h1_f, crayon"dark_gray")

# Format nested DataFrames objects as their first compact line.
function f1(v, i, j)
    if typeof(v) <: Union{AbstractDataFrame, GroupedDataFrame,
                          DataFrames.DataFrameRow, DataFrames.DataFrameRows,
                          DataFrames.DataFrameColumns}
        str = sprint(print, v, context = :compact => true)
        str = split(str, '\n')[1]
        return str
    elseif ismissing(v)
        return "missing"
    elseif v === nothing
        return ""
    else
        return v
    end
end

function print_dataframe(df)
    sch = Tables.schema(df)
    names = reshape([sch.names...], (1, :))
    types = DataFrames.compacttype.(reshape([sch.types...], (1, :)))
    pretty_table(df,
                 vcat(names, types),
                 alignment = :l,
                 formatters = (f1,),
                 highlighters = (h1,),
                 maximum_columns_width = 30,
                 row_number_alignment = :l,
                 show_row_number = true,
                 tf = dataframe)
end

I managed to get a lot of printing like the default output of DataFrames (I am already limiting the column width size to 30). For example:

julia> df = DataFrame(A = Int64.(1:9), B = Vector{Any}(undef, 9));

julia> df.B[1:8] = [df,               # DataFrame
                    df[1, :],         # DataFrameRow
                    view(df, 1:1, :), # SubDataFrame
                    eachrow(df),      # DataFrameRows
                    eachcol(df),      # DataFrameColumns
                    groupby(df, :A),  # GroupedDataFrame
                    missing,
                    nothing];

julia> df = DataFrame(A = Int64[1:4;], B = ["x\"", "∀ε>0: x+ε>x", "z\$", "A\nC"],
                      C = Float32[1.0, 2.0, 3.0, 4.0]);

Hence, I think this is a good baseline to start. Now, we need to tweak the edge cases and, for this, I need help because I do not know what they are :D |
OK - I will try to list the cases. |
I think apart from what you covered this would be good to check:
|
|
Good! For Markdown, it will not work now. We need to wait for @NicholasWMRitchie to update his PR because of some conflicts. |
@bkamins for your last example, I am getting this with DataFrames: Is this expected? |
Yes, and this is incorrect - that is why I have added this example 😄 - this will be fixed when we add a limit of 32 characters to a column width (pending as you know as we are waiting for Markdown PR to finalize). If you comment out |
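As an illustration of the 32-character column width limit mentioned above, here is a minimal sketch in plain Julia. The helper `truncate_cell` is hypothetical and is not how DataFrames.jl or PrettyTables.jl implement cropping; it only shows the idea of cutting a cell to a maximum display width and marking the cut with an ellipsis.

```julia
# Hypothetical helper: crop a string so that its display width does not
# exceed `maxwidth`, reserving one column for the trailing ellipsis.
function truncate_cell(s::AbstractString, maxwidth::Integer = 32)
    textwidth(s) <= maxwidth && return String(s)
    io = IOBuffer()
    width = 0
    for c in s
        w = textwidth(c)
        # Stop before the next character would eat into the ellipsis column.
        width + w > maxwidth - 1 && break
        print(io, c)
        width += w
    end
    print(io, '…')
    return String(take!(io))
end

short = truncate_cell("abc", 4)
cropped = truncate_cell("abcdef", 4)
```

Measuring with `textwidth` rather than `length` matters for cells that contain wide or combining characters, where the number of characters and the number of terminal columns differ.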
If you had a wider terminal you would see:
|
hum, I see that the |
I think the reason is that DataFrames.jl used
and the only thing that was changed is removing Also - just to make sure nothing is broken you might want to check Can you also show how you display the different element types in the header (and is there a possibility to opt-out of it)? |
OK!
I think it is working fine:
I am using exactly the same function of DataFrames (

function print_dataframe(df; eltypes::Bool = true)
    sch = Tables.schema(df)
    names = reshape([sch.names...], (1, :))
    types = DataFrames.compacttype.(reshape([sch.types...], (1, :)))
    pretty_table(df,
                 vcat(names, types),
                 alignment = :l,
                 formatters = (f1,),
                 highlighters = (h1,),
                 maximum_columns_width = 30,
                 nosubheader = !eltypes,
                 row_number_alignment = :l,
                 show_row_number = true,
                 tf = dataframe)
end

Now I will check the filters. |
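To make the header construction easier to follow, here is a small sketch of how a two-row header (names on top, element types below) can be built from a Tables.jl schema. It uses a plain `NamedTuple` of vectors instead of a data frame so that it only needs Tables.jl, and it prints raw types rather than calling `DataFrames.compacttype`. The function name `header_matrix` is hypothetical.

```julia
using Tables

# Hypothetical helper: build the header matrix for a Tables.jl-compatible table.
# With `eltypes = true` the result has two rows (names, then element types);
# otherwise only the row of names is returned.
function header_matrix(tbl; eltypes::Bool = true)
    sch = Tables.schema(tbl)
    names_row = reshape(collect(Any, sch.names), 1, :)
    eltypes || return names_row
    types_row = reshape(collect(Any, sch.types), 1, :)
    return vcat(names_row, types_row)
end

# A NamedTuple of vectors is a valid Tables.jl column table.
tbl = (a = [1, 2, 3], b = ["x", "y", "z"])

header = header_matrix(tbl)
names_only = header_matrix(tbl; eltypes = false)
```

Opting out of the element types then amounts to dropping the second row, which is roughly the role `nosubheader = !eltypes` plays in the code above.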
OK - I meant |
Just for my reference - is there some reason why you use |
@ronisbr - I think we can use PrettyTables.jl for text/plain for now. LaTeX and HTML are less of a priority as I think what we have now for them in DataFrames.jl is pretty OK. |
Good! I am doing some improvements in Markdown, since it will be available in DataFrames soon. One major accomplishment I had today was the capability to render Markdown using the Text backend inside a cell. Thus, we can do things like this:

julia> a = md"""
       # Markdown

       This is a **Markdown** example.

       !!! note
           This is a note.

       This is a URL: [PrettyTables.jl](https://github.com/ronisbr/PrettyTables.jl)
       """;

julia> b = md"""
       This _**is a bold text that will wrap inside a cell**_
       """;

julia> data = [1 a; 2 b];

julia> pretty_table(data, linebreaks = true, hlines = :all, columns_width = [-1,40])

I just need to test those things because the logic was somewhat complicated. |
Fantastic - Markdown (simplified) is already available on DataFrames.jl master, but we do not have to be 100% consistent here with the display, as your approach seems better. |
Sorry! I completely missed this question. Well, when I was creating PrettyTables, it seems better to use I think we are ready to do some tests! Is it possible to verify the previous code against |
OK - I will check it. Just please let me know how I should preferably change settings.
If you agree with this plan, then the simplest would be if you did a simple PR to DataFrames.jl with what I write implemented (of course without spending too much time on polishing it - e.g. no documentation updates are needed, or not all kwargs of In particular - one of the major things that we wanted to change in DataFrames.jl and should be easy in PrettyTables.jl by default is to remove the column separator lines in display as they occupy too much horizontal space. This would be a huge win. |
Ok!
Ok!
Ok!
Ok, but I will put this on hold. When everything is decided and good, then I write this section of the manual.
Nice! I will create this PR with those functionalities you described. Indeed, it will be easier to test.
You mean, something like this: or like this: |
That is my point - to have something minimal now, so that we can test it and make it production ready later Regarding output I imagined Also in DataFrames.jl we show row numbers by default to the left of the data in a data frame - this is something to be discussed if we want it or not (there were mixed opinions on this). I am OK to drop it. |
Good!
I, personally, prefer to display the row numbers. It can be switched by |
+1 on keeping row numbers. It helps when you are sharing a screen with someone or explaining a table. |
Yes - it would be good to have a vline there |
@ronisbr - there is one crucial thing we have just discussed with @nalimilan. Actually if the tests go well, we think that dropping the legacy output that DataFrames.jl uses now could be even considered. The crucial thing is that as DataFrames.jl is a dependency in dozens of packages then PrettyTables.jl will become such a dependency. In general of course it is good as this is what I guess you want 😄, but it also means that the moment we make this dependency we will pin some version of PrettyTables.jl in In short the question is - how close do you feel you are to 1.0 release of PrettyTables.jl with some guarantees of not having breaking changes often. |
This is actually one thing I was about to tell you. For the text backend, I think I am very, very happy with the API right now. I just need to add the last features to make it fully compatible with the options in DataFrames. That's why I need this test from DataFrames, to see if I missed something. If you don't mind making the transition in parts, text backend first, and HTML and LaTeX afterwards, then I think I can tag v1.0 as soon as we know that everything we need is implemented. Inside PrettyTables, I will mark the LaTeX and HTML backends as beta with the goal to stabilize them in v2.0. In the past, I took some time and released a very breaking version to make all the API more or less uniform. However, if you think that there is some kind of improvement I need to do before 1.0, please, let me know! |
This is what we have hoped 😄. So let us run the tests now based on your PR and we will synchronize the releases of PrettyTables.jl and DataFrames.jl (and keep HTML/LaTeX for later) |
Perfect! I will submit this PR very soon. |
@ronisbr - I have marked all open issues in DataFrames.jl that relate to display topics with "display" label so that you can easily filter out what has been discussed in the past. |
Done! |
It would be nice to add PrettyTables.jl as an alternative display backend for DataFrames.jl to the one we currently have (probably not to replace it in the short term, as it would be too hard to ensure a smooth transition).
If we want to do it the steps would be the following:
DataFrameRow, GroupedDataFrame, DataFrameRows, DataFrameColumns are correctly handled)
CC @ronisbr