Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update docs following CSV.jl 0.9 release #2865

Merged
merged 9 commits into from
Sep 9, 2021
Merged

update docs following CSV.jl 0.9 release #2865

merged 9 commits into from
Sep 9, 2021

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Sep 8, 2021

Fixes #2864

@bkamins bkamins added the doc label Sep 8, 2021
@bkamins bkamins added this to the patch milestone Sep 8, 2021
@bkamins
Copy link
Member Author

bkamins commented Sep 8, 2021

waiting for a decision in JuliaData/WeakRefStrings.jl#85

@bkamins
Copy link
Member Author

bkamins commented Sep 8, 2021

@nalimilan - I have updated the docs. I left String[X] types and added a paragraph explaining how they work.

Copy link
Member

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restarted the jobs now that WeakRefStrings has released

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better now!

@nalimilan
Copy link
Member

Though doctests fail due to that String15… weirdness:

│ Evaluated output:
│ 
│ 3×5 DataFrame
│  Row │ Species          SepalLength_mean  SepalWidth_mean  PetalLength_mean  P 
│      │ String15        Float64           Float64          Float64           F 
│ ─────┼──────────────────────────────────────────────────────────────────────────
│    1 │ Iris-setosa                 5.006            3.418             1.464    2 │ Iris-versicolor             5.936            2.77              4.263 │ Iris-virginica              6.588            2.974             5.5521 column omitted
│ 
│ Expected output:
│ 
│ 3×5 DataFrame
│  Row │ Species          SepalLength_mean  SepalWidth_mean  PetalLength_mean  P 
│      │ String15         Float64           Float64          Float64           F 
│ ─────┼──────────────────────────────────────────────────────────────────────────
│    1 │ Iris-setosa                 5.006            3.418             1.464    2 │ Iris-versicolor             5.936            2.77              4.263 │ Iris-virginica              6.588            2.974             5.5521 column omitted

Cc: @ronisbr

@bkamins
Copy link
Member Author

bkamins commented Sep 9, 2021

Though doctests fail due to that String15… weirdness:

I will check what is the reason of failure and fix it - it is most likely on DataFrames.jl side. The strange thing is that what I put into this PR passes on my local machine.

@bkamins
Copy link
Member Author

bkamins commented Sep 9, 2021

The problem was the following:

julia> using DataFrames

julia> import Dates

julia> DataFrame(d=Dates.Date(1))
1×1 DataFrame
 Row │ d
     │ Date…
─────┼────────────
   1 │ 0001-01-01

julia> using Dates

julia> DataFrame(d=Dates.Date(1))
1×1 DataFrame
 Row │ d
     │ Date
─────┼────────────
   1 │ 0001-01-01

Now I have fixed it to never try printing type module name as a prefix (so the latter output is presented consistently).

In this way data frame is displayed in the same way no matter if the module is loaded with using or with import or only indirectly by some other module.

@@ -106,12 +106,12 @@ function compacttype(T::Type, maxwidth::Int=8)
T === Any && return "Any"
T === Missing && return "Missing"

sT = string(T)
sT = string(T isa Union ? T : nameof(T))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For full consistency we should recursively call nameof on each part of the union, as otherwise the module name is printed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is even more complicated 😞. I will try to propose something reasonable. The problem is that nameof drops parameters from parametric types.

@bkamins
Copy link
Member Author

bkamins commented Sep 9, 2021

The patches lead me to uncover that we printed MD type incorrectly previously which got uncaught and that we inconsistently printed type information in text vs HTML/LaTeX. Now I made it more consistent (use 9 text width characters by default - @ronisbr - this might affect your work on HTML and LaTeX output, but these are minor changes)

@bkamins
Copy link
Member Author

bkamins commented Sep 9, 2021

Just as a comment why 9 if someone would ask. The String[X] types have up to 9 characters of length (and also this is the number we used in text/plain already, I just synced HTML and LaTeX with this reference)

@bkamins bkamins merged commit 39c04ac into main Sep 9, 2021
@bkamins bkamins deleted the bk/docs_csv branch September 9, 2021 12:53
@bkamins
Copy link
Member Author

bkamins commented Sep 9, 2021

Thank you!

test/show.jl Outdated
@@ -301,8 +301,8 @@ end
@test sprint(show, df) == """
1×3 DataFrame
Row │ a b c
│ Date DateTime Day
─────┼────────────────────────────────────────
│ Date DateTime Dates.Day
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: why don't we print Dates.DateTime?

Anyway it would make sense to never print the module name to avoid this kind of variation (which is hard to understand for users).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: why don't we print Dates.DateTime?

Because we match type-width only to column name, not shown contents (which can vary). I guess @ronisbr could change this if we asked for this. E.g.:

julia> DataFrame(x=Dates.DateTime(1))
1×1 DataFrame
 Row │ x
     │ DateTime…
─────┼─────────────────────
   1 │ 0001-01-01T00:00:00

julia> DataFrame(xxxxxxxxxxxxxxxx=Dates.DateTime(1))
1×1 DataFrame
 Row │ xxxxxxxxxxxxxxxx
     │ Dates.DateTime
─────┼─────────────────────
   1 │ 0001-01-01T00:00:00

Anyway it would make sense to never print the module name to avoid this kind of variation (which is hard to understand for users).

We could go for finding the last dot (i.e. .) where the sequence XXX.YYYY.ZZZZ consists of valid Julia identifiers and strip the front of the string, but it is quite tricky to do in 100% correct way (of course it is doable).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's doable, I'd say that taking into account the final size width of the column (including contents) would be better. Otherwise there's some wasted space which in some cases could be used to print relevant information.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ronisbr - is this EASILY doable? (if it is hard it is probably better to focus on HTML/LaTeX support first). Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

update docs to CSV.jl 0.9
3 participants