Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

complete update for 0.7 and lots of test fixes #1364

Merged
merged 20 commits into from
Mar 2, 2018

Conversation

ExpandingMan
Copy link
Contributor

@ExpandingMan ExpandingMan commented Feb 26, 2018

Hello all, I'm not done with this yet but I wanted to put the PR up now so people know that it's being worked on and I have made significant progress.

Apart from the fixes for 0.7 I have noticed that the tests are quite a mess right now. It is very difficult to get 0.7 up and working without clean tests, so I am prioritizing getting the tests nice and spiffy.

I am fixing a huge number of deprecation warnings (that originate in DataFrames.jl) in the tests. I would like to suggest that if we want to keep tests of deprecated functionality, these should be moved to a dedicated test module because otherwise it makes cleaning up tests that are actually problematic rather difficult. This can also cover up real issues. For instance, does getindex require a makeunique keyword? Suppose you do df[:, [1,1,2]] as in the tests? I'll post an issue about this (if there isn't one already).

I may not get to finish this until the end of the week, but I will post updates soon.

Wow, what timing, looks like there will be significant overlap between here and #1363. Perhaps I'll go back and see if I can branch off of that.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks mostly good, but it can indeed be cleaned/simplified in light of #1364.

@@ -158,7 +158,7 @@ function Base.getindex(x::AbstractIndex, idx::AbstractVector)
end
length(idxs) == 0 && return Int[] # special case of empty idxs
if idxs[1] isa Real
if !all(v -> v isa Integer && !(v isa Bool), idxs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it was breaking when a Symbol was passed in

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what changed between 0.6 and 0.7?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing, it was breaking in 0.6 also. Since it only produced a warning nobody seems to have noticed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkamins Does that look OK to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was breaking because we do not allow mixing indexing by number and symbol in the same index. It has to be either all numbers or all symbols. We have to exclude Bool because we do not allow indexing by Bool in this case and Bool <: Integer.

Now conversion to Int will fail on Symbol anyway in line 169.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I think I was very confused about what the intent was there. will change.

@@ -46,7 +46,9 @@ function makeidentifier(s::AbstractString)
end
end

return String(take!(res))
# return String(take!(res))
# TODO this is to fix a bug in Compat for 0.6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give more details? Is there an issue open in Compat.jl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see this issue. I'm hoping it will be resolved one way or another before I'm even done with this PR...

esc(:(isdefined($sym)))
end
using Compat.IOBuffer
const normalize = normalize_string # requires that we don't use LinearAlgebra normalize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather wrap this inside module Unicode, and avoid using Unicode: normalize in the other branch, so that you write Unicode.normalize everywhere to avoid the ambiguity with LinearAlgebra.normalize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do...

const kwpairs = pairs
else
kwpairs(x::AbstractArray) = (first(v) => last(v) for v in x)
macro isdefined(sym)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to contribute this to Compat if you have the time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that a while back but it was rejected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, as far as I know @isdefined has been around for quite a while, so I think there's some reason it's not in Compat, though I have no idea what that is. @ararslan , do you think you could find that issue or thread?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, can we perhaps keep it in DataFrames though? I'm trying very hard to put only 0.7 valid code into the main body so that we don't really have to do anything once we no longer support 0.6 (which considering the nature of 0.7 may not be very long)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's discuss this in Compat. But here I think you can just do isdefined(Base, :unique!), the fact that Base was omitted is most likely an oversight.

@ExpandingMan
Copy link
Contributor Author

Please don't hesitate to merge #1363 on account of this PR. I plan to incorporate those changes as well...

@@ -99,6 +108,6 @@ include("dataframerow/show.jl")
include("abstractdataframe/sort.jl")
include("dataframe/sort.jl")

include("deprecated.jl")
# include("deprecated.jl")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commenting this out was done for testing purposes... will be restored, please ignore

@ExpandingMan
Copy link
Contributor Author

Can I remove code for pre 0.6? see here

@ExpandingMan
Copy link
Contributor Author

Ok, this has been way worse than I was expecting, but I'm slowly getting there.

CategoricalArrays is in pretty bad shape for 0.7, looks like there will be a ton of warnings and even a couple of errors that may have to wait for that.

The IOBuffer issue fix is being pushed to compat, see here.

@@ -781,6 +782,11 @@ Base.hcat(df1::AbstractDataFrame, df2::AbstractDataFrame, dfn::AbstractDataFrame
end
end


# TODO this function to be removed after CategoricalArrays update
_promote_col_type(::Type{T}, n::Integer) where T = T(uninitialized, n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of update is needed? Just support for the uninitialized syntax?

(FWIW, use {T} and {T<:...} with one-line function syntax.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make the semantics consistent, they should support the T(uninitialized, n) constructor. They don't necessarily have to do that, as far as I know Julia isn't necessarily trying to enforce this as a standard. Of course, I think they should support it. It's a design decision that will have to be made at some point, but for now, here's a work-around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought I'm not even sure defining this constructor for CategoricalArray would be consistent. See my response in your PR.

@@ -139,10 +139,11 @@ function latex_char_escape(char::AbstractString)
return string("\\", char)
end
end
# in 0.7 this function will only ever get passed Char, in 0.6 always gets SubString
latex_char_escape(char::Char) = latex_char_escape(string(char))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then better put definitions which are not needed on 0.7 under a version check, even if you need to duplicate code. The goal is that in a few months we have the cleanest code possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. 0.6 only works with SubString and 0.7 only works with Char, so I'll have to think about the best way of doing this...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest solution is to have two completely separate blocks, one for each version, and duplicate the code as needed.

@@ -121,6 +121,8 @@ function DataFrame(pairs::Pair{Symbol,<:Any}...; makeunique::Bool=false)::DataFr
DataFrame(columns, Index(colnames, makeunique=makeunique))
end

DataFrame(dict::Dict{Symbol,<:Any}) = DataFrame((x for x ∈ dict)...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this was really interesting I forgot to mention it. 0.7 throws a deprecation warning without this explicit constructor because the "automatic conversion" constructors are deprecated, but for the life of me I have no idea what the Dict constructor actually wound up calling before. My first instinct was to do away with the Pair constructor and have it feed into a Dict constructor, but I had to do it the other way around because of makeunique.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Then Base.convert(::Type{DataFrame}, d::AbstractDict) was called automatically on 0.6. On 0.7, the recommendation added in JuliaLang/julia#23273 is to define constructors, and have convert methods call constructors explicitly (so the inverse of what we do now, mostly).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that splatting should generally be avoided when the number of elements can be large, as is the case here: it's much more efficient to pass the collection and loop over it inside the function.)

@@ -158,7 +158,7 @@ function Base.getindex(x::AbstractIndex, idx::AbstractVector)
end
length(idxs) == 0 && return Int[] # special case of empty idxs
if idxs[1] isa Real
if !all(v -> v isa Integer && !(v isa Bool), idxs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what changed between 0.6 and 0.7?

test/cat.jl Outdated
#
# hcat
#

nvint = [1, 2, missing, 4]
nvstr = ["one", "two", missing, "four"]

df2 = DataFrame(Any[nvint, nvstr])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are no longer needed now.

test/show.jl Outdated
@@ -1,12 +1,19 @@
module TestShow
using Compat, Compat.Test, DataFrames

if VERSION ≥ v"0.7.0-"
using Random
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Compat.Random. Instead of dataframe_str, I think you can just do $DataFrame inside strings.

test/utils.jl Outdated
@@ -1,5 +1,11 @@
module TestUtils
using Compat, Compat.Test, DataFrames, StatsBase
if VERSION ≤ v"0.7.0-"
using Compat.Random
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be under a version check. Also use Compat.@warn.

test/utils.jl Outdated
# @test_throws ArgumentError DataFrames.make_unique([:x, :x, :x_1, :x2], makeunique=false)
@test DataFrames.make_unique([:x, :x_1, :x2], makeunique=false) == [:x, :x_1, :x2]
# TODO below is commented out for now for warning sanity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, I don't even remember... will change

test/utils.jl Outdated
@@ -30,7 +37,7 @@ module TestUtils
@test rw == DataFrames.RESERVED_WORDS
end
else
warn("Unable to validate reserved words against parser. ",
@warn("Unable to validate reserved words against parser. ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect indentation.

test/utils.jl Outdated
@@ -150,6 +157,8 @@ module TestUtils

"""
out = String(take!(io))
# TODO this test is very sensitive to irrelevant details of show implementations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you can use $Missing and $CategoricalValue to include some variations automatically.

@ExpandingMan
Copy link
Contributor Author

Ok! I have cleaned everything up the best I think is currently possible. Some comments:

  • CategoricalArrays and StatsBase are both still badly screwed up on 0.7. There are a number of 0.7 test errors and deprecation warnings resulting from CategoricalArrays. If I have time I'm going to clean up these packages also.
  • All 0.6 tests now pass without any errors or unexpected warnings. I have moved all tests of deprecated functionality to the new test file "deprecated.jl". I very much would like to keep all tests of deprecated functionality separate, as it becomes very difficult to parse the test results on 0.7 when there are lots of deliberate deprecation warnings thrown in.
  • The IOBuffer bug on Compat has been fixed so I have removed my hack.

Hopefully we can avoid master having DataFrames related test errors in the future? 😉

Tagging @nalimilan , please let me know any changes you'd like me to make so we can get this merged.

@ExpandingMan
Copy link
Contributor Author

Hello all, I'd like to make a general plea that warnings be taken more seriously in the future.

Getting this going on 0.7 was surprisingly painful and that was much exacerbated by the many warnings already appearing in the tests under normal circumstances. Not only to these make updates more difficult, but they can hide real issues, a couple of examples of which can be seen in this PR. This also makes it a lot harder to notice legitimate warnings from Base Julia concerning needed changes and warnings that might be inappropriately given to users. Lastly, the warnings make it more difficult for people to contribute to the package, since it can be hard to figure out what then intent of some of the tests and warnings are (as it sometimes was for me).

Note that I have moved all tests of deprecated functionality which deliberately produce warnings to a separate deprecated.jl test file, which should make it easier to filter out some of the intended warnings.

@@ -121,6 +121,8 @@ function DataFrame(pairs::Pair{Symbol,<:Any}...; makeunique::Bool=false)::DataFr
DataFrame(columns, Index(colnames, makeunique=makeunique))
end

DataFrame(dict::Dict{Symbol,<:Any}) = convert(DataFrame, dict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be Dict or AbstractDict?

@@ -158,7 +158,7 @@ function Base.getindex(x::AbstractIndex, idx::AbstractVector)
end
length(idxs) == 0 && return Int[] # special case of empty idxs
if idxs[1] isa Real
if !all(v -> v isa Integer && !(v isa Bool), idxs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was breaking because we do not allow mixing indexing by number and symbol in the same index. It has to be either all numbers or all symbols. We have to exclude Bool because we do not allow indexing by Bool in this case and Bool <: Integer.

Now conversion to Int will fail on Symbol anyway in line 169.

z = vcat(v, x)

# TODO: uncomment the line below after deprecation
# @test_throws ArgumentError z[:, [1, 1, 2]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this line (61 and 62) is removed? I think we should test that it throws an error after deprecation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but currently this is not fully deprecated behavior so it does not yet throw an error. When automatic generation of column names from indexing is fully deprecated, this line should be un-commented. Note that we test the current behavior in deprecation.jl.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better keep this commented out then.

test/index.jl Outdated
@@ -13,11 +14,11 @@ inds = Any[1,
[big(1)],
big(1):big(1),
[:A],
Union{Bool, Missing}[true, false],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my machine it does not throw deprecation warning. The vector does not contain any missing and getindex for vector of Union{Bool, Missing} tests for it. And this specific case is not tested in "deprecated.jl".

@bkamins
Copy link
Member

bkamins commented Mar 1, 2018

@ExpandingMan Huge work. Thank you. I have left my comments inline.

@ExpandingMan
Copy link
Contributor Author

Think I've complied with all comments now, so we should be good.

Glad to help! I have a few things that used DataFrames that I was eager to get running on 0.7, so this seemed fairly high priority.

Note that I still get a non-uninitialized constructor warning when I compile DataFrames on 0.7 and have absolutely no idea where it's coming from (annoyingly doesn't give any traceback)!

CategoricalArrays seems like it's in much worse shape. I hope to at least make enough fixes to that so that it doesn't screw everything else up, including DataFrames (currently we still have failing tests on 0.7).

@nalimilan
Copy link
Member

I agree that warnings are problematic in general. I don't like warnings because they indicate something is wrong, yet no failure happens. Ideally we would turn deprecations from DataFrames into errors (but not those from other packages), like Julia does, but I don't think that's possible currently, and tests for deprecated features would not be possible.

test/cat.jl Outdated
@test df[1] == CategoricalVector(1:10)
DataFrames.hcat!(df, 1:10, makeunique=true)
@test df[2] == collect(1:10)
DataFrames.hcat!(df, collect(1:10), makeunique=true)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this new line break.

test/cat.jl Outdated
@@ -51,23 +51,27 @@ module TestCat

@testset "hcat ::AbstractVectors" begin
df = DataFrame()
DataFrames.hcat!(df, CategoricalVector{Union{Int, Missing}}(1:10))
dfb = DataFrame(x1=CategoricalVector{Union{Int,Missing}}(1:10))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed? Same below. Please revert the name changes.

test/data.jl Outdated
using Compat, Compat.Test, DataFrames
importall Base # so that we get warnings for conflicts
using Compat, Compat.Test, DataFrames, Compat.Random
# TODO this already generates a huge wall of warnings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving a note that something needs to be done about this.

test/data.jl Outdated
df3 = DataFrame(Any[[1, 2, missing, 4]])
df4 = DataFrame(Any[Vector{Union{Int, Missing}}(1:4), Vector{Union{Int, Missing}}(1:4)])
df5 = DataFrame(Any[Union{Int, Missing}[1, 2, 3, 4], ["one", "two", missing, "four"]])
df2 = DataFrame(Any[[1, 2, missing, 4], ["one", "two", missing, "four"]], [:x1, :x2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are no longer needed AFAICT. Same below in several places.

z = vcat(v, x)

# TODO: uncomment the line below after deprecation
# @test_throws ArgumentError z[:, [1, 1, 2]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better keep this commented out then.

test/join.jl Outdated
@test join(nameid, jobid, on = :ID, kind = :right) == right[on]
@test join(nameid, jobid, on = :ID, kind = :semi) == semi[on]
@test join(nameid, jobid, on = :ID, kind = :anti) == anti[on]
on_ = [:ID]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid renaming variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All such renames are in order to avoid extraneous warnings. Do you want me to revert to warnings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I said I prefer keeping the code clean and wait until the (spurious) warnings go away than living with weird variable names forever.

test/join.jl Outdated
@@ -71,63 +72,56 @@ module TestJoin
# Cross joins don't take keys
@test_throws ArgumentError join(df1, df2, on = :A, kind = :cross)

@testset "Test empty inputs 1" begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but these are the same warnings as when reusing object names, we don't care since we don't want to overwrite the method. Please keep this.

test/utils.jl Outdated
Number Unique: 4
Number Missing: 0
% Missing: 0.000000

"""
out = String(take!(io))
@test (out == missingfirst || out == missingsecond)
# TODO this test is very sensitive to irrelevant details of show implementations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better remove this comment.

test/utils.jl Outdated
@@ -66,7 +67,7 @@ module TestUtils
describe(io, df)
DRT = CategoricalArrays.DefaultRefType
# Julia 0.7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing 0.7-specific AFAICT. Same below.

@@ -781,6 +782,11 @@ Base.hcat(df1::AbstractDataFrame, df2::AbstractDataFrame, dfn::AbstractDataFrame
end
end


# TODO this function to be removed after CategoricalArrays update
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once JuliaLang/METADATA.jl#13598 is merged, please revert this, require CategoricalArrays 0.3.5, and also adjust uses of the current constructors, e.g. in allocate.

@bkamins
Copy link
Member

bkamins commented Mar 1, 2018

@ExpandingMan Regarding your question about df[:, [1,1,2]] I think that getindex does not need makeunique keyword argument. Such duplicate columns are aliases and I do not see much use for such indexing, but I might not see something so it is not a strong no-go.

@ExpandingMan
Copy link
Contributor Author

Yeah, my confusion has cleared up a bit. It seems that df[:, [1,1,2]] is deprecated, currently it throws a warning, and it will throw an error in the future. There is already a test of the error that should be uncommented when the full deprecation is enacted.

@bkamins
Copy link
Member

bkamins commented Mar 2, 2018

Exactly! Thanks

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think it's almost ready after fixing the things I've noted. Just remove all the unnecessary churn due to renaming variables, which should make the PR much more readable.

test/data.jl Outdated
const ≅ = isequal

#test_group("constructors")
df1 = DataFrame(Any[[1, 2, missing, 4], ["one", "two", missing, "four"]], [:Ints, :Strs])
df2 = DataFrame(Any[[1, 2, missing, 4], ["one", "two", missing, "four"]])
df3 = DataFrame(Any[[1, 2, missing, 4]])
df3 = DataFrame(Any[[1, 2, missing, 4]], [:x1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

test/data.jl Outdated
using Compat, Compat.Test, DataFrames
importall Base # so that we get warnings for conflicts
using Compat, Compat.Test, DataFrames, Compat.Random
# TODO: need to find a better way of accomplishing below
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving another note since the previous one has been closed for some reason. Please either keep this or remove it, but adding comments doesn't help fixing the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want me to retain the importall or remove it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say remove it, anyway the comment probably referred to ambiguity warnings, which are no longer printed.

const kwpairs = pairs
else
kwpairs(x::AbstractArray) = (first(v) => last(v) for v in x)
using Compat.IOBuffer
import Base.LinAlg: normalize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I suggested before, what happens if you remove this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a Compat update so I was really hoping I could do this properly, but it's still really screwed up. When you do using Compat.Unicode: normalize in 0.6 it doesn't import the normalize_str methods. Anyway, this can't be removed, because it throws an error telling you that you must explicitly import LinAlg.normalize. Sorry I've been hesitant to make PR's to Compat, but there are so many gotchas and existing bugs that doing that is rather difficult.

This should not break if Compat gets fixed, especially because Unicode doesn't even export normalize.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Doesn't it work if you do using Compat.Unicode and then Unicode.normalize for every use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, you should be able to use it soon (don't forget to update REQUIRE): JuliaLang/METADATA.jl#13615

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome, thanks, I will wait until METADATA gets updated and try it out.

@@ -522,7 +508,7 @@ module TestDataFrame
end

@testset "misc" begin
df = DataFrame(Any[collect('A':'C')])
df = DataFrame(Any[collect('A':'C')], [:x1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed (same below).

@@ -121,6 +121,8 @@ function DataFrame(pairs::Pair{Symbol,<:Any}...; makeunique::Bool=false)::DataFr
DataFrame(columns, Index(colnames, makeunique=makeunique))
end

DataFrame(dict::AbstractDict) = convert(DataFrame, dict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You appear to have missed my previous comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? Wasn't it regarding splatting? This is no longer splatting. Can you explain what you'd like me to change here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this:

On 0.7, the recommendation added in JuliaLang/julia#23273 is to define constructors, and have convert methods call constructors explicitly (so the inverse of what we do now, mostly).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry I had missed that. Will change now.

return "\\textasciitilde{}"
else
return string("\\", char)
if VERSION ≥ v"0.7.0-"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's 0.7.0-DEV.4059.

x = DataFrame(a = [1, 2, 3], b = [4, 5, 6])
v = DataFrame(a = [5, 6, 7], b = [8, 9, 10])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, it's in deprecated.jl now. This dataframe v is removed here because it is no longer used in this file (appears again in deprecated.jl).

@test df[1] == CategoricalVector(1:10)
DataFrames.hcat!(df, 1:10, makeunique=true)
@test df[2] == collect(1:10)
DataFrames.hcat!(df, collect(1:10), makeunique=true)
@test df[3] == collect(1:10)

df = DataFrame()
df2 = hcat(CategoricalVector{Union{Int, Missing}}(1:10), df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert all the renames in this file and the following files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, don't worry, I'm not ignoring you, I was busy...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I leave new comments because GitHub hides the old ones and it's hard to keep track.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I also never know if you are aware that I made changes, though by now it's seeming that you must get a notification whenever I commit...

Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I send approve not to block merging when @nalimilan is OK with the PR (but I will try to review the final version of changes again if I will have time).

test/grouping.jl Outdated
df = DataFrame(Key1 = ["A", missing, "B", "B", "A"],
Key2 = CategoricalArray(["B", "A", "A", missing, "A"]),
Value = 1:5)
df= DataFrame(Key1 = ["A", missing, "B", "B", "A"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation.

test/show.jl Outdated
│ Col # │ Name │ Eltype │ Missing │ Values │
├───────┼──────┼──────────────────────────────────┼─────────┼─────────────────┤
│ 1 │ Fish │ String │ 0 │ Suzy … Amir │
│ 2 │ Mass │ Union{Float64, Missings.Missing} │ 1 │ 1.5 … missing │"""
│ 2 │ Mass │ Union{Float64, $Missing} │ 1 │ 1.5 … missing │"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just realized this probably won't work on 0.7, where the Missing prefix won't be present, as the other columns will be less wide than in the string. So I guess we'll need two blocks with a VERSION check. There's another case below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@ExpandingMan
Copy link
Contributor Author

Thanks for pushing that last commit. Sorry, as this wraps up I'm starting to lose track of things.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, look good now!

@ExpandingMan
Copy link
Contributor Author

Yay! In the future if I ever do something this big I'll probably split it into multiple PR's. I suspect that might save us all a bit of sanity.

@nalimilan
Copy link
Member

nalimilan commented Mar 2, 2018

Hmm, I wonder why some lines are not covered by tests anymore. For example, compare:
https://coveralls.io/builds/15784016/source?filename=src%2Fdataframe%2Fdataframe.jl#L403
with the last commit on master:
https://coveralls.io/builds/15401106/source?filename=src%2Fdataframe%2Fdataframe.jl#L403

EDIT: scratch that, the links I posted actually show that the regression is older.
@bkamins Any idea which PR could have introduced this regression in coverage? I couldn't find it, but on 0.11.5 that line and others were covered: https://coveralls.io/builds/15131504/source?filename=src%2Fdataframe%2Fdataframe.jl#L403

EDIT: filed #1372

@ExpandingMan
Copy link
Contributor Author

That was added for a reason, though now I don't remember exactly what that was. I expect this to break something.

@nalimilan
Copy link
Member

Yeah, there's a deprecation warning about it, but let's handle this in the next round of fixes. Else we need more tests, which is going to hold off the PR.

@ExpandingMan
Copy link
Contributor Author

Oh! I remember what it fixes.

Without it you can't do e.g. df[1:end, 1].

@nalimilan nalimilan merged commit c09b880 into JuliaData:master Mar 2, 2018
@nalimilan
Copy link
Member

Let's handle this in another PR, anyway we're not there yet!

@ExpandingMan
Copy link
Contributor Author

Acknowledged, opened an issue so we don't forget.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants