Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check if input is a table before trying alternative constructors #2341

Merged
merged 4 commits into from
Aug 2, 2020

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Aug 1, 2020

Fixes JuliaData/CSV.jl#702. A DataFrame
constructor was removed from CSV.jl in its latest release (0.7.6) as
part of new deprecations for decoupling the two packages. What I forgot
was that we were relying on that constructor because of an ambiguity in
the generic DataFrame constructor fallback that mostly uses Tables.jl to
try and turn anything into a DataFrame. The problem is "mostly"; one of
the checks is if the input is an AbstractVector of AbstractVectors,
which it turns out CSV.File is! This is because it's considered an
AbstractVector of Rows, which themselves are <: AbstractVector. So
we were trying to basically take each row of a CSV.File and treat each
row as a column in that fallback constructor.

While it was proposed to put the DataFrame constructor back in CSV.jl
for now; that's pretty unsatisfying because we know we're going to
remove the DataFrames deprecation at some point, and when we do, we
want DataFrame(CSV.File()) to work out of the box.

The fix here is just adding an additional check up front in the fallback
constructor if we already know the input is Tables.istable(x). If so,
we'll avoid these extra corner case checks and just let the Tables.jl
machinery do the work.

Fixes JuliaData/CSV.jl#702. A DataFrame
constructor was removed from CSV.jl in its latest release (0.7.6) as
part of new deprecations for decoupling the two packages. What I forgot
was that we were relying on that constructor because of an ambiguity in
the generic DataFrame constructor fallback that mostly uses Tables.jl to
try and turn anything into a DataFrame. The problem is "mostly"; one of
the checks is if the input is an `AbstractVector` of `AbstractVectors`,
which it turns out `CSV.File` is! This is because it's considered an
`AbstractVector` of `Row`s, which themselves are `<: AbstractVector`. So
we were trying to basically take each row of a `CSV.File` and treat each
row as a column in that fallback constructor.

While it was proposed to put the DataFrame constructor back in CSV.jl
for now; that's pretty unsatisfying because we know we're going to
remove the DataFrames deprecation at some point, and when we do, we
want `DataFrame(CSV.File())` to work out of the box.

The fix here is just adding an additional check up front in the fallback
constructor if we already know the input is `Tables.istable(x)`. If so,
we'll avoid these extra corner case checks and just let the Tables.jl
machinery do the work.
@bkamins
Copy link
Member

bkamins commented Aug 1, 2020

Thank you for the fix. Can you please add a test of this?

I have no problem with having CSV.jl as a dependency in extras for tests in particular and we have docs/src/assets/iris.csv data set to use for tests.

In this way we could ensure that a standard test set for DataFrames.jl checks if CSV.jl works correctly with DataFrames.jl (ideally - put as many tests as needed to cover all scenarios how things can be passed between the packages).

Would this be OK with you? Thank you!

@quinnj
Copy link
Member Author

quinnj commented Aug 1, 2020

Quick correction:

This is because it's considered an AbstractVector of Rows, which themselves are <: AbstractVector.

This actually isn't accurate (my brain must have figured it out while I was sleeping, because I woke up thinking, "wait! this shouldn't be an issue!"). The real issue is that all([]) === true, so our all check if each element of CSV.File isa AbstractVector returns true, even though the problem is it was just empty.

I still like the fix in place here, but we may consider wanting to add an additional check that the input also has a length > 0.

@quinnj
Copy link
Member Author

quinnj commented Aug 1, 2020

Ok, added a test; I didn't add CSV.jl as a test dependency for now as this is a smaller issue than I originally thought (only affects empty CSV.File which is rare).

test/tables.jl Show resolved Hide resolved
test/tables.jl Outdated Show resolved Hide resolved
test/tables.jl Outdated Show resolved Hide resolved
Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some more tests now and it looks good. Thank you. I will merge and backport it when CI passes.

@bkamins bkamins merged commit 0d1588a into master Aug 2, 2020
@bkamins bkamins deleted the jq/fixtables branch August 2, 2020 07:21
@bkamins
Copy link
Member

bkamins commented Aug 2, 2020

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame(CSV.File(...)) broken with header-only
2 participants