Support for construction from a named tuple? #1573

Closed
jlperla opened this issue Oct 18, 2018 · 11 comments

Comments

@jlperla

jlperla commented Oct 18, 2018

I think this is related to #1569 but gives a very specific suggestion.

Existing DataFrames seem to play very well with named tuples. For example,

julia> df = DataFrame(a = Float64[], b = Int64[])
0×2 DataFrame

julia> push!(df, (a = 4.0, b = 2))
1×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │

What is irritating is that you cannot create the DataFrame directly from the named tuple, and instead need to specify types yourself.

There is a workaround with splatting, which can rely entirely on inference!

julia> t = (a = 4.0, b = 2)
(a = 4.0, b = 2)

julia> df = DataFrame(;t...)
1×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │

So my question is: (1) is there any real flaw in the splatting interface, and (2) would you be interested in a PR that makes this more explicit? That is, it would add a constructor for named tuples and support

julia> t = (a = 4.0, b = 2)
(a = 4.0, b = 2)

julia> df = DataFrame(t) # not valid, but could be
1×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │

@nalimilan
Member

Splatting should be OK. You can also do DataFrame([(a = 4.0, b = 2)]).

Whether or not constructors should allow this also depends on whether we want to allow this for Tables.jl. Cc: @quinnj

@quinnj
Member

quinnj commented Oct 18, 2018

I don't think Tables.jl would do something like this; technically, (a = 4.0, b = 2) is a NamedTuple of "iterables", but only because 4.0 and 2 themselves are iterable (though this has been debated a lot and most think scalars shouldn't iterate). So for the time being at least, valid "tables" from Tables.jl perspective include a vector of namedtuples or a namedtuple of abstractvectors, but not plain namedtuples.
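
The two table shapes named above can be sketched as follows. This is a minimal illustration, assuming a DataFrames.jl version with Tables.jl support; the variable names are made up for the example.

```julia
using DataFrames

# Row table: a vector of NamedTuples, one element per row.
rows = [(a = 4.0, b = 2), (a = 5.0, b = 3)]
df_rows = DataFrame(rows)

# Column table: a NamedTuple of AbstractVectors, one entry per column.
cols = (a = [4.0, 5.0], b = [2, 3])
df_cols = DataFrame(cols)

# Both inputs describe the same two-row, two-column table.
df_rows == df_cols
```

A plain named tuple of scalars such as (a = 4.0, b = 2) fits neither shape, which is why it has to be wrapped in a vector or splatted into keywords.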

@nalimilan
Member

Yeah, I don't see a pressing need to change this given that there are at least two convenient ways of doing this currently.

@jlperla
Author

jlperla commented Oct 18, 2018

OK, I didn't know that would work. Splatting is pretty tough for new users, but:

So you are saying that

julia>  t = (a = 4.0, b = 2)
(a = 4.0, b = 2)

julia> using DataFrames

julia> df = DataFrame([t])
1×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │

seems reasonable enough. Should we make a PR for this in the docs?

@jlperla
Author

jlperla commented Oct 18, 2018

Also, it generalizes well if you want to seed a DataFrame from multiple named tuples.

julia>  t = (a = 4.0, b = 2)
(a = 4.0, b = 2)
julia>  t2 = (a = 4.0, b = 3)
(a = 4.0, b = 3)

julia> df = DataFrame([t, t2])
2×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │
│ 2   │ 4.0     │ 3     │
julia> using DataFrames

julia> df = DataFrame([t])
1×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │

We could add something like that to the construction section in http://juliadata.github.io/DataFrames.jl/latest/man/getting_started.html#The-DataFrame-Type-1 ?

@quinnj
Member

quinnj commented Oct 19, 2018

That sounds good to me.

@nalimilan
Member

Why not. Maybe right before the examples showing push!.

@ChrisRackauckas

I kind of think this should be directly supported. To me it's odd that

using DataFrames
DataFrame((a=2,b=[1,2,3]))

fails while

using DataFrames
DataFrame(a=2,b=[1,2,3])

is the standard way to make a DataFrame. An internal method which just splats a namedtuple seems appropriate.

@quinnj
Member

quinnj commented Oct 29, 2018

The tricky part of supporting that would be ambiguity with DataFrame(nt) where nt is a NamedTuple of Vectors. For that case, we want it to dispatch to the catchall Tables.jl constructor. Currently it's not easy to write a method signature for the "symdiff" though.
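
One way to picture the ambiguity is a runtime check instead of a method signature. The sketch below is only an illustration: `dataframe_from_namedtuple` is a hypothetical helper, not part of DataFrames.jl or Tables.jl, and it assumes the keyword constructor accepts scalar values as in the examples earlier in this thread.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): choose a constructor
# by inspecting the values stored in the NamedTuple.
function dataframe_from_namedtuple(nt::NamedTuple)
    if all(v -> v isa AbstractVector, values(nt))
        # Every field is a vector: treat `nt` as a Tables.jl column table.
        return DataFrame(nt)
    else
        # At least one field is a scalar: fall back to the keyword
        # constructor, which is what `DataFrame(; nt...)` calls.
        return DataFrame(; nt...)
    end
end

df_cols = dataframe_from_namedtuple((a = [1.0, 2.0], b = [1, 2]))
df_row  = dataframe_from_namedtuple((a = 4.0, b = 2))
```

The branch here happens at runtime on the values; expressing the same split purely through dispatch on the NamedTuple type is the part described above as not easy to write.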

@bkamins
Member

bkamins commented Jan 21, 2019

For reference, please note that the two approaches give different results:

julia> nt = (a=1.0, b=[1,2,3])
(a = 1.0, b = [1, 2, 3])

julia> DataFrame([nt])
1×2 DataFrame
│ Row │ a       │ b         │
│     │ Float64 │ Array…    │
├─────┼─────────┼───────────┤
│ 1   │ 1.0     │ [1, 2, 3] │

julia> DataFrame(;nt...)
3×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 1.0     │ 1     │
│ 2   │ 1.0     │ 2     │
│ 3   │ 1.0     │ 3     │
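
The same difference can be checked programmatically. A minimal sketch, assuming the constructors behave as in the REPL output above; `df_row` and `df_cols` are illustrative names.

```julia
using DataFrames

nt = (a = 1.0, b = [1, 2, 3])

# Vector of NamedTuples: `nt` is a single row, so the whole vector
# ends up in one cell of column `b`.
df_row = DataFrame([nt])

# Keyword splat: `b` becomes a three-element column and the scalar `a`
# is repeated to match its length.
df_cols = DataFrame(; nt...)

# Compare the shapes of the two results.
size(df_row), size(df_cols)
```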

@bkamins
Member

bkamins commented Aug 26, 2019

I would close this issue for the reasons given in the last two posts. Any objections?

@bkamins bkamins closed this as completed Sep 2, 2019