Support for construction from a named tuple? #1573

jlperla · 2018-10-18T18:47:15Z

I think this is related to #1569 but gives a very specific suggestion.

Existing DataFrames seem to play very well with named tuples. For example,

julia> df = DataFrame(a = Float64[], b = Int64[])
0×2 DataFrame

julia> push!(df, (a = 4.0, b = 2))
1×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │

What is irritating is that you cannot create the DataFrame directly from the named tuple, and instead need to specify types yourself.

There is a workaround with splatting, which can rely entirely on inference!

julia> t = (a = 4.0, b = 2)
(a = 4.0, b = 2)

julia> df = DataFrame(;t...)
1×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │

So my question is: (1) is there any real flaw in the splatting interface, and (2) would you be interested in a PR that makes this more explicit? That is, it would add a constructor for named tuples and support

julia> t = (a = 4.0, b = 2)
(a = 4.0, b = 2)

julia> df = DataFrame(t) # not valid, but could be
1×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │


nalimilan · 2018-10-18T19:50:05Z

Splatting should be OK. You can also do DataFrame([(a = 4.0, b = 2)]).

Whether or not constructors should allow this also depends on whether we want to allow this for Tables.jl. Cc: @quinnj

quinnj · 2018-10-18T20:16:19Z

I don't think Tables.jl would do something like this; technically, (a = 4.0, b = 2) is a NamedTuple of "iterables", but only because 4.0 and 2 themselves are iterable (though this has been debated a lot and most think scalars shouldn't iterate). So for the time being at least, valid "tables" from Tables.jl perspective include a vector of namedtuples or a namedtuple of abstractvectors, but not plain namedtuples.
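To make the point about iterable scalars concrete, here is a minimal sketch that uses only Base Julia (no DataFrames or Tables.jl calls). The variable names `rows` and `cols` are illustrative assumptions, not names from either package.

```julia
t = (a = 4.0, b = 2)

# Numbers are iterable in Julia: iterating a scalar visits the scalar itself once.
for x in t.a
    println("iterating t.a yields ", x)
end

# The two shapes described above as valid tables:
rows = [(a = 4.0, b = 2), (a = 4.0, b = 3)]   # a vector of named tuples
cols = (a = [4.0, 4.0], b = [2, 3])           # a named tuple of vectors

# A plain named tuple of scalars such as `t` is neither of these, even though
# each of its fields can technically be iterated.
println(length(t.a), " ", length(cols.a), " ", length(rows))
```

Because every field of `t` reports a length of 1, a constructor that only checked "are the fields iterable?" could not tell a single row apart from a set of one-element columns.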

nalimilan · 2018-10-18T20:27:26Z

Yeah, I don't see a pressing need to change this given that there are at least two convenient ways of doing this currently.

jlperla · 2018-10-18T20:28:33Z

OK, I didn't know that would work. Splatting is pretty tough for new users, but wrapping the named tuple in a vector, as you suggest,

julia> using DataFrames

julia> t = (a = 4.0, b = 2)
(a = 4.0, b = 2)

julia> df = DataFrame([t])
1×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │

seems reasonable enough. Should we make a PR for this in the docs?

jlperla · 2018-10-18T20:31:00Z

Also, it generalizes well if you want to seed a DataFrame from multiple named tuples.

julia> using DataFrames

julia> t = (a = 4.0, b = 2)
(a = 4.0, b = 2)

julia> t2 = (a = 4.0, b = 3)
(a = 4.0, b = 3)

julia> df = DataFrame([t, t2])
2×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 4.0     │ 2     │
│ 2   │ 4.0     │ 3     │

We could add something like that to the construction section in http://juliadata.github.io/DataFrames.jl/latest/man/getting_started.html#The-DataFrame-Type-1 ?

quinnj · 2018-10-19T03:08:18Z

That sounds good to me.

nalimilan · 2018-10-19T07:35:18Z

Why not. Maybe right before the examples showing push!.

ChrisRackauckas · 2018-10-29T15:00:14Z

I kind of think this should be directly supported; to me it's odd that

using DataFrames
DataFrame((a=2,b=[1,2,3]))

fails while

using DataFrames
DataFrame(a=2,b=[1,2,3])

is the standard way to make a DataFrame. An internal method which just splats a namedtuple seems appropriate.

quinnj · 2018-10-29T15:36:36Z

The tricky part of supporting that would be ambiguity with DataFrame(nt) where nt is a NamedTuple of Vectors. For that case, we want it to dispatch to the catchall Tables.jl constructor. Currently it's not easy to write a method signature for the "symdiff" though.
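As a rough illustration of the dispatch question, the sketch below uses only Base Julia. The names `ColumnLike` and `shape` are hypothetical and are not part of DataFrames.jl or Tables.jl; the alias only mimics how a "named tuple of vectors" can be written as a type. It shows that the all-vectors case is easy to name, while everything else (all scalars, or a mix of scalars and vectors) can only be reached through a generic fallback rather than a signature of its own.

```julia
# Hypothetical alias: a NamedTuple whose fields are all AbstractVectors.
const ColumnLike = NamedTuple{names, T} where {names, T <: NTuple{N, AbstractVector} where N}

# Hypothetical function: reports which method dispatch selects.
shape(nt::ColumnLike) = :columns      # every field is a vector
shape(nt::NamedTuple) = :other        # fallback: scalars or mixed fields

println(shape((a = [1.0, 2.0], b = [1, 2])))   # all vectors
println(shape((a = 4.0, b = 2)))               # all scalars
println(shape((a = 2, b = [1, 2, 3])))         # mixed scalar and vector
```

In this sketch the all-scalar and mixed inputs land in the same fallback method, so telling them apart would need a runtime check on the field values rather than a type signature.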

bkamins · 2019-01-21T09:34:12Z

For reference, please note that the two approaches differ as follows:

julia> nt = (a=1.0, b=[1,2,3])
(a = 1.0, b = [1, 2, 3])

julia> DataFrame([nt])
1×2 DataFrame
│ Row │ a       │ b         │
│     │ Float64 │ Array…    │
├─────┼─────────┼───────────┤
│ 1   │ 1.0     │ [1, 2, 3] │

julia> DataFrame(;nt...)
3×2 DataFrame
│ Row │ a       │ b     │
│     │ Float64 │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 1.0     │ 1     │
│ 2   │ 1.0     │ 2     │
│ 3   │ 1.0     │ 3     │

bkamins · 2019-08-26T07:20:22Z

I would close this issue for the reasons given in the last two posts. Any objections?

ajozefiak mentioned this issue Oct 30, 2018

Construction from a named tuple documentation #1580

Closed

bkamins mentioned this issue Jan 15, 2019

DataFrames.jl roadmap #1678

Closed

31 tasks

bkamins closed this as completed Sep 2, 2019
