Inconsistency in push!ing an empty row into a DataFrame #2953

Closed
ericphanson opened this issue Nov 30, 2021 · 2 comments
@ericphanson
Contributor

julia> df = DataFrame()
0×0 DataFrame

julia> push!(df, NamedTuple(); cols=:union) # does not add a row
0×0 DataFrame

julia> df = DataFrame(:a => [1]) # try again with non-empty df
1×1 DataFrame
 Row │ a     
     │ Int64 
─────┼───────
   1 │     1

julia> push!(df, NamedTuple(); cols=:union) # adds a row
2×1 DataFrame
 Row │ a       
     │ Int64?  
─────┼─────────
   1 │       1
   2 │ missing 

This edge case came up for me when doing

new_df = DataFrame()
for value in old_df.col
    push!(new_df, nt_with_variable_number_cols(value); cols=:union)
end
df = hcat(old_df, new_df; copycols=false)

to try to work around the fact that

transform!(df, :col => ByRow(nt_with_variable_number_cols) => AsTable)

does not work (says you must emit identical columns).

bkamins added this to the 1.x milestone Nov 30, 2021
@bkamins
Member

bkamins commented Nov 30, 2021

I can see the problem. The question is what to do about it. If the data frame has no columns then pushing NamedTuple() cannot add a row to it.

On the other hand, if a data frame already has some data and you push NamedTuple() to it with cols=:union I think that it is natural to expect that a row full of missing values will be created.

In general, Tables.dictrowtable is designed to perform the kind of unioning you want:

julia> Tables.dictrowtable([NamedTuple(), (a=1,), NamedTuple(), (b=2,), NamedTuple()]) |> DataFrame
5×2 DataFrame
 Row │ a        b
     │ Int64?   Int64?
─────┼──────────────────
   1 │ missing  missing
   2 │       1  missing
   3 │ missing  missing
   4 │ missing        2
   5 │ missing  missing
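
Applied to the workaround from the original post, this could look roughly like the sketch below. The body of `nt_with_variable_number_cols` and the contents of `old_df` are made up for illustration (the issue does not show them); only `Tables.dictrowtable`, `DataFrame`, and `hcat` with `copycols=false` come from the thread.

```julia
using DataFrames
using DataFrames: Tables  # Tables.jl is a dependency of DataFrames; imported explicitly for clarity

# Hypothetical stand-in for the issue's `nt_with_variable_number_cols`:
# it returns NamedTuples with different fields depending on the value.
nt_with_variable_number_cols(v) = isodd(v) ? (odd = v,) : (even = v, half = v ÷ 2)

old_df = DataFrame(col = [1, 2, 3])  # made-up example data

# Collect one NamedTuple per source row. The explicit element type mirrors
# the heterogeneous vector used in the example above.
rows = NamedTuple[nt_with_variable_number_cols(v) for v in old_df.col]

# dictrowtable unions the field names across all rows and fills gaps with `missing`.
new_df = DataFrame(Tables.dictrowtable(rows))

# Same final step as in the original workaround.
df = hcat(old_df, new_df; copycols=false)
```

Because all rows are collected before the table is built, this avoids pushing rows one at a time into an initially empty data frame, which is where the inconsistency reported above shows up.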

@bkamins
Member

bkamins commented Oct 15, 2022

I think it should be closed.

@ericphanson - if you think otherwise - please comment.

bkamins closed this as completed Oct 15, 2022