Support pushing a DataFrameRow to a DataFrame #1439
Funny, I just tried this yesterday and was surprised it didn't work. Glad it's an easy fix.
`append!` doesn't sound like the right function. `DataFrameRow` is a row, not a collection of rows, so we should use `push!` instead. We already have two methods for that.
Using `push!` sounds fine to me. I originally didn't think of `push!` here because I tend to think of it as applying only to linear data structures.
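The distinction drawn above follows Base Julia's own convention for collections: `push!` adds a single item, while `append!` adds every item of a collection. A minimal illustration with plain vectors (nothing DataFrames-specific is assumed):

```julia
# Base Julia convention: push! adds one item, append! adds each item of a collection.
v = [1, 2]
push!(v, 3)            # v is now [1, 2, 3]
append!(v, [4, 5])     # v is now [1, 2, 3, 4, 5]

# Passing a collection to push! keeps it as a single element instead of splicing it in.
w = Any[1, 2]
push!(w, [3, 4])       # w now has 3 elements; the last one is the vector [3, 4]
```

By analogy, a `DataFrameRow` is one item for a `DataFrame`, so `push!` is the natural verb.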
src/dataframe/dataframe.jl (outdated)

```
@@ -994,6 +994,8 @@ Base.convert(::Type{DataFrame}, d::AbstractDict) = DataFrame(d)
##
##############################################################################

Base.push!(df::DataFrame, r::DataFrameRow) = append!(df, parent(r)[row(r),:])
```
Unfortunately this implementation isn't really efficient: it builds an intermediate one-row `DataFrame` for every pushed row, which can be a problem if you repeatedly push rows in a loop. I think we'd better allow iterating over a `DataFrameRow`, which is consistent with `NamedTuple`. Then the already existing method below should work automatically.
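For reference, this is how iteration over a `NamedTuple` behaves in Base Julia (1.x assumed): iterating yields only the values, and the names are available separately through `keys` and `pairs`. In the DataFrames.jl version discussed in this thread, iterating a `DataFrameRow` instead yielded `(name, value)` tuples, which is the inconsistency being pointed out.

```julia
# A NamedTuple pairs names with values, much like a single row.
nt = (A = 2, B = 'b')

vals = collect(nt)          # iteration yields the values only: 2 and 'b'
colnames = keys(nt)         # the names, as a tuple: (:A, :B)
kv = collect(pairs(nt))     # name => value pairs: :A => 2, :B => 'b'
```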
`DataFrameRow`s are already iterable, but they are incompatible with the `push!` method for iterables defined below:
```julia
julia> df = DataFrame(A = 1:2, B = 'a':'b')
2×2 DataFrames.DataFrame
│ Row │ A │ B   │
├─────┼───┼─────┤
│ 1   │ 1 │ 'a' │
│ 2   │ 2 │ 'b' │

julia> r = DataFrameRow(df, 2)
DataFrameRow (row 2)
A  2
B  b

julia> collect(r)
2-element Array{Tuple{Symbol,Any},1}:
 (:A, 2)
 (:B, 'b')

julia> push!(df, collect(r))
ERROR: ArgumentError: Error adding (:A, 2) to column :A. Possible type mis-match.
Stacktrace:
 [1] push!(::DataFrames.DataFrame, ::Array{Tuple{Symbol,Any},1}) at /home/omus/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:1034
```
Ah, right. I think we should change this so that `DataFrameRow` and `NamedTuple` can be used interchangeably (the former being just a mutable version of the latter). It will be breaking, but hopefully not many people depend on it.
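To see why the iteration style matters, here is a hypothetical stand-in for the positional `push!` for iterables. A tuple of vectors plays the role of the columns, and `push_positional!` (an illustrative name, not a DataFrames.jl function) pushes the i-th iterated value onto the i-th column. A `NamedTuple` iterates its values and goes through; the `(name, value)` tuples produced by the old `DataFrameRow` fail conversion, analogous to the error in the transcript above (this sketch raises a `MethodError` rather than an `ArgumentError`).

```julia
# Hypothetical stand-in for the positional push! for iterables:
# the i-th value produced by iterating `row` goes onto the i-th column.
function push_positional!(cols::Tuple, row)
    vals = collect(row)
    length(vals) == length(cols) ||
        throw(ArgumentError("expected $(length(cols)) values, got $(length(vals))"))
    # Convert everything first, so a failed conversion leaves the columns unchanged.
    converted = [convert(eltype(c), v) for (c, v) in zip(cols, vals)]
    for (c, v) in zip(cols, converted)
        push!(c, v)
    end
    return cols
end

cols = ([1, 2], ['a', 'b'])                # columns A and B
push_positional!(cols, (A = 3, B = 'c'))   # a NamedTuple iterates values, so this works

# Pushing the old-style iteration result, e.g. [(:A, 4), (:B, 'd')], would try to
# convert the tuple (:A, 4) to Int and throw before any column is modified.
```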
Actually, we cannot change it without a deprecation period (#1449). So in the meantime, if you want, add a temporary `push!` method in deprecated.jl to make this work until the deprecation is removed.
Bump.
This completely fell off my radar. I'll try to take care of it now.
Branch updated from abae9b0 to 9e195a9.
I've rebased the code, which required some minor changes that cleaned things up. Once we remove the …
Branch updated from 2117671 to efb29b2.
Thanks. I think there should also be tests for appending a row from a different data frame, covering these cases:

- the column names match;
- the column names are the same but in a different order;
- the number of columns differs.

AFAICT we should reorder values in the second situation (consistent with …
EDIT: this means we'll have to keep a special …
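The three cases can be sketched with a hypothetical name-based push, again in plain Base Julia rather than DataFrames.jl: a `NamedTuple` of vectors stands in for a data frame, and `push_by_name!` is an illustrative helper, not a library function. It reorders values by column name, as suggested for the second case. The thread does not settle the third case, so this sketch simply assumes that a mismatched set of column names is rejected with an `ArgumentError`.

```julia
# Hypothetical stand-in for a DataFrame: a NamedTuple whose fields are column vectors.
# push_by_name! is an illustrative helper, not a DataFrames.jl function.
function push_by_name!(cols::NamedTuple, row::NamedTuple)
    # Assumed rule: the row must carry exactly the same column names, in any order.
    if Set(keys(cols)) != Set(keys(row))
        throw(ArgumentError("column names differ: $(keys(cols)) vs $(keys(row))"))
    end
    # Convert every value first, so a type mismatch leaves all columns untouched.
    vals = map(name -> convert(eltype(cols[name]), row[name]), keys(cols))
    # Match values to columns by name, not by position.
    for (name, v) in zip(keys(cols), vals)
        push!(cols[name], v)
    end
    return cols
end

tbl = (A = [1, 2], B = ['a', 'b'])
push_by_name!(tbl, (A = 3, B = 'c'))   # case 1: same names, same order
push_by_name!(tbl, (B = 'd', A = 4))   # case 2: same names, different order (reordered)
# case 3: push_by_name!(tbl, (A = 5,)) throws an ArgumentError under the assumed rule
```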
I like those changes, but they seem outside the scope of this PR. Are you okay with this PR if only the method is moved into dataframe.jl?
No, that's really the same PR. With the current method, values are assigned to columns purely by position, which is inconsistent with what we do in …
Given the recent changes in DataFrames.jl, I think the rules should be:
Do we agree on the approach?
Can we close it and instead work in #1685?
No description provided.