Support pushing a DataFrameRow to a DataFrame #1439

omus · 2018-07-05T22:33:26Z

No description provided.

ararslan

Funny, I just tried this yesterday and was surprised it didn't work. Glad it's an easy fix.

nalimilan

append! doesn't sound like the right function. DataFrameRow is a row, not a collection of rows, so we should use push! instead. We already have two methods for that.

omus · 2018-07-09T13:37:25Z

Using push! sounds fine to me. I originally didn't think of push! here, as I tend to think of it as applying only to linear data structures.

nalimilan · 2018-07-09T13:52:17Z

src/dataframe/dataframe.jl

@@ -994,6 +994,8 @@ Base.convert(::Type{DataFrame}, d::AbstractDict) = DataFrame(d)
 ##
 ##############################################################################

+Base.push!(df::DataFrame, r::DataFrameRow) = append!(df, parent(r)[row(r),:])


Unfortunately this implementation isn't really efficient, which can be a problem if you repeatedly push rows in a loop. I think we'd better allow iterating over a DataFrameRow, which is consistent with NamedTuple. Then the already existing method below should work automatically.

DataFrameRows are already iterable but are incompatible with the push! for iterables defined below:

    julia> df = DataFrame(A = 1:2, B = 'a':'b')
    2×2 DataFrames.DataFrame
    │ Row │ A │ B   │
    ├─────┼───┼─────┤
    │ 1   │ 1 │ 'a' │
    │ 2   │ 2 │ 'b' │

    julia> r = DataFrameRow(df, 2)
    DataFrameRow (row 2)
    A  2
    B  b

    julia> collect(r)
    2-element Array{Tuple{Symbol,Any},1}:
     (:A, 2)
     (:B, 'b')

    julia> push!(df, collect(r))
    ERROR: ArgumentError: Error adding (:A, 2) to column :A. Possible type mis-match.
    Stacktrace:
     [1] push!(::DataFrames.DataFrame, ::Array{Tuple{Symbol,Any},1}) at /home/omus/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:1034

Ah, right. I think we should change this, so that DataFrameRow and NamedTuple can be used interchangeably (and the former is just a mutable version of the latter). It will be breaking, but hopefully not many people depend on it.

Actually we cannot change it without a deprecation period (#1449). So in the meantime if you want, add a temporary push! method in deprecated.jl to make this work until the deprecation is removed.

This completely fell off my radar. I'll try to take care of it now.

omus · 2018-09-26T16:22:07Z

I've rebased the code, which required some minor changes that also cleaned things up. Once we remove the collect(::DataFrameRow) deprecation, everything should just work. The first commit places the new push! method in dataframe.jl, while the second moves it into deprecated.jl. Feel free to squash or drop the last commit depending on which placement you prefer.

nalimilan · 2018-09-27T09:44:43Z

Thanks. I think there should also be tests for situations where you append a row from a different data frame, with different cases: column names match, column names are the same but in a different order, and the number of columns differs. AFAICT we should reorder values in the second situation (consistent with vcat, cf. #1347), and throw an error in the last one. For the case where the data frame is the same, having a fast path to avoid these expensive checks would be nice.

EDIT: this means we'll have to keep a special DataFrameRow method even after the deprecation is removed, so it should live in dataframe.jl.
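The three cases listed above (matching names, same names in a different order, differing number of columns) can be sketched with a toy table. This is only an illustration of name-based matching, not the DataFrames.jl implementation: `ToyTable`, `nrows`, and `pushrow!` are hypothetical names invented for this example, and the row is represented by a plain NamedTuple.

```julia
# Hypothetical stand-in for a data frame: an ordered list of column names
# plus one vector per column. This is NOT how DataFrames.jl stores data.
struct ToyTable
    names::Vector{Symbol}
    columns::Vector{Vector{Any}}
end

nrows(t::ToyTable) = isempty(t.columns) ? 0 : length(t.columns[1])

# Push a row given as a NamedTuple, matching values to columns by name.
function pushrow!(t::ToyTable, row::NamedTuple)
    rownames = collect(keys(row))
    # Differing number of columns, or different names: refuse the row
    # before touching any column, so the table is never left half-updated.
    if length(rownames) != length(t.names) || Set(rownames) != Set(t.names)
        throw(ArgumentError("row columns $(rownames) do not match table columns $(t.names)"))
    end
    # Same names, possibly in a different order: look each value up by name,
    # which effectively reorders the row to the table's column order.
    for (name, col) in zip(t.names, t.columns)
        push!(col, row[name])
    end
    return t
end

t = ToyTable([:A, :B], [Any[1, 2], Any['a', 'b']])
pushrow!(t, (A = 3, B = 'c'))   # names match
pushrow!(t, (B = 'd', A = 4))   # same names, different order
```

Pushing a row with a missing or extra column, for example `pushrow!(t, (A = 5,))`, throws an `ArgumentError` in this sketch and leaves the table unchanged.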

omus · 2018-10-09T16:42:29Z

I like those changes but they seem outside the scope of this PR. Are you okay with this PR if only the method is moved into dataframe.jl?

nalimilan · 2018-10-09T19:06:12Z

No, that's really the same PR. With the current method, columns will be added only depending on their order, which is inconsistent with what we do in vcat.

bkamins · 2019-01-13T15:44:00Z

Given the recent changes in DataFrames.jl, I think the rules should be:

API: NamedTuple and DataFrameRow should have exactly the same behavior when push!-ed to a DataFrame;
efficiency: we should check if DataFrameRow has the same parent as DataFrame to which it is pushed and then use more efficient column matching using Index or SubIndex depending on whether we have a subset of columns or all columns in a DataFrameRow.

Do we agree on the approach?
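As a rough illustration of the efficiency rule proposed above, the sketch below checks whether a row's parent is the very table being pushed to and skips name matching in that case. The types `ToyTable` and `ToyRow` and the function `pushrow!` are hypothetical and exist only for this example; the sketch does not use or model the Index and SubIndex types mentioned in the comment, and it assumes the row always spans all columns of its parent.

```julia
# Hypothetical toy types for illustration; not DataFrames.jl internals.
struct ToyTable
    names::Vector{Symbol}
    columns::Vector{Vector{Any}}
end

# A row view: a reference to its parent table plus a row number.
struct ToyRow
    parent::ToyTable
    row::Int
end

function pushrow!(t::ToyTable, r::ToyRow)
    src = r.parent
    if src === t
        # Fast path: same parent, so columns already line up by position
        # and no name lookup or validation is needed.
        for col in t.columns
            push!(col, col[r.row])
        end
    else
        # Slow path: a row from another table, matched by column name.
        Set(src.names) == Set(t.names) ||
            throw(ArgumentError("column names do not match"))
        for (name, col) in zip(t.names, t.columns)
            j = findfirst(isequal(name), src.names)
            push!(col, src.columns[j][r.row])
        end
    end
    return t
end

a = ToyTable([:A, :B], [Any[1, 2], Any['a', 'b']])
b = ToyTable([:B, :A], [Any['x'], Any[10]])
pushrow!(a, ToyRow(a, 2))  # fast path: same parent
pushrow!(a, ToyRow(b, 1))  # slow path: values reordered by name
```

The identity check (`===`) is cheap compared with comparing column names, which is the motivation for treating the same-parent case separately.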

bkamins · 2019-01-17T02:34:55Z

Can we close it and instead work in #1685?

ararslan approved these changes Jul 5, 2018

nalimilan requested changes Jul 7, 2018

omus changed the title ~~Support appending a DataFrameRow to a DataFrame~~ Support pushing a DataFrameRow to a DataFrame Jul 9, 2018

nalimilan reviewed Jul 9, 2018

This was referenced Jul 11, 2018

Scalar indexing by row should return a DataFrameRow #1400

Closed

Deprecate iterating over DataFrameRow in favor of pairs() #1449

Merged

omus force-pushed the cv/append-dataframerow branch from abae9b0 to 9e195a9 · September 26, 2018 16:15

omus added 2 commits September 26, 2018 21:53

Support pushing a DataFrameRow to a DataFrame

75ac457

Move DataFrameRow push into deprecated.jl

efb29b2

omus force-pushed the cv/append-dataframerow branch from 2117671 to efb29b2 · September 27, 2018 02:58

nalimilan mentioned this pull request Jan 15, 2019

DataFrameRow conversion to DataFrame broken #1675

Closed

bkamins mentioned this pull request Jan 17, 2019

Improve push! and DataFrame for DataFrameRow #1685

Merged

nalimilan closed this Jan 17, 2019

nalimilan deleted the cv/append-dataframerow branch January 17, 2019 09:25
