Correctly handle Tables.AbstractRow in operation specification #3348

bkamins · 2023-06-25T14:06:31Z

After this PR, Tables.AbstractRow is treated the same way as DataFrames.DataFrameRow in all combine/select/transform operations.

bkamins · 2023-06-25T14:10:37Z

This is mildly breaking, but I assume it is OK to add it in the 1.6 release. My reasoning is that anyone whose function returns a Tables.AbstractRow expects it to be expanded into multiple columns.
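For readers who want to see what this expansion looks like, here is a minimal sketch, assuming DataFrames.jl 1.6 or later. `MyRow` is a hypothetical type made up for this illustration; it is not part of DataFrames.jl or Tables.jl, and the column names `lo` and `hi` are arbitrary.

```julia
using DataFrames, Tables

# Hypothetical row type, for illustration only.
struct MyRow <: Tables.AbstractRow
    data::NamedTuple
end

# Minimal Tables.AbstractRow interface. getfield is used because
# Tables.AbstractRow overloads property access to look up columns.
Tables.columnnames(r::MyRow) = keys(getfield(r, :data))
Tables.getcolumn(r::MyRow, i::Int) = getfield(r, :data)[i]
Tables.getcolumn(r::MyRow, nm::Symbol) = getfield(r, :data)[nm]

df = DataFrame(a = 1:3)

# The function returns a single MyRow; AsTable asks for it to be
# expanded into one output column per column of the row.
res = combine(df, :a => (x -> MyRow((lo = minimum(x), hi = maximum(x)))) => AsTable)
```

With this PR applied, `res` should contain a single row with the columns `lo` and `hi`, which is what the same call produces when the function returns a DataFrameRow or a NamedTuple.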

NEWS.md

docs/src/man/split_apply_combine.md

src/abstractdataframe/selection.jl

nalimilan · 2023-06-25T20:04:19Z

src/abstractdataframe/selection.jl

@@ -770,6 +773,15 @@ function _add_multicol_res(res::DataFrameRow, newdf::DataFrame, df::AbstractData
    _insert_row_multicolumn(newdf, df, allow_resizing_newdf, colnames, res)
 end

+function _add_multicol_res(res::Tables.AbstractRow, newdf::DataFrame, df::AbstractDataFrame,


While we're at it, maybe we should make DataFrameRow <: AbstractRow? That would avoid duplicating a few methods and simplify type unions.

I was thinking about it.

First (for future readers): DataFrameRow fully supports the Tables.AbstractRow interface, so this is purely a question of code design.

Pros of doing subtyping:

Less code.

It is clearer which functionality lives at the more abstract level.

Cons of doing subtyping:

Most methods for DataFrameRow and Tables.AbstractRow are the same, but not all of them. Some differ because DataFrameRow has richer functionality than Tables.AbstractRow. Keeping DataFrameRow and Tables.AbstractRow separate makes it easier (at least for me) to find, in the future, all places in the source code where DataFrameRow is used. I know this is not a very strong reason, but with the size of the code base we have I often update code by running "find in all files" for a certain code pattern (otherwise it is easy to miss a place where some functionality is used).

(For reference: I started implementing this change and noticed that it would affect more code than just what this PR touches.)

So we could add it, but it also has some practical downsides. Given this, what do you think we should do?

I don't have a strong opinion. Given that Tables.jl uses duck typing anyway it's not super important to have DataFrameRow <: AbstractRow, and indeed nobody has requested it.

Yes - duck typing is my main reason for sticking with what we have.
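A small sketch of the duck typing discussed above, assuming a DataFrames.jl version in which DataFrameRow is not declared a subtype of Tables.AbstractRow (the outcome this thread settles on). The data frame contents are made up for illustration.

```julia
using DataFrames, Tables

df = DataFrame(a = [1, 2], b = ["x", "y"])
row = df[1, :]  # a DataFrameRow

# The generic Tables.jl accessors work on the row without any subtype relation.
cols = collect(Tables.columnnames(row))
first_a = Tables.getcolumn(row, :a)

# Check whether the row type is declared a subtype of Tables.AbstractRow.
is_abstract_row = row isa Tables.AbstractRow
```

Code written against `Tables.columnnames` and `Tables.getcolumn` therefore does not need the subtype relation, which is why adding it is mostly a matter of internal code organization.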

test/select.jl

nalimilan · 2023-06-25T20:19:19Z

test/select.jl

@@ -2939,4 +2939,82 @@ end
    end
 end

+@testset "Tables.AbstractRow interface" begin


Maybe this should also be tested in other tests where we cover DataFrameRow?

I have added more such tests (where I managed to track them down).


nalimilan · 2023-07-02T17:46:40Z

The 32-bit failure seems unrelated but real?

bkamins · 2023-07-02T20:19:18Z

Thank you (I will fix the 32-bit error in a separate PR)

bkamins added 3 commits June 25, 2023 15:20

initial implementation of AbstractRow support

35b4090

Correctly handle Tables.AbstractRow in operation specification

f6fff83

some more tests

7231c06

bkamins requested a review from nalimilan June 25, 2023 14:06

bkamins added the feature label Jun 25, 2023

bkamins added this to the 1.6 milestone Jun 25, 2023

bkamins added 2 commits June 25, 2023 16:07

enable precompilation back

0415a49

complete docs

08f86aa

bkamins added 2 commits June 25, 2023 16:13

add NEWS.md

7b52d67

improve test coverage

d4dcd34

nalimilan reviewed Jun 25, 2023


bkamins and others added 6 commits June 26, 2023 14:29

Apply suggestions from code review

65098a4

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

add more tests

7b97ea1

fix indentation

c50d44c

fix typo

cc33623

fix tests

86ca60f

fix typo

8c14f20

nalimilan approved these changes Jul 2, 2023


bkamins merged commit 6338825 into main Jul 2, 2023

bkamins deleted the bk/AbstractRow branch July 2, 2023 20:19
