Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle Tables.AbstractRow in operation specficiation #3348

Merged
merged 13 commits into from
Jul 2, 2023

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Jun 25, 2023

Fixes #3335

After this PR Tables.AbstractRow is treated in the same way as DataFrames.DataFrameRow in all combine/select/transform operations.

@bkamins bkamins requested a review from nalimilan June 25, 2023 14:06
@bkamins bkamins added this to the 1.6 milestone Jun 25, 2023
@bkamins
Copy link
Member Author

bkamins commented Jun 25, 2023

This is mildly breaking, but I assume it is OK to add it in 1.6 release. The point is that I assume that when someone uses Tables.AbstractRow then expansion is expected.

NEWS.md Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
docs/src/man/split_apply_combine.md Outdated Show resolved Hide resolved
docs/src/man/split_apply_combine.md Outdated Show resolved Hide resolved
src/abstractdataframe/selection.jl Outdated Show resolved Hide resolved
@@ -770,6 +773,15 @@ function _add_multicol_res(res::DataFrameRow, newdf::DataFrame, df::AbstractData
_insert_row_multicolumn(newdf, df, allow_resizing_newdf, colnames, res)
end

function _add_multicol_res(res::Tables.AbstractRow, newdf::DataFrame, df::AbstractDataFrame,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're at it, maybe we should make DataFrameRow <: AbstractRow? That would avoid duplicating a few methods and simplifying type unions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it.

First (for future readers) DataFrameRow fully supports Tables.AbstractRow interface, so this is just an issue of code design.

Pros of doing subtyping:

  • Less code.
  • It is more clear which functionalities are on more "abstract level".

Cons of doing subtyping:

  • Most of methods for DataFrameRow and Tables.AbstractRow are the same. However, not all of them. Some methods are different, because DataFrameRow has a richer functionality than Tables.AbstractRow. The challenge is that keeping DataFrameRow and Tables.AbstractRow separate makes it easier (at least for me) in the future to find all places in the source code where DataFrameRow is used. I know this is not a super strong reason but with the size of the code that we have I often end up doing updates of code by running "find in all files" of a certain code pattern (as otherwise it is easy to forget about some place when some functionality is used).

(for a reference: I started implementing this change and noticed that it would affect more code than only this PR and after this change)

So - we could add it, but it also has some practical downsides. What do you think, given this, we should do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion. Given that Tables.jl uses duck typing anyway it's not super important to have DataFrameRow <: AbstractRow, and indeed nobody has requested it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - duck typing is my main reason for sticking with what we have.

test/select.jl Outdated Show resolved Hide resolved
test/select.jl Outdated Show resolved Hide resolved
test/select.jl Outdated Show resolved Hide resolved
@@ -2939,4 +2939,82 @@ end
end
end

@testset "Tables.AbstractRow interface" begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should also be tested in other tests where we cover DataFrameRow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added more such tests (where I managed to track them down).

@nalimilan
Copy link
Member

The 32-bit failure seems unrelated but real?

@bkamins bkamins merged commit 6338825 into main Jul 2, 2023
@bkamins bkamins deleted the bk/AbstractRow branch July 2, 2023 20:19
@bkamins
Copy link
Member Author

bkamins commented Jul 2, 2023

Thank you (I will fix the 32-bit error in a separate PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accepting array element in rows specificed by named tuples, in combine
2 participants