Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataFrameRow and NamedTuple comparisons #2668

Closed
bkamins opened this issue Mar 22, 2021 · 4 comments · Fixed by #2669
Closed

DataFrameRow and NamedTuple comparisons #2668

bkamins opened this issue Mar 22, 2021 · 4 comments · Fixed by #2669
Labels
Milestone

Comments

@bkamins
Copy link
Member

bkamins commented Mar 22, 2021

Should we update isequal and == and isless to allow comparing NamedTuple and DataFrameRow directly without conversion? (this will also impact hash) See #2639 for a related discussion on GroupKey.

Also should GroupKey be allowed to be compared to DataFrameRow then?

@bkamins bkamins added this to the 1.0 milestone Mar 22, 2021
@pdeffebach
Copy link
Contributor

I vote no. The user can cast as a NamedTuple if they need to.

Also no for a DataFramesRow. The user can also cast as a NamedTuple. In general I don't see GroupKeys and DataFrameRows as very similr.

@bkamins bkamins linked a pull request Mar 23, 2021 that will close this issue
@nalimilan
Copy link
Member

I think it's worth defining == and isequal to be consistent with NamedTuple. That way we are closer to the row object types used by other Tables.jl tables, notably NamedTuple{<:AbstractVector} and AbstractVector{<:NamedTuple}, which use NamedTuple to represent rows. This isn't guaranteed by the Tables.jl interface, but that's convenient to have.

isless and < are not required, and can be added later as long as they currently throw errors.

@bkamins
Copy link
Member Author

bkamins commented Mar 23, 2021

I agree. I will keep == and isequal but remove < and isless in #2669.
The thing is that this will allow the following code:

for r in eachrow(df)
    if r == (a=1, b=2)
        # do something
    end
end

and it is natural to allow for this without requiring a conversion.

@bkamins
Copy link
Member Author

bkamins commented Mar 23, 2021

Now - thinking about it more. Since we already allow isless for DataFrameRow the same logic as above applies to it, as the following code is equally natural to the one above:

for r in eachrow(df)
    if r < (a=1, b=2)
        # do something
    end
end

Do we see any downsides of allowing < and isless across: NamedTuple, DataFrameRow and GroupKey?

Regarding the comment that you can always cast to NamedTuple - this is true, but casting to NamedTuple is an expensive operation (and the wider the data frame is the more expensive it gets).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants