add ==, isequal, <, and isless for DataFrameRow and GroupKey #2669

bkamins · 2021-03-22T21:14:44Z

As usual, before I add tests and update the documentation, please have a look at the implementation.

It would be good to have a decision on JuliaLang/julia#40142 before finalizing this PR.

In particular, this fixes a bug where == on two DataFrameRow objects returned true whenever they were the same row of the same data frame, even if that row contained missing (in that case == should return missing).
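To make the intended semantics concrete, here is a minimal sketch, assuming a DataFrames.jl version that includes this PR. The data frame and the variable names are made up for illustration only.

```julia
using DataFrames

# Hypothetical data: the second row contains a missing value.
df = DataFrame(a = [1, missing], b = [2, 3])

r1 = df[1, :]   # DataFrameRow without missing values
r2 = df[2, :]   # DataFrameRow containing missing

same_complete = r1 == r1          # true
same_missing  = r2 == r2          # missing, because missing == missing is missing
same_isequal  = isequal(r2, r2)   # true, isequal treats missing as equal to itself
differs       = r1 == r2          # false, column b differs (2 vs 3)
```

The last line shows the usual three-valued logic: a definite mismatch in one column yields false even though another column compares as missing.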

bkamins · 2021-03-22T21:17:42Z

Ah - and it is mildly breaking, as we now define hash for DataFrameRow and GroupKey to be consistent with the hash of the corresponding NamedTuple.

This means in particular that this hash is different than rowhash used in grouping (we could make rowhash consistent with the new hash - @nalimilan - do you think it is worth doing?).
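A small sketch of what the hash change means in practice, assuming a DataFrames.jl version that includes this PR. The data frames are hypothetical, and the comparison against the NamedTuple hash is shown without an expected value, since it reflects the stated intent of this PR rather than something verified here.

```julia
using DataFrames

# Two hypothetical data frames that share one row with equal names and values.
df1 = DataFrame(a = [1, 2], b = [2, 3])
df2 = DataFrame(a = [1], b = [2])

r1 = df1[1, :]
r2 = df2[1, :]

# Rows that are isequal hash identically, whatever their parent data frame.
equal_hashes = hash(r1) == hash(r2)

# As a consequence they collapse to a single element in a Set.
n_unique = length(Set([r1, r2]))

# Intended by this PR to agree with the NamedTuple hash as well.
matches_nt = hash(r1) == hash((a = 1, b = 2))
```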


nalimilan · 2021-03-23T08:32:09Z

> This means in particular that this hash is different than rowhash used in grouping (we could make rowhash consistent with the new hash - @nalimilan - do you think it is worth doing?).

We probably don't care about this since it's internal, right?

bkamins · 2021-03-23T09:05:31Z

> We probably don't care about this since it's internal, right?

Yes - I have checked that we use rowhash only in one place in the code so it should be OK.


bkamins · 2021-03-23T12:20:53Z

Also, rowhash ignores column names for speed, so indeed the two hashes do not have to match.


bkamins · 2021-03-25T10:21:51Z

OK - I hope I fixed everything :). This was hard (but hopefully Julia Base will now also be more consistent in how it handles <).


bkamins · 2021-03-25T14:45:27Z

@pdeffebach - this is what we concluded is a good design. Could you please comment on it before merging (as you were against it)?

pdeffebach · 2021-03-25T15:11:28Z

Sounds good! I'm fine with this.

bkamins · 2021-03-25T16:24:55Z

src/dataframerow/utils.jl

+# table columns are passed as a tuple of vectors to ensure type specialization
+rowhash(cols::Tuple{AbstractVector}, r::Int, h::UInt = zero(UInt))::UInt =
+    hash(cols[1][r], h)
+function rowhash(cols::Tuple{Vararg{AbstractVector}}, r::Int, h::UInt = zero(UInt))::UInt


It is interesting that we do not have this covered by tests. I will add some.

It turns out we do not use these functions. I have removed them. @nalimilan - could you please check whether we would ever need them? (I have never used them.)

AFAICT one guy removed the last use of findrow and group_rows in a recent PR. :-D #2641

It's weird that coverage didn't notice this. Maybe we only looked at the changed code and not at unrelated parts?

There were tests that checked correctness of internal functions.


nalimilan · 2021-03-25T18:47:53Z

src/dataframerow/utils.jl

-    ngroups, rhashes, gslots, sorted =
-        row_group_slots(ntuple(i -> df[!, i], ncol(df)), Val(true), groups, false, false)
-    rperm, starts, stops = compute_indices(groups, ngroups)
-    return RowGroupDict(df, rhashes, gslots, groups, rperm, starts, stops)


I think RowGroupDict can also be removed.

Indeed - I am running tests to double check. It is astonishing how much we have reworked internally in this release.

OK - all seems clean. I will merge the PR after CI passes.

Now that this file contains only grouping code, we will be able to move it to the corresponding folder and rename it without touching anything else. :-)

OK, I will move it in this PR, as otherwise we will forget. Also, nonunique uses it, but I think that is not a problem.

bkamins · 2021-03-25T21:00:56Z

Thank you!

pdeffebach · 2021-03-25T21:19:03Z

This allows

(a = 1, b = 2) in eachrow(df)

right?

bkamins · 2021-03-25T21:34:05Z

Right, and even more e.g.:

julia> df = DataFrame(a=1:5, b=2:6)
5×2 DataFrame
 Row │ a      b     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      2
   2 │     2      3
   3 │     3      4
   4 │     4      5
   5 │     5      6

julia> (a=1, b=2) in eachrow(df)
true

julia> Ref((a=4, b=4)) .< eachrow(df)
5-element BitVector:
 0
 0
 0
 1
 1
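The same comparisons are meant to work for GroupKey, and isless makes materialized rows sortable. The following is a minimal sketch, assuming a DataFrames.jl version that includes this PR. The data and variable names are hypothetical, and sort=true is passed to groupby only so that the order of the keys is deterministic.

```julia
using DataFrames

# Hypothetical data with a grouping column g.
df = DataFrame(g = [2, 1, 2], x = [10, 20, 30])
gdf = groupby(df, :g, sort = true)   # keys come out as g = 1, then g = 2
ks = keys(gdf)

key_eq_nt  = ks[1] == (g = 1,)   # GroupKey compared with a NamedTuple
key_less   = ks[1] < ks[2]       # GroupKey compared with a GroupKey
nt_in_keys = (g = 2,) in ks      # membership test relies on ==

# isless defines an ordering, so a collected vector of rows can be sorted.
rows = sort(collect(eachrow(df)), rev = true)
first_row = NamedTuple(rows[1])  # (g = 2, x = 30)
```

Rows are compared column by column from left to right, which is why (g = 2, x = 30) sorts ahead of (g = 2, x = 10) in reverse order.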

add ==, isequal and isless for DataFrameRow and GroupKey

3f61e4c

bkamins added bug feature grouping labels Mar 22, 2021

bkamins added this to the 1.0 milestone Mar 22, 2021

bkamins added the breaking The proposed change is breaking. label Mar 22, 2021

bkamins linked an issue Mar 23, 2021 that may be closed by this pull request

DataFrameRow and NamedTuple comparisons #2668

Closed

nalimilan reviewed Mar 23, 2021



Apply suggestions from code review

e10f7f6

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

bkamins added 2 commits March 23, 2021 13:36

use macros to define functions and add <

0b37375

Merge remote-tracking branch 'upstream/nt_like_types' into nt_like_types

77daf7a

nalimilan reviewed Mar 24, 2021


use @eval more aggressively

9d90def

bkamins changed the title ~~add ==, isequal and isless for DataFrameRow and GroupKey~~ add ==, isequal <, and isless for DataFrameRow and GroupKey Mar 24, 2021

bkamins added 3 commits March 24, 2021 16:09

fix metaprogramming issues

875b71a

update codes and write tests

328740b

add NEWS.md entry

74c63d6

nalimilan reviewed Mar 25, 2021


bkamins and others added 2 commits March 25, 2021 14:41

Update src/dataframerow/dataframerow.jl

6cdabf7

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

improve tests

d1aabc0

nalimilan approved these changes Mar 25, 2021


update tests and remove unused functions

bde23ea


Apply suggestions from code review

353ef94

nalimilan reviewed Mar 25, 2021


bkamins added 4 commits March 25, 2021 20:27

remove RowGroupDict

0a96620

Merge remote-tracking branch 'upstream/nt_like_types' into nt_like_types

db0b72f

move utils to groupeddataframe

fa4d9a5

Merge branch 'main' into nt_like_types

2dcd086

bkamins merged commit 1d3f31b into main Mar 25, 2021

bkamins deleted the nt_like_types branch March 25, 2021 21:00
