[BREAKING] Handle zero groups #2324

bkamins · 2020-07-22T14:28:24Z

This is a major fix to split-apply-combine that introduces many internal changes and some breaking user-visible changes.

The main changes:

- The cols field now holds Symbol values instead of Int. This was not strictly needed, but since select! can mutate the parent of a GroupedDataFrame, keeping Symbols avoids invalidating the GroupedDataFrame.
- Column order is now handled properly in transform! and transform.
- Cases where 0 groups are processed are now handled properly. The only exception left is combine(arg, ::DataFrame) when the data frame passed has 0 rows, which I leave for later: it is tricky to implement, would only obfuscate the code, and the use case is very limited.
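
As an illustration of the zero-groups case in the list above, here is a minimal sketch, assuming a DataFrames.jl version that includes this change (0.22 or later). The column names a, b, and b_sum are made up for the example; groupby, combine, and groupcols are existing DataFrames.jl functions.

```julia
using DataFrames

# An empty data frame: grouping it yields a GroupedDataFrame with zero groups.
df = DataFrame(a = Int[], b = Float64[])
gdf = groupby(df, :a)

length(gdf)      # number of groups; 0 here because df has no rows
groupcols(gdf)   # grouping columns, reported as a Vector{Symbol}

# With zero groups, combine is expected to still return a data frame that
# has the grouping column and the requested output column, with no rows.
res = combine(gdf, :b => sum => :b_sum)
names(res)
nrow(res)
```

Code that consumes `res` can then rely on the output columns being present even when the input was empty, which is the convenience the change aims for.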

This is breaking so it will require a minor release to go in.

bkamins · 2020-07-22T14:28:47Z

CC @pdeffebach - you might want to test it, as the cases are tricky.

pdeffebach · 2020-07-23T15:02:09Z

Thanks! I just played around with it and I think this is good. It basically just adds new columns so that the returned data frame has the correct names and types. I think this is convenient behavior since it requires less data validation on the user's side.

bkamins · 2020-07-23T17:02:25Z

Thank you for looking into this. I will re-read the whole code before @nalimilan is back online to make sure we can merge this when he is available.

nalimilan

Thanks. Looks mostly good. I have to trust you regarding the places where you added checks for zero groups as the code is really tricky...

src/groupeddataframe/groupeddataframe.jl

nalimilan · 2020-07-26T15:28:00Z

src/groupeddataframe/splitapplycombine.jl

-                                collect(axes(df, 1)), [1], [nrow(df)], 1, nothing,
-                                Threads.ReentrantLock())
+        return GroupedDataFrame(df, Symbol[], ones(Int, nrow(df)),
+                                nothing, nothing, nothing, nrow(df) == 0 ? 0 : 1,


Why not continue filling fields with vectors instead of nothing?

Because they can have 0 or 1 element (filling them unconditionally before was a bug). We could now fill them conditionally, like we do for the number of groups, but as filling them later is very cheap anyway, I felt that setting them to nothing is OK.

If computing the actual value here is trivial, I'd do it; otherwise I agree it's cheap to compute later.

I would leave it for later. This way the code is more modular (otherwise we hardcode something here and can forget to update it if we change the default way to compute them five years from now).
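
The trade-off discussed in this thread can be sketched in isolation. The following is a hypothetical, simplified illustration of filling a field lazily, not the actual GroupedDataFrame implementation: the LazyGroups type, its field names, and the group_starts! function are all invented for the example.

```julia
# Hypothetical stand-in for a grouped structure; not the DataFrames.jl type.
mutable struct LazyGroups
    groups::Vector{Int}                  # group index of each row
    ngroups::Int                         # 0 when there are no rows
    starts::Union{Nothing,Vector{Int}}   # left as nothing until first use
    lazy_lock::ReentrantLock             # guards the lazy fill
end

# Mirror the single-group case from the diff: every row belongs to group 1,
# and the number of groups is 0 if there are no rows at all.
LazyGroups(n::Int) =
    LazyGroups(ones(Int, n), n == 0 ? 0 : 1, nothing, ReentrantLock())

# Compute the first row of each group on demand and cache the result.
function group_starts!(g::LazyGroups)
    lock(g.lazy_lock) do
        if g.starts === nothing
            # Has length 0 or 1 in this example, depending on ngroups.
            g.starts = Int[something(findfirst(==(i), g.groups)) for i in 1:g.ngroups]
        end
        return g.starts
    end
end

length(group_starts!(LazyGroups(0)))   # no rows, so no group starts
length(group_starts!(LazyGroups(3)))   # one group, so one start
```

Keeping the computation in one function means the constructor does not need to know how many elements the field should have in the empty and non-empty cases.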

src/groupeddataframe/splitapplycombine.jl

test/grouping.jl

src/groupeddataframe/splitapplycombine.jl

src/abstractdataframe/selection.jl

bkamins · 2020-07-26T17:17:32Z

Thank you for a review.

> I have to trust you regarding the places where you added checks for zero groups as the code is really tricky...

I hope I did it right. The changes are in a mix of very old code and new code, so I tried to cover everything in tests.

bkamins · 2020-07-30T10:36:23Z

Only coverage fails.

bkamins · 2020-08-01T05:29:53Z

I have added the test. Only coverage fails, as usual.

nalimilan

Sorry for the delay!

bkamins · 2020-08-04T15:15:39Z

No problem - thank you for looking into it!

bkamins · 2020-08-04T15:58:43Z

Thank you!

bkamins added 5 commits July 6, 2020 14:52

temporary solution

b98d20e

correct handling of column order in select/transform

6332298

change cols field to Vector{Symbol}

f2231cf

fixes to implementations and test updates

4718a8c

one more fix

08b7d2f

bkamins added the breaking, bug, feature, grouping, and priority labels Jul 22, 2020

bkamins added this to the 1.0 milestone Jul 22, 2020

nalimilan reviewed Jul 26, 2020

bkamins and others added 5 commits July 26, 2020 19:37

simplify condition len == 0

fe68ff6

Apply suggestions from code review

a3de4f7

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

Merge remote-tracking branch 'origin/handle_zero_groups' into handle_…

3c4f110

…zero_groups

remove intcols in groupby

57fa40a

fixes after code review

34181f1

add one more test

83801b7

nalimilan approved these changes Aug 4, 2020

bkamins merged commit c9a1329 into JuliaData:master Aug 4, 2020

bkamins deleted the handle_zero_groups branch August 4, 2020 15:58

bkamins changed the title ~~Handle zero groups~~ [BREAKING] Handle zero groups Aug 7, 2020

bkamins mentioned this pull request Aug 12, 2020

Loss of columns when joining on empty DataFrame #2363

Closed

bkamins mentioned this pull request Aug 30, 2020

fix wrong docstring that was a leftover of changing the aggregation rules #2399

Merged

JuliaRegistrator mentioned this pull request Nov 15, 2020

New version: DataFrames v0.22.0 JuliaRegistries/General#24650

Merged
