Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow combine without any transformations #2633

Closed
wants to merge 1 commit into from
Closed

Conversation

nalimilan
Copy link
Member

Since combine(x) always returns an empty data frame, it can hardly be useful.
All other syntaxes return one or more rows per group so it's not really consistent with them either.

Disallowing this can potentially allow other behaviors in the future if we want, e.g. returning only the grouping keys with one row per group like dplyr.

See also https://stackoverflow.com/questions/66142331/summary-table-of-unique-value-combinations-in-dataframes-jl.

Cc: @bkamins, @kleinschmidt

Since `combine(x)` always returns an empty data frame, it can hardly be useful.
All other syntaxes return one or more rows per group so it's not really consistent
with them either.

Disallowing this can potentially allow other behaviors in the future if we want,
e.g. returning only the grouping keys with one row per group like dplyr.
@bkamins
Copy link
Member

bkamins commented Feb 26, 2021

This is breaking so we should carefully consider this decision. Conrete issues are:

  1. What should then select() do?
  2. The statement

All other syntaxes return one or more rows per group so it's not really consistent with them either.

is not true, e.g.

julia> df = DataFrame(a=1)
1×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1

julia> combine(df, Not(:a))
0×0 DataFrame

julia> select(df, Not(:a))
0×0 DataFrame

What do you think you would recommend given these considerations?

@bkamins bkamins added breaking The proposed change is breaking. decision labels Feb 26, 2021
@bkamins bkamins added this to the 1.0 milestone Feb 26, 2021
@bkamins bkamins linked an issue Feb 28, 2021 that may be closed by this pull request
@nalimilan
Copy link
Member Author

Good point. Empty selectors are indeed a more subtle case and which is trickier to decide about. On the one hand, if the passed selector happens to be empty, the current zero-row result can be confusing, notably if it triggers an error down the stream since in other cases you get different dimensions. On the other hand, getting an error early in that case could be annoying if you're prepared to handle that case, as it could force you do use try... catch instead of just carrying around the empty data frame. But that's hard to decide as I'm not sure in what concrete use cases this kind of pattern can arise. Any ideas?

@bkamins
Copy link
Member

bkamins commented Mar 1, 2021

@kleinschmidt - you raised the original issue - could you please comment about use-cases? I, as noted in #2635 am opposed making breaking changes unless we see a reason to do so.

@bkamins bkamins mentioned this pull request Mar 4, 2021
19 tasks
@bkamins
Copy link
Member

bkamins commented Mar 25, 2021

I would close this and #2620. These calls are not likely to be used or useful. We could throw an error as @nalimilan suggests, but I do not see that in the future we will have a better proposal what these functions should do.

In general I would try to avoid the situation like we had with All(...) which worked differently when being passed arguments and when no arguments were passed.

@nalimilan
Copy link
Member Author

Yeah, there doesn't seem to be a compelling use case for it, in the worst case we'll change that in 2.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking The proposed change is breaking. decision
Projects
None yet
Development

Successfully merging this pull request may close these issues.

combine and select without arguments
2 participants