Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add allcombinations #3031

Merged
merged 12 commits into from
Apr 25, 2022
Merged

add allcombinations #3031

merged 12 commits into from
Apr 25, 2022

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Mar 29, 2022

Fixes #3027

Before I finalize this PR (add tests and NEWS.md) please comment if we like the design.
In particular do you think that the set of proposed signatures is OK.

Thank you!

@bkamins bkamins added this to the 1.4 milestone Mar 29, 2022
@bkamins bkamins requested a review from nalimilan March 29, 2022 10:26
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Show resolved Hide resolved
@@ -49,6 +49,7 @@ export AbstractDataFrame,
disallowmissing!,
dropmissing!,
dropmissing,
expandgrid,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we find a name more closely related to fillcombinations, which is similar? Maybe this should even be a method of fillcombinations?

It would also make sense to mention the relationship between the two functions in their docstrings (dplyr does that IIRC).

Copy link
Member Author

@bkamins bkamins Mar 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dplyr mentions that expandgrid is internally used in their fillcombinations equivalent. However, it was not possible in DataFrames.jl (the reason is that in dplyr everything is a vector, and in Julia we have scalars and need pseudo-broadcasting).

Conceptually this is very different, as fillcombinations requires input to have equal lengths of vectors passed (that is why they are stored in a data frame), while here each input can (and usually has) a different length.

We could name this function or just product and not export it, but document that users can write DataFrames.product. This would be consistent with Iterators.product as it does exactly the same thing (but with a set of rules that is consistent with pseudo-broadcasting).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we could use expandcombinations as a name if we want the name to be similar (and then export it).
However, I tend to prefer DataFrames.product that is documented as part of API but unexported.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(as for the reason why I cannot call expandgrid in fillcombinations you can have a look at their codes and compare the internals of the core loop that does the expansion - there are subtle differences since there we repeat levels, but need to allocate columns from source vector - e.g. for CategoricalArray it makes a difference and cannot be handled correctly if passed to expandgrid)

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins
Copy link
Member Author

bkamins commented Apr 2, 2022

OK - so what do we do about the name? Keep expandgrid (a la R), use DataFrames.product or something else e.g. allcombinations?

@nalimilan
Copy link
Member

The drawback of DataFrames.product is that it would be the only unexported function in the API, thus introducing a new pattern. "grid" isn't a common term in Julia AFAICT so expandgrid sounds weird. Maybe something based on "combinations" would be more consistent with fillcombinations. allcombinations or expandcombinations?

@bkamins
Copy link
Member Author

bkamins commented Apr 2, 2022

I will go for allcombinations then (it seems R docstring uses a similar terminology, although technically speaking "combinations" is something different and here we are doing cartesian product)

@bkamins bkamins changed the title add expandgrid add allcombinations Apr 2, 2022
@bkamins
Copy link
Member Author

bkamins commented Apr 2, 2022

@nalimilan - the PR should be ready for a final review. Thank you!

@bkamins
Copy link
Member Author

bkamins commented Apr 6, 2022

@nalimilan - so in the end do we stick with allcombinations? (being aware that some mathematicians might complain?)

@bkamins
Copy link
Member Author

bkamins commented Apr 23, 2022

@nalimilan - I have added DataFrame as a first argument requirement. The PR should be updated when JuliaData/DataAPI.jl#47 is merged and released. But if other things are OK for you I will write the tests of the current functionality, so please let me know. Thank you!

src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins bkamins merged commit ca0fb4d into main Apr 25, 2022
@bkamins bkamins deleted the bk/expandgrid branch April 25, 2022 11:20
@bkamins
Copy link
Member Author

bkamins commented Apr 25, 2022

Thank you!

@vjd
Copy link

vjd commented Apr 25, 2022

🎉

@andreasnoack
Copy link
Member

andreasnoack commented May 23, 2022

Isn't this a different API for the same functionality that crossjoin provides? If both at needed then it might be useful to cross reference.

@andreasnoack andreasnoack mentioned this pull request May 23, 2022
@bkamins
Copy link
Member Author

bkamins commented May 23, 2022

allcombinations(DataFrame, "a" => 1:2, "b" => 'a':'c', "c" => "const")

is the same as

crossjoin(DataFrame("a" => 1:2), DataFrame("b" => 'a':'c'), DataFrame("c" => "const"))

Most users find the latter a bit inconvenient to write.

What would you propose exactly to add to the documentation?

@andreasnoack
Copy link
Member

I was thinking a "See also" entry.

Most users find the latter a bit inconvenient to write.

The new function might be a bit simpler to type. I was mostly surprised that crossjoin wasn't mentioned in the issue. It made it look like the functionality wasn't already available and I'm pretty sure that some of people asking for an expand.grid function weren't aware of crossjoin.

@bkamins
Copy link
Member Author

bkamins commented May 23, 2022

OK - I will add a link to crossjoin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add expandgrid
4 participants