-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement n_distinct()
for multiple arguments using duckdb structs
#122
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! A small integration test (what does n_distinct()
actually do with a real dummy table) would be useful too.
Sorry about the conflicts.
eebf6c1
to
48fa390
Compare
I added a bunch of tests to check the actual computation. In the meantime I also realized that my assumption that duckdb would use a Regarding the use of |
Thanks. If we must, we can reimplement here, or use plain |
Thanks.
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #122 +/- ##
==========================================
+ Coverage 79.70% 79.76% +0.05%
==========================================
Files 108 108
Lines 3711 3722 +11
==========================================
+ Hits 2958 2969 +11
Misses 753 753 ☔ View full report in Codecov by Sentry. |
4d6bbe0
to
9de662f
Compare
…s with `na.rm = FALSE`.
… rid of `dbplyr:::glue_sql2()` usage
243840f
to
eb0eeb4
Compare
@krlmlr Anything else you would like me to improve? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nudge!
str_struct <- paste0("row(", paste0(list(...), collapse = ", "), ")") | ||
|
||
sql(paste0("COUNT(DISTINCT ", str_struct, ")")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgirlich: Can you please help me review this part of the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically looks good to me. This also works for diverse user input, e.g. n_distinct(x, a / (b - 1))
or n_distinct(!!x)
.
One minor thing: you might want to check for empty dots.
pull(summarize(df_na, n = n_distinct(x, y)), n), | ||
5 | ||
) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to add a test that uses a window.
str_struct <- paste0("row(", paste0(list(...), collapse = ", "), ")") | ||
|
||
sql(paste0("COUNT(DISTINCT ", str_struct, ")")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically looks good to me. This also works for diverse user input, e.g. n_distinct(x, a / (b - 1))
or n_distinct(!!x)
.
One minor thing: you might want to check for empty dots.
n_distinct()
for multiple arguments using duckdb structs
Thanks! Adding the extra tests and checks could be done in a small follow-up PR. |
This PR attempts to adress #110, implementing
n_distinct()
making use of duckdb's structs.I don't have any experience in writing dbplyr SQL translations, in particular I have no idea of the 'best practices', but I tried to mimic as much from the already existing code as I could.
A couple of notes:
n_distinct()
. If there is an alternative prefered way to do it, please let me know.glue::glue()
. Although dbplyr depends on glue, and in this function we are assuming that dbplyr is installed, I am not sure if I am allowed to assume that this function exists. Maybe I should not make use of it?Thanks! :)
Closes #110