-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(data-exploration): countIf and other aggregations #13854
Conversation
@@ -61,7 +61,9 @@ def test_hogql_materialized_fields_and_properties(self): | |||
|
|||
def test_hogql_methods(self): | |||
self.assertEqual(self._translate("count()"), "count(*)") | |||
self.assertEqual(self._translate("count(event)"), "count(distinct event)") | |||
self.assertEqual(self._translate("countDistinct(event)"), "count(distinct event)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if one does count(properties.value == 2)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error because it expects just 0 arguments. I'd add the test, but the tests pass right now so it'll come in another PR 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, sorry, I meant countDistinct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, give it a try :)
I guess the answer is you'll get a 0
, 1
or 2
depending on the data.
For me it crashes, but adding toInt
around the property gives 0. Doing coalesce(toInt(properties.value), 0) == 2
would likely work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I completely agree with the thinking behind only exposing intuitive count
functions – anyone with just a bit of SQL experience should be able to pick this up immediately
Changes
Adds support for the following aggregations (with number of allowed arguments)
These two translate as follows
countDistinct(x)
->count(distinct x)
(in hogql) ->uniqExact(x)
(internally in clickhouse)countDistinctIf(x)
->countIf(distinct x)
(in hogql) ->uniqExactIf(x)
(internally in clickhouse)I'm softly biased towards not exposing
uniq
anduniqExact
anduniqIf
anduniqExactIf
to the users.uniq
anduniqExact
are non-standard SQL terms (compared to "count distinct" which I've seen in probably every SQL dialect)uniq
returns non-precise results (uniqExtract
is precise but slower), so we'd need to explain this to users at every turn. We need to talk about the difference between the two often.uniqExact
aggregation just rolls off your tongue (uniq
?unique
?uniquerque
?)countDistinct
, which is the natural thing you'd try if you have used any SQL before. This way we can later optimize slow queries to useuniq
if needed, while keeping everyone using the "data integrity is important"uniqExact
instead.How did you test this code?
Updated the test