-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Enable aggregations to create a separate bucket for missing values #32832
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it LGTM, but I have a couple of small concerns. 1) I am not sure it is worth adding a separate test_emp_with_nulls
. I think it might be better to modify test_emp
to have some null values. Otherwise we will only ever use it when we have intent to test nulls. 2) It is a breaking change. We can are still in experimental phase so we can break things, but I think it would be nice to document it somewhere.
@imotov the reason for creating - what I hope to be - a temporary set of test data is that the NULLs checking is an wide scenario covering many more areas and functionalities. I do have a more general issue covering NULLs in general. I could probably integrate the NULLs in the main set of test data, but the effort involved in fixing all the implications doing this might be more significant. Regarding the breaking changes awareness, where would be the best place to document such a change? |
I am fine with having two tables for now.
It's, probably, time to create a new file for sql in |
@imotov I've added the breaking change page for SQL. |
Also, @imotov, should this go into 6.x as well? If it would then a different breaking page should be added, right? |
Yeah, it's an experimental features, so I think it's ok to do it in 6.x as well. Indeed, it will be a similar file only in different directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be nice to have an example in the docs with an aggregation that includes null so I can get a good look at what the output is. At this point we just assert that it lines up with jdbc, but it'd still be nice to see.
|
||
==== Grouping by columns with missing values will create an additional group | ||
|
||
An additional group will be present in the result of requests containing a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this goes to 6.x then this shouldn't actually be committed to migrate_7_0
, only the 6.x version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... good point @nik9000... a breaking change happens once on the versions timeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaking change in 6.x should go to \docs\reference\migration\migrate_6_5.asciidoc
, right?
@@ -0,0 +1,14 @@ | |||
selectGenderWithNullsAndGroupByGender | |||
SELECT gender, COUNT(*) count FROM test_emp_with_nulls GROUP BY gender ORDER BY gender; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a separate file? Maybe should just be part of the aggs file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 thank you for reviewing, much appreciated.
I've done it this way with the idea of refactoring this in the near future as a wider effort (part of #32079). Then the test_emp_with_nulls
would disappear and the null values will be part of test_emp
itself and more tests will be added to various test files (functions, select
s with group by
, IS NULL
type of selects and possibly other sections). I considered the small change of allowing null values as part of the aggregations results worthy of being added now relatively quickly (and allow a minor functionality be available), instead of tackling the null values support wider task which will probably take more time. Also, when the wider null-handling task is considered, I can add the documentation covering the null
group result.
Let me know your thoughts.
…es (#32832) Enable aggregations to create a separate bucket for missing values.
…rs null or empty values as a separate group/bucket. Previously, they were ignored. * This is part of backporting of #32832
Pushed to 6.x as well, together with breaking changes documentation update. |
* master: Adjust BWC version on mapping version Token API supports the client_credentials grant (#33106) Build: forked compiler max memory matches jvmArgs (#33138) Introduce mapping version to index metadata (#33147) SQL: Enable aggregations to create a separate bucket for missing values (#32832) Fix grammar in contributing docs SECURITY: Fix Compile Error in ReservedRealmTests (#33166) APM server monitoring (#32515) Support only string `format` in date, root object & date range (#28117) [Rollup] Move toBuilders() methods out of rollup config objects (#32585) Fix forbiddenapis on java 11 (#33116) Apply publishing to genreate pom (#33094) Have circuit breaker succeed on unknown mem usage Do not lose default mapper on metadata updates (#33153) Fix a mappings update test (#33146) Reload Secure Settings REST specs & docs (#32990) Refactor CachingUsernamePassword realm (#32646)
* 6.x: Introduce mapping version to index metadata (#33147) Move non duplicated actions back into xpack core (#32952) HLRC: Create server agnostic request and response (#32912) Build: forked compiler max memory matches jvmArgs (#33138) * Added breaking change section for GROUP BY behavior: now it considers null or empty values as a separate group/bucket. Previously, they were ignored. * This is part of backporting of #32832 SQL: Enable aggregations to create a separate bucket for missing values (#32832) [TEST] version guard for reload rest-api-spec Fix grammar in contributing docs APM server monitoring (#32515) Support only string `format` in date, root object & date range (#28117) [Rollup] Move toBuilders() methods out of rollup config objects (#32585) Accept Gradle build scan agreement (#30645) Fix forbiddenapis on java 11 (#33116) Run forbidden api checks with runtimeJavaVersion (#32947) Apply publishing to genreate pom (#33094) Fix a mappings update test (#33146) Reload Secure Settings REST specs & docs (#32990) Refactor CachingUsernamePassword realm (#32646)
At the moment, the GROUP BY statements in SELECTs will not consider inexistent values in columns. This PR enables missing values to be added as a separate bucket in the aggregation results.
This is part of a larger effort regarding NULL/missing values handling.
This fixes #32831