SQL: Enable aggregations to create a separate bucket for missing values #32832

astefan · 2018-08-14T07:02:13Z

At the moment, GROUP BY in SELECT statements ignores missing (non-existent) values in the grouped columns. This PR enables missing values to be added as a separate bucket in the aggregation results.
This is part of a larger effort regarding NULL/missing value handling.
This fixes #32831.
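As a hedged sketch of the semantics change only (not the Elasticsearch implementation, which works on top of aggregations), the Python model below counts rows per key for a query shaped like `SELECT gender, COUNT(*) ... GROUP BY gender`. The function `group_by_count`, the `keep_missing` flag and the sample values are all hypothetical. With `keep_missing=False`, rows whose key is missing are dropped, as before this PR; with `keep_missing=True`, they form one extra group keyed by `None`. Sorting the missing group first is an assumption made only to get a deterministic order.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical group key: a gender value, or None when the field is missing.
Key = Optional[str]


def group_by_count(values: Iterable[Key], keep_missing: bool) -> List[Tuple[Key, int]]:
    """Model of GROUP BY ... COUNT(*) over a single nullable column.

    keep_missing=False drops rows with a missing key (the old behavior).
    keep_missing=True collects them into one extra group keyed by None.
    """
    counts = Counter(v for v in values if keep_missing or v is not None)
    # Assumption for illustration only: the missing (None) group sorts first,
    # then the remaining keys in ascending order.
    return sorted(counts.items(), key=lambda kv: (kv[0] is not None, kv[0] or ""))


if __name__ == "__main__":
    genders = ["M", "F", None, "M", None, "F", "M"]  # hypothetical sample data
    print(group_by_count(genders, keep_missing=False))  # [('F', 2), ('M', 3)]
    print(group_by_count(genders, keep_missing=True))   # [(None, 2), ('F', 2), ('M', 3)]
```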

imotov

In general it LGTM, but I have a couple of small concerns:
1) I am not sure it is worth adding a separate test_emp_with_nulls table. I think it might be better to modify test_emp to contain some null values; otherwise we will only ever use it when we intend to test nulls.
2) It is a breaking change. We are still in the experimental phase, so we can break things, but I think it would be nice to document it somewhere.

astefan · 2018-08-16T07:13:13Z

@imotov the reason for creating what I hope will be a temporary set of test data is that NULL handling is a wide scenario covering many more areas and functionalities. I have a separate, more general issue covering NULLs.
This way, we can enable grouping by non-existent values easily and quickly, and then cover the rest of the scenarios (especially those involving all kinds of SQL functions and all the H2 comparative tests added so far in the project) in a later push.

I could probably integrate the NULLs into the main set of test data, but the effort of fixing everything this implies might be significantly larger.

Regarding the breaking changes awareness, where would be the best place to document such a change?

imotov · 2018-08-16T14:48:40Z

I am fine with having two tables for now.

> Regarding the breaking changes awareness, where would be the best place to document such a change?

It's probably time to create a new file for SQL in the docs/reference/migration/migrate_* folder.

astefan · 2018-08-20T07:28:41Z

@imotov I've added the breaking change page for SQL.

astefan · 2018-08-20T13:59:29Z

Also, @imotov, should this go into 6.x as well? If so, a different breaking changes page should be added, right?

imotov · 2018-08-20T14:47:57Z

Yeah, it's an experimental feature, so I think it's ok to do it in 6.x as well. Indeed, it will be a similar file, just in a different directory.

nik9000

I think it'd be nice to have an example in the docs with an aggregation that includes null so I can get a good look at what the output is. At this point we just assert that it lines up with jdbc, but it'd still be nice to see.
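The spec tests compare ES SQL results with a reference database (H2 in this project). As a rough, hedged illustration of what standard SQL returns for the spec query, the sketch below runs it against SQLite, swapped in for H2 only because it ships with Python. The table name mirrors the spec file, but the rows are invented sample data. SQLite puts all NULLs into a single group and, in ascending order, sorts NULL before other values; whether H2 or ES SQL order the NULL group the same way is not confirmed here.

```python
import sqlite3

# In-memory stand-in for a table with a nullable gender column.
# Table name mirrors the spec file; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_emp_with_nulls (emp_no INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO test_emp_with_nulls VALUES (?, ?)",
    [(1, "M"), (2, "F"), (3, None), (4, "M"), (5, None)],
)

# Same shape as the spec query: rows with a NULL gender form one group.
rows = conn.execute(
    "SELECT gender, COUNT(*) AS count FROM test_emp_with_nulls "
    "GROUP BY gender ORDER BY gender"
).fetchall()
conn.close()

for gender, count in rows:
    print(gender, count)  # None 2, then F 1, then M 2
```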

nik9000 · 2018-08-20T17:46:32Z

docs/reference/migration/migrate_7_0/sql.asciidoc

+
+==== Grouping by columns with missing values will create an additional group 
+
+An additional group will be present in the result of requests containing a


If this goes to 6.x then this shouldn't actually be committed to migrate_7_0, only the 6.x version.

Hm... good point, @nik9000: a breaking change happens only once on the version timeline.

This breaking change in 6.x should go to \docs\reference\migration\migrate_6_5.asciidoc, right?

nik9000 · 2018-08-20T17:47:42Z

x-pack/qa/sql/src/main/resources/agg_nulls.sql-spec

@@ -0,0 +1,14 @@
+selectGenderWithNullsAndGroupByGender
+SELECT gender, COUNT(*) count FROM test_emp_with_nulls GROUP BY gender ORDER BY gender;


Should this be a separate file? Maybe it should just be part of the aggs file.

@nik9000 thank you for reviewing, much appreciated.
I've done it this way with the idea of refactoring it in the near future as part of a wider effort (#32079). At that point test_emp_with_nulls would disappear, the null values would become part of test_emp itself, and more tests would be added to various test files (functions, SELECTs with GROUP BY, IS NULL-style SELECTs and possibly other sections). I considered the small change of allowing null values in aggregation results worth adding now, relatively quickly (making a minor piece of functionality available), instead of tackling the wider null-value support task, which will probably take more time. Also, when the wider null-handling task is addressed, I can add documentation covering the null group result.
Let me know your thoughts.


astefan · 2018-08-27T15:51:16Z

Pushed to 6.x as well, together with breaking changes documentation update.

* master: Adjust BWC version on mapping version Token API supports the client_credentials grant (#33106) Build: forked compiler max memory matches jvmArgs (#33138) Introduce mapping version to index metadata (#33147) SQL: Enable aggregations to create a separate bucket for missing values (#32832) Fix grammar in contributing docs SECURITY: Fix Compile Error in ReservedRealmTests (#33166) APM server monitoring (#32515) Support only string `format` in date, root object & date range (#28117) [Rollup] Move toBuilders() methods out of rollup config objects (#32585) Fix forbiddenapis on java 11 (#33116) Apply publishing to genreate pom (#33094) Have circuit breaker succeed on unknown mem usage Do not lose default mapper on metadata updates (#33153) Fix a mappings update test (#33146) Reload Secure Settings REST specs & docs (#32990) Refactor CachingUsernamePassword realm (#32646)

* 6.x: Introduce mapping version to index metadata (#33147) Move non duplicated actions back into xpack core (#32952) HLRC: Create server agnostic request and response (#32912) Build: forked compiler max memory matches jvmArgs (#33138) * Added breaking change section for GROUP BY behavior: now it considers null or empty values as a separate group/bucket. Previously, they were ignored. * This is part of backporting of #32832 SQL: Enable aggregations to create a separate bucket for missing values (#32832) [TEST] version guard for reload rest-api-spec Fix grammar in contributing docs APM server monitoring (#32515) Support only string `format` in date, root object & date range (#28117) [Rollup] Move toBuilders() methods out of rollup config objects (#32585) Accept Gradle build scan agreement (#30645) Fix forbiddenapis on java 11 (#33116) Run forbidden api checks with runtimeJavaVersion (#32947) Apply publishing to genreate pom (#33094) Fix a mappings update test (#33146) Reload Secure Settings REST specs & docs (#32990) Refactor CachingUsernamePassword realm (#32646)

Enable aggregations to create a separate bucket for missing values.

17a232b

astefan added >bug v7.0.0 :Analytics/SQL SQL querying v6.5.0 labels Aug 14, 2018

astefan requested review from costin, nik9000 and imotov August 14, 2018 07:02

imotov reviewed Aug 15, 2018


Added the breaking change doc page for SQL

6661d50

Merge remote-tracking branch 'remotes/origin/master' into 32831fix

570d553

nik9000 approved these changes Aug 20, 2018


astefan added 2 commits August 23, 2018 08:50

Merge branch 'master' of https://github.com/elastic/elasticsearch into 32831fix

8853a6e

Make the breaking change documentation a 6.x thing only

96fbd13

astefan merged commit 3d9ca4b into elastic:master Aug 27, 2018

astefan added a commit that referenced this pull request Aug 27, 2018

SQL: Enable aggregations to create a separate bucket for missing values (#32832)

Enable aggregations to create a separate bucket for missing values.

7d8780d

astefan added a commit that referenced this pull request Aug 27, 2018

* Added breaking change section for GROUP BY behavior: now it considers null or empty values as a separate group/bucket. Previously, they were ignored.
* This is part of backporting of #32832

b56c38d

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
