Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong results when grouping on a column with NULLs #782

Closed
alamb opened this issue Jul 27, 2021 · 4 comments
Closed

Wrong results when grouping on a column with NULLs #782

alamb opened this issue Jul 27, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Jul 27, 2021

Describe the bug
Given a column with null values, if you group by that column the null values are lost

To Reproduce
Input:

 c1 
----
  0
  3
   
  1
  3
(5 rows)

And run SELECT COUNT(*), c1 FROM test GROUP BY c1

Actual result

        "+-----------------+----+",
        "| COUNT(UInt8(1)) | c1 |",
        "+-----------------+----+",
        "| 2               | 3  |",
        "| 2               | 0  |", <-- NOTE there is only a single actual value of 0 in the input (the other is NULL)
        "| 1               | 1  |",
        "+-----------------+----+",

Expected behavior
Here is the correct answer according to postgres (note the row for c1 is NULL):

alamb=# SELECT COUNT(*), c1 FROM test GROUP BY c1;
 count | c1 
-------+----
     1 |   
     2 |  3
     1 |  0
     1 |  1
(4 rows)

Additional context
Discovered while playing around with #781

Here is the entire reproducer in postgres

alamb=# drop table test;
DROP TABLE
alamb=# create table test (c1 int);
CREATE TABLE
alamb=# insert into test values(0);
INSERT 0 1
alamb=# insert into test values(3);
INSERT 0 1
alamb=# insert into test values(NULL);
INSERT 0 1
alamb=# insert into test values(1);
INSERT 0 1
alamb=# insert into test values(3);
INSERT 0 1
alamb=# SELECT COUNT(*), c1 FROM test GROUP BY c1;
 count | c1 
-------+----
     1 |   
     2 |  3
     1 |  0
     1 |  1
(4 rows)
@alamb alamb added the bug Something isn't working label Jul 27, 2021
@alamb
Copy link
Contributor Author

alamb commented Jul 27, 2021

Actually, it is more insidious -- note that

        "+-----------------+----+",
        "| COUNT(UInt8(1)) | c1 |",
        "+-----------------+----+",
        "| 2               | 3  |",
        "| 2               | 0  |", <-- NOTE there is only a single actual value of 0 in the input (the other is NULL)
        "| 1               | 1  |",
        "+-----------------+----+",```

@alamb alamb changed the title Wrong results (missing rows) when grouping on a column with NULLs Wrong results when grouping on a column with NULLs Jul 27, 2021
@Dandandan
Copy link
Contributor

Great find. I think group by doesn't support nulls at all ATM.

The groupby scalars don't have a null and extracting a key doesn't look yet at whether the value is null or not.

@alamb
Copy link
Contributor Author

alamb commented Jul 27, 2021

@Dandandan I think you are right. I will write up a proposal of how to fix this shortly.

@alamb
Copy link
Contributor Author

alamb commented Aug 4, 2021

Resolved in #793 (though we are working on improved performance in #790)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants