make_array does not properly support nulls #6887

Closed
alamb opened this issue Jul 7, 2023 · 5 comments · Fixed by #6900

Contributor

alamb commented Jul 7, 2023

Describe the bug

The make_array function does not appear to properly support nulls for string arrays.

To Reproduce

When I run this script:

create table foo as values ('foo', null), ('bar', null), (null, 'baz');
select * from foo;
select
  make_array(column1, column2)[1],
  make_array(column1, column2)[1] IS NULL
from foo;

select
  make_array(column1, column2)[2],
  make_array(column1, column2)[2] IS NULL
from foo;

I get the following output (note that where make_array had a null input, the corresponding output element is not null, so IS NULL returns false in every row):

0 rows in set. Query took 0.004 seconds.
+---------+---------+
| column1 | column2 |
+---------+---------+
| foo     |         |
| bar     |         |
|         | baz     |
+---------+---------+
3 rows in set. Query took 0.002 seconds.
+----------------------------------------+------------------------------------------------+
| make_array(foo.column1,foo.column2)[1] | make_array(foo.column1,foo.column2)[1] IS NULL |
+----------------------------------------+------------------------------------------------+
| foo                                    | false                                          |
| bar                                    | false                                          |
|                                        | false                                          |
+----------------------------------------+------------------------------------------------+
3 rows in set. Query took 0.007 seconds.
+----------------------------------------+------------------------------------------------+
| make_array(foo.column1,foo.column2)[2] | make_array(foo.column1,foo.column2)[2] IS NULL |
+----------------------------------------+------------------------------------------------+
|                                        | false                                          |
|                                        | false                                          |
| baz                                    | false                                          |
+----------------------------------------+------------------------------------------------+
3 rows in set. Query took 0.007 seconds.

Expected behavior

This is what used to happen (note the true/false):

(arrow_dev) alamb@MacBook-Pro-8:~/Software/arrow-datafusion2/benchmarks$ datafusion-cli -f /tmp/repro.sql
DataFusion CLI v27.0.0
0 rows in set. Query took 0.002 seconds.
+---------+---------+
| column1 | column2 |
+---------+---------+
| foo     |         |
| bar     |         |
|         | baz     |
+---------+---------+
3 rows in set. Query took 0.001 seconds.
+----------------------------------------+------------------------------------------------+
| make_array(foo.column1,foo.column2)[1] | make_array(foo.column1,foo.column2)[1] IS NULL |
+----------------------------------------+------------------------------------------------+
| foo                                    | false                                          |
| bar                                    | false                                          |
|                                        | true                                           |
+----------------------------------------+------------------------------------------------+
3 rows in set. Query took 0.002 seconds.
+----------------------------------------+------------------------------------------------+
| make_array(foo.column1,foo.column2)[2] | make_array(foo.column1,foo.column2)[2] IS NULL |
+----------------------------------------+------------------------------------------------+
|                                        | true                                           |
|                                        | true                                           |
| baz                                    | false                                          |
+----------------------------------------+------------------------------------------------+
3 rows in set. Query took 0.001 seconds.
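
The difference between the two outputs above comes down to whether a NULL input stays NULL or turns into an empty string. The following is a minimal std-only Rust sketch of that distinction, not DataFusion's implementation: `None` stands in for SQL NULL, and the names `make_array_row`, `make_array_row_lossy`, and `element_is_null` are hypothetical helpers invented for illustration.

```rust
// Hypothetical stand-in for one row of `make_array(column1, column2)`.
// `None` models SQL NULL; this is an illustration, not DataFusion code.
fn make_array_row(column1: Option<&str>, column2: Option<&str>) -> Vec<Option<String>> {
    vec![column1.map(str::to_string), column2.map(str::to_string)]
}

// Models the behavior reported in this issue (assumption: the NULL is
// effectively replaced by an empty, non-null string).
fn make_array_row_lossy(column1: Option<&str>, column2: Option<&str>) -> Vec<Option<String>> {
    vec![
        Some(column1.unwrap_or("").to_string()),
        Some(column2.unwrap_or("").to_string()),
    ]
}

// SQL-style 1-based indexing followed by `IS NULL`.
// An out-of-range index is treated as NULL in this sketch.
fn element_is_null(array: &[Option<String>], index: usize) -> bool {
    index
        .checked_sub(1)
        .and_then(|i| array.get(i))
        .map_or(true, |element| element.is_none())
}
```

For the third row of `foo` (`column1` is NULL, `column2` is 'baz'), `element_is_null(&make_array_row(None, Some("baz")), 1)` is true, matching the expected output, while the lossy variant returns false, matching the reported output.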

Additional context

I found this while trying to update IOx to use the latest datafusion: https://github.com/influxdata/influxdb_iox/pull/8127

alamb added the bug label on Jul 7, 2023
Contributor Author

alamb commented Jul 7, 2023

I believe this was introduced by #6662 - my tests pass with 4675216 but fail with 9edfcdc

Contributor

izveigor commented Jul 7, 2023

@alamb Yes, you are right.
I'll take this issue again, especially since I need to embed NullBuilder in make_array anyway.

Contributor Author

alamb commented Jul 9, 2023

I am going to try to fix this as an excuse to learn more of the array code directly.

Contributor Author

alamb commented Jul 10, 2023

I have created #6900 with a proposed fix.

Contributor Author

alamb commented Jul 10, 2023

@izveigor or @tustvold, if you have time to review this PR, I would be most appreciative.
