Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support empty array for array_union, array_intersect, and array_except #8181

Closed
jayzhan211 opened this issue Nov 15, 2023 · 4 comments · Fixed by #8269
Closed

Support empty array for array_union, array_intersect, and array_except #8181

jayzhan211 opened this issue Nov 15, 2023 · 4 comments · Fixed by #8269
Labels
enhancement New feature or request

Comments

@jayzhan211
Copy link
Contributor

jayzhan211 commented Nov 15, 2023

Is your feature request related to a problem or challenge?

We cant deal with empty array for these three array function

Result

query ?
select array_union([], []);
----
NULL

query ?
select array_except([], []);
----
NULL

query error
select array_intersect([], []);
----
DataFusion error: Optimizer rule 'simplify_expressions' failed
caused by
Internal error: could not cast value to arrow_array::array::list_array::GenericListArray<i32>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Expected

query ?
select array_union([], []);
----
[]

query ?
select array_except([], []);
----
[]

query error
select array_intersect([], []);
----
[]

Describe the solution you'd like

Able to deal with empty array

Describe alternatives you've considered

None

Additional context

IIRC empty array has been an issue for long, we can't differentiate it from null array, so I think this issue is not only for these three array functions. I had traced this issue before but ends up nothing and work on other tasks, if anyone is interesting, welcome to pick it up.

@jayzhan211 jayzhan211 added the enhancement New feature or request label Nov 15, 2023
@Veeupup
Copy link
Contributor

Veeupup commented Nov 19, 2023

FYI. I have tried these functions locally, I find that these functions can handle empty arrays but not List(NullArray)

❯ select array_union(make_array([]), make_array([]));
+--------------------------------------------------------------------------------+
| array_union(make_array(List([NullArray(0)])),make_array(List([NullArray(0)]))) |
+--------------------------------------------------------------------------------+
| []                                                                             |
+--------------------------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.select array_intersect(make_array([]), make_array([]));
+------------------------------------------------------------------------------------+
| array_intersect(make_array(List([NullArray(0)])),make_array(List([NullArray(0)]))) |
+------------------------------------------------------------------------------------+
| []                                                                                 |
+------------------------------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.select array_except(make_array([]), make_array([]));
+---------------------------------------------------------------------------------+
| array_except(make_array(List([NullArray(0)])),make_array(List([NullArray(0)]))) |
+---------------------------------------------------------------------------------+
| []                                                                              |
+---------------------------------------------------------------------------------+
1 row in set. Query took 0.002 seconds.

but if we just use [], it will be converted to

❯ select [];
+----------------------+
| List([NullArray(0)]) |
+----------------------+
+----------------------+
0 rows in set. Query took 0.001 seconds.

In fact, we can not distinguish [] and null here

@Veeupup
Copy link
Contributor

Veeupup commented Nov 19, 2023

I'm trying to fix this

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Nov 19, 2023

If we can reuse make_array workflow for sql_array_literal, it would be great!
https://github.com/apache/arrow-datafusion/blob/325a3fbe7623d3df0ab64867545c4d93a0c96015/datafusion/sql/src/expr/value.rs#L128-L165

@Veeupup
Copy link
Contributor

Veeupup commented Nov 19, 2023

how about behaviors like below:

for array_union:

array_union([], []) = []
array_union([], null) = []
array_union(null, []) = []
array_union(null, null) = null

for array_except:

array_except([], []) = []
array_except([], null) = []
array_except(null, []) = null
array_except(null, null) = null

for array_intersect:

array_intersect([], []) = []
array_intersect([], null) = []
array_intersect(null, []) = []
array_intersect(null, null) = null

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants