Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

different array behavior with PG's multidimensional array #3811

Open
xxchan opened this issue Jul 12, 2022 · 5 comments
Open

different array behavior with PG's multidimensional array #3811

xxchan opened this issue Jul 12, 2022 · 5 comments
Labels
type/bug Something isn't working

Comments

@xxchan
Copy link
Member

xxchan commented Jul 12, 2022

Describe the bug

In risingwave

SELECT * FROM unnest(Array[Array[1,3,4,5],Array[2,3]]);
 unnest
--------
      1
      3
      4
      5
      2
      3

CREATE TABLE t(a int[][]);
insert into t values (Array[Array[1,2], Array[1]]);
select * from t;
      a
-------------
 {{1,2},{1}}

In PG, this isn't allowed

SELECT * FROM unnest(Array[Array[1,3,4,5],Array[2,3]]);
ERROR:  multidimensional arrays must have array expressions with matching dimensions

CREATE TABLE t(a int[][]);
insert into t values (Array[Array[1,2], Array[1]]);
ERROR:  multidimensional arrays must have array expressions with matching dimensions

Expected behavior
Same as PG?

cc @neverchanje

@xxchan xxchan added the type/bug Something isn't working label Jul 12, 2022
@xiangjinwu
Copy link
Contributor

I guess it was supposed to tolerate json values like [[1,3,4,5],[2,3]]? But json also allows [1,'a'] and we have talked about not supporting it.

BTW PG is not strict on some other aspects either:

test=# create table arr (v1 int[]);
CREATE TABLE
test=# insert into arr values (array[1,2]), (array[array[9]]);
INSERT 0 2
test=# select * from arr;
  v1   
-------
 {1,2}
 {{9}}
(2 rows)

test=# select array_agg(v1) from arr;
ERROR:  cannot accumulate arrays of different dimensionality

We may not follow PG exactly for all its array design, but we do lack a clear definition/expectation of what we support.

@stdrc
Copy link
Member

stdrc commented Sep 5, 2022

Here are more cases:

# constructing
select array[array[1,2], array[3]];

# concatenating
select array[array[1,2]] || array[array[3]];

# batch aggregating
select array_agg(a) from (values (array[array[1,2]]), (array[array[3]])) t(a);

# streaming aggregating
create table t (a int[][]);
insert into t values (array[array[1,2]]), (array[array[3]]);
create materialized view mv as select array_agg(a) as agg from t;
select * from mv;

All these fail on PG but succeed on RW.

@stdrc stdrc changed the title multidimensional arrays can have mismatching dimentions multidimensional arrays can have mismatching dimensions Sep 5, 2022
@xxchan xxchan changed the title multidimensional arrays can have mismatching dimensions different array behavior with PG's multidimensional array Mar 21, 2023
@xxchan
Copy link
Member Author

xxchan commented Mar 22, 2023

Notes about PG's array: https://www.postgresql.org/docs/current/arrays.html

However, the current implementation ignores any supplied array size limits, i.e., the behavior is the same as for arrays of unspecified length.

The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.

So basically each array value is an ND array. But there are no dimension restrictions among the array values in the same column.

@xiangjinwu
Copy link
Contributor

They are essentially 2 different types: nested list vs tensor, and can be different in several ways:

  • whether array[array[1, 2], array[1]] is an invalid value
  • whether array[array[1]]'s type is int[] (rather than int[][])
  • whether unnest(array[array[1]]) returns 1 (rather than array[1])
  • whether the starting index of a dimension can be 7 or -2 (rather than always 1 or 0)

Based on usages we have seen so far, our expected behavior would be:

  • Allow jagged value.
    • Pros: jagged array is also found in Avro, Arrow, Flink, DuckDB
    • Cons: the following PostgreSQL functions would run slower when inspecting inner dimensions: array_dims, array_length, array_upper, generate_subscripts. Or we can reject multi-dimensional input first.
  • Use type int[][]
    • Pros: intuitive, and allows array_append(array[array[1]], array[2])
    • Cons: disallows PostgreSQL expression '{{1,2,3},{4,5,6},{7,8,9}}'::int[], and there would be infinite types in the type system (int[], int[][], int[][][], ...)
  • Keep unnest(array[array[1]]) to return 1, and cardinality(array[array[1, 2]]) to return 2
    • Pros: PostgreSQL compatible. A working query ported from PostgreSQL won't return a different result silently.
    • Cons: Needs recursion. unnest(array[array[1]]) = array[1] in DuckDB, and cardinality(array[array[1, 2]]) = 1 in Flink.
  • Do not support customizing dimension lower bound
    • Pros: simple
    • Cons: disallows PostgreSQL expression '[2:4]={1,2,3}'::int[]

@xxchan
Copy link
Member Author

xxchan commented Jun 7, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants