Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(expr): support array_cat function and || operator for arrays #5060

Merged
merged 25 commits into from
Sep 5, 2022

Conversation

stdrc
Copy link
Member

@stdrc stdrc commented Sep 1, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Add array_cat, array_append and array_prepend functions
  • Convert || with at least one array operand into array_cat/array_append/array_prepend

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

Types of user-facing changes

  • SQL commands, functions, and operators

Release note

  • Support array_cat, array_append and array_prepend functions
  • Support array concatenation using ||

Refer to a related PR or issue link (optional)

#4998

stdrc added 6 commits August 31, 2022 16:34
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
@codecov
Copy link

codecov bot commented Sep 1, 2022

Codecov Report

Merging #5060 (6571eea) into main (e1cacf0) will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #5060      +/-   ##
==========================================
+ Coverage   74.24%   74.26%   +0.01%     
==========================================
  Files         882      884       +2     
  Lines      134937   135568     +631     
==========================================
+ Hits       100182   100676     +494     
- Misses      34755    34892     +137     
Flag Coverage Δ
rust 74.26% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/batch/src/error.rs 0.00% <ø> (ø)
src/batch/src/executor/generic_exchange.rs 41.52% <ø> (+0.14%) ⬆️
src/batch/src/executor/join/lookup_join.rs 57.28% <ø> (+1.12%) ⬆️
src/batch/src/executor/monitor/stats.rs 66.01% <ø> (-8.99%) ⬇️
src/batch/src/executor/row_seq_scan.rs 17.67% <ø> (+0.66%) ⬆️
src/compute/src/server.rs 0.00% <ø> (ø)
src/expr/src/expr/expr_array_concat.rs 71.55% <ø> (ø)
src/expr/src/expr/mod.rs 48.93% <ø> (-1.07%) ⬇️
src/frontend/src/binder/expr/binary_op.rs 66.07% <ø> (-5.12%) ⬇️
src/frontend/src/binder/expr/function.rs 84.61% <ø> (+0.14%) ⬆️
... and 53 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@nanderstabel nanderstabel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only issue is that this won't work yet for arrays with single quoted syntax e.g.

select '{1}' || array[array[2]];
select '{{1}}' || array[array[2]];
select '{{{1}}}' || array[array[2]];

But solving that is not a problem for now I guess. #5063 solved casting single quoted syntax arrays.

Rest LGTM!

Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
.into()
};

let return_type = match (&left_type, &right_type) {
Copy link
Contributor

@xiangjinwu xiangjinwu Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @xxchan has pointed out, || = array_append + array_prepend + array_cat. And my suggestion is:

  • In FunctionCall::new (already refactored to infer_type_for_special), we just require the two operands to have the same array type. This is the array_cat function from PostgreSQL.
  • In bind_concat_op, we decide whether the || operator is ConcatOp for string, or array_cat for two arrays of the same type, or append/prepend possibly backed by rewriting to array_cat with wrap_single_elem. Another example of such rewrite is from concat to concat_ws. (Do not confuse these 2 with ConcatOp, they are 3 different things with proc oids 3058, 3059, 1258 in pg_proc. dot)

Extra thoughts:

  • array_cat actually support arrays of compatible types beyond same types, for example array_cat(array[1], array[1.5]). This can be supported via align_types (backed by cast_ok_array). But feel free to defer this support if you find bugs during integration.
  • The match here have 2 arms for append/prepend each. It may be cleaner to wrap first and check whether they are the same (or compatible). It is just to assist resolving || into one of array_cat, array_append or array_prepend with a small overhead in binder. We are still able to avoid the overhead of wrapping in backend by implementing array_append and array_prepend directly later if desired.

To summarize, the pseudo-code for bind_concat_op could be:

if align_types(left, right).is_ok() {
  array_cat(left, right)
} else if align_types(wrapped(left), right).is_ok {
  array_prepend(left, right) or array_cat(wrapped(left), right)
} else if align_types(left, wrapped(right)).is_ok {
  array_append(left, right) or array_cat(left, wrapped(right))
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: array_cat can also work between array and its element, but only when the element itself is an array and the other operand has one more dimension. Not sure why it was designed like this yet ...
https://github.com/postgres/postgres/blob/86a4dc1e6f29d1992a2afa3fac1a0b0a6e84568c/src/backend/utils/adt/array_userfuncs.c#L212-L214

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: array_cat can also work between array and its element, but only when the element itself is an array and the other operand has one more dimension. Not sure why it was designed like this yet ...

What's more, array_prepend/append only supports 1d array..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: array_cat can also work between array and its element, but only when the element itself is an array and the other operand has one more dimension. Not sure why it was designed like this yet ...

What's more, array_prepend/append only supports 1d array..

These are checked in the latest version of this PR🤪

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the impl of PG array_cat, it has branches to compute the result dimensions, but uses the same code to build the result array (just some memcpy). I guess this is feasible due to the memory layout of array. But I'm not sure whether array_append an array as an element would be less efficient.

Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
@stdrc
Copy link
Member Author

stdrc commented Sep 2, 2022

Hi, everyone! I pushed some new commits to add array_prepend and array_append. Now our behavior is very similar to PG, but still has some differences that can't be fixed. One example is:

select array_cat(array[123], null::int[][]);

PG returns array[123] for the above SQL, however, we save element type in List type and have to infer the return type in frontend, hence can't return array[123] once we decide it's array_cat(int[], int[][]).

Other changes are made with the consideration to everyone's advice. Please feel free to add any further review comments.😃

Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
{{66}}

query T
select array_cat(null::int[][], array[233]);
Copy link
Member

@xxchan xxchan Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select array_cat(array[123], null::int[][]);

PG returns array[123] for the above SQL, however, we save element type in List type and have to infer the return type in frontend, hence can't return array[123] once we decide it's array_cat(int[], int[][]).

Would returning array[array[123]] a better choice?

IMO PG's behavior is quite strange, especially in this example:

create table t(x int[], y int[][]);
insert into t values (Array[1], Array[Array[2]]), (Array[1], NULL);
select x, y, array_cat(x,y) from t;
  x  |   y   | array_cat
-----+-------+-----------
 {1} | {{2}} | {{1},{2}}
 {1} |       | {1}
(2 rows)

(Off topic) Other issues about PG array #3811

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(mentioned by @richardchien off-line): PG do not allow multi dimentional array with mismatching dimensions, this is related to #3811.

For array_cat:

select array_cat(Array[1], Array[Array[2,3]]);
ERROR:  cannot concatenate incompatible arrays
DETAIL:  Arrays with differing dimensions are not compatible for concatenation.

select array_cat(Array[Array[1]], Array[Array[2,3]]);
ERROR:  cannot concatenate incompatible arrays
DETAIL:  Arrays with differing element dimensions are not compatible for concatenation.

we do lack a clear definition/expectation of what we support.

😇

BTW, maybe add such cases to the test cases to show our current behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things can be more funny: 😄

select x, y, array_cat(x,y) from t;
    x    |   y   |   array_cat
---------+-------+---------------
 {1}     | {{2}} | {{1},{2}}
 {1}     |       | {1}
 {{1}}   | {{1}} | {{1},{1}}
 {{{1}}} | {{1}} | {{{1}},{{1}}}
(4 rows)

Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
@stdrc
Copy link
Member Author

stdrc commented Sep 5, 2022

A quick summary for things that should implement later:

  • Check the dimension when concatenate multidimensional arrays. link to comment
  • Allow concatenating arrays with compatible types, e.g. array[1.1] || array[1]. link to comment
  • Decide whether we should have exact the same strange behavior as PG. link to comment

Signed-off-by: Richard Chien <stdrc@outlook.com>
Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest lgtm.

Regarding the strange behaviors, it is mostly due to PG only having int and int[] types in frontend. As a result,

  • null::int[][] is treated as null::int[], so array_cat(null::int[][], array[1]) becomes array_cat(null::int[], array[1]), which evaluates to {1}. It even allows array_cat(null::int[][][][][], array[1]) or array_cat(array[]::int[][][][][], array[1]), but reports error when the higher dimension array is not empty.
  • array_cat accepts (int[], int[]), array_append accepts (int[], int), and array_prepend accepts (int, int[]). This is why it had to separate append_value from append_array but combines concat_array+append_array+prepend_array together. The functionality is just grouped by its immature signatures.

In my opinion the semantic implemented in this PR is good enough now. It is cleanly organized and mimics PG in most cases. Once we are aligned on how we want our array type to follow or diverge from PG, it is easy to adjust from current status.

src/frontend/src/binder/expr/binary_op.rs Outdated Show resolved Hide resolved
src/frontend/src/binder/expr/function.rs Show resolved Hide resolved
Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!!!

}
let func_type = match (left.return_type(), right.return_type()) {
// string concatenation
(DataType::Varchar, _) | (_, DataType::Varchar) => ExprType::ConcatOp,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we use separate expression types for arrays, I think it's okay to name this more specific like StringConcatOp. 😄

src/expr/src/expr/expr_array_concat.rs Outdated Show resolved Hide resolved
left.values_ref()
.into_iter()
.chain(std::iter::once(right))
.map(|x| x.map(ScalarRefImpl::into_scalar_impl))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we support storing an iterator of DatumRef in the ListRef, then we can use builder.append_datum_ref to acheive zero-copy. 🤣 But it may require some hacks.

Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
@mergify mergify bot merged commit 52b259a into main Sep 5, 2022
@mergify mergify bot deleted the rc/array-cat branch September 5, 2022 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants