-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(expr): support array_cat
function and ||
operator for arrays
#5060
Conversation
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Codecov Report
@@ Coverage Diff @@
## main #5060 +/- ##
==========================================
+ Coverage 74.24% 74.26% +0.01%
==========================================
Files 882 884 +2
Lines 134937 135568 +631
==========================================
+ Hits 100182 100676 +494
- Misses 34755 34892 +137
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only issue is that this won't work yet for arrays with single quoted syntax e.g.
select '{1}' || array[array[2]];
select '{{1}}' || array[array[2]];
select '{{{1}}}' || array[array[2]];
But solving that is not a problem for now I guess. #5063 solved casting single quoted syntax arrays.
Rest LGTM!
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
.into() | ||
}; | ||
|
||
let return_type = match (&left_type, &right_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @xxchan has pointed out, ||
= array_append
+ array_prepend
+ array_cat
. And my suggestion is:
- In
FunctionCall::new
(already refactored toinfer_type_for_special
), we just require the two operands to have the same array type. This is thearray_cat
function from PostgreSQL. - In
bind_concat_op
, we decide whether the||
operator isConcatOp
for string, orarray_cat
for two arrays of the same type, orappend/prepend
possibly backed by rewriting toarray_cat
withwrap_single_elem
. Another example of such rewrite is fromconcat
toconcat_ws
. (Do not confuse these 2 withConcatOp
, they are 3 different things with proc oids 3058, 3059, 1258 inpg_proc
. dot)
Extra thoughts:
array_cat
actually support arrays of compatible types beyond same types, for examplearray_cat(array[1], array[1.5])
. This can be supported viaalign_types
(backed bycast_ok_array
). But feel free to defer this support if you find bugs during integration.- The match here have 2 arms for
append/prepend
each. It may be cleaner to wrap first and check whether they are the same (or compatible). It is just to assist resolving||
into one ofarray_cat
,array_append
orarray_prepend
with a small overhead in binder. We are still able to avoid the overhead of wrapping in backend by implementingarray_append
andarray_prepend
directly later if desired.
To summarize, the pseudo-code for bind_concat_op
could be:
if align_types(left, right).is_ok() {
array_cat(left, right)
} else if align_types(wrapped(left), right).is_ok {
array_prepend(left, right) or array_cat(wrapped(left), right)
} else if align_types(left, wrapped(right)).is_ok {
array_append(left, right) or array_cat(left, wrapped(right))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: array_cat
can also work between array and its element, but only when the element itself is an array and the other operand has one more dimension. Not sure why it was designed like this yet ...
https://github.com/postgres/postgres/blob/86a4dc1e6f29d1992a2afa3fac1a0b0a6e84568c/src/backend/utils/adt/array_userfuncs.c#L212-L214
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update:
array_cat
can also work between array and its element, but only when the element itself is an array and the other operand has one more dimension. Not sure why it was designed like this yet ...
What's more, array_prepend/append
only supports 1d array..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update:
array_cat
can also work between array and its element, but only when the element itself is an array and the other operand has one more dimension. Not sure why it was designed like this yet ...What's more,
array_prepend/append
only supports 1d array..
These are checked in the latest version of this PR🤪
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the impl of PG array_cat
, it has branches to compute the result dimensions, but uses the same code to build the result array (just some memcpy). I guess this is feasible due to the memory layout of array. But I'm not sure whether array_append
an array as an element would be less efficient.
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Hi, everyone! I pushed some new commits to add select array_cat(array[123], null::int[][]); PG returns Other changes are made with the consideration to everyone's advice. Please feel free to add any further review comments.😃 |
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
{{66}} | ||
|
||
query T | ||
select array_cat(null::int[][], array[233]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
select array_cat(array[123], null::int[][]);PG returns
array[123]
for the above SQL, however, we save element type inList
type and have to infer the return type in frontend, hence can't returnarray[123]
once we decide it'sarray_cat(int[], int[][])
.
Would returning array[array[123]]
a better choice?
IMO PG's behavior is quite strange, especially in this example:
create table t(x int[], y int[][]);
insert into t values (Array[1], Array[Array[2]]), (Array[1], NULL);
select x, y, array_cat(x,y) from t;
x | y | array_cat
-----+-------+-----------
{1} | {{2}} | {{1},{2}}
{1} | | {1}
(2 rows)
(Off topic) Other issues about PG array #3811
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(mentioned by @richardchien off-line): PG do not allow multi dimentional array with mismatching dimensions, this is related to #3811.
For array_cat
:
select array_cat(Array[1], Array[Array[2,3]]);
ERROR: cannot concatenate incompatible arrays
DETAIL: Arrays with differing dimensions are not compatible for concatenation.
select array_cat(Array[Array[1]], Array[Array[2,3]]);
ERROR: cannot concatenate incompatible arrays
DETAIL: Arrays with differing element dimensions are not compatible for concatenation.
we do lack a clear definition/expectation of what we support.
😇
BTW, maybe add such cases to the test cases to show our current behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things can be more funny: 😄
select x, y, array_cat(x,y) from t;
x | y | array_cat
---------+-------+---------------
{1} | {{2}} | {{1},{2}}
{1} | | {1}
{{1}} | {{1}} | {{1},{1}}
{{{1}}} | {{1}} | {{{1}},{{1}}}
(4 rows)
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
A quick summary for things that should implement later:
|
Signed-off-by: Richard Chien <stdrc@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest lgtm.
Regarding the strange behaviors, it is mostly due to PG only having int
and int[]
types in frontend. As a result,
null::int[][]
is treated asnull::int[]
, soarray_cat(null::int[][], array[1])
becomesarray_cat(null::int[], array[1])
, which evaluates to{1}
. It even allowsarray_cat(null::int[][][][][], array[1])
orarray_cat(array[]::int[][][][][], array[1])
, but reports error when the higher dimension array is not empty.array_cat
accepts(int[], int[])
,array_append
accepts(int[], int)
, andarray_prepend
accepts(int, int[])
. This is why it had to separateappend_value
fromappend_array
but combinesconcat_array
+append_array
+prepend_array
together. The functionality is just grouped by its immature signatures.
In my opinion the semantic implemented in this PR is good enough now. It is cleanly organized and mimics PG in most cases. Once we are aligned on how we want our array type to follow or diverge from PG, it is easy to adjust from current status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!!!
} | ||
let func_type = match (left.return_type(), right.return_type()) { | ||
// string concatenation | ||
(DataType::Varchar, _) | (_, DataType::Varchar) => ExprType::ConcatOp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we use separate expression types for arrays, I think it's okay to name this more specific like StringConcatOp
. 😄
left.values_ref() | ||
.into_iter() | ||
.chain(std::iter::once(right)) | ||
.map(|x| x.map(ScalarRefImpl::into_scalar_impl)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we support storing an iterator of DatumRef
in the ListRef
, then we can use builder.append_datum_ref
to acheive zero-copy. 🤣 But it may require some hacks.
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
array_cat
,array_append
andarray_prepend
functions||
with at least one array operand intoarray_cat
/array_append
/array_prepend
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Types of user-facing changes
Release note
array_cat
,array_append
andarray_prepend
functions||
Refer to a related PR or issue link (optional)
#4998