-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support SQL array replacement and removement functions #7057
Conversation
ColumnarValue::Scalar(scalar) => scalar.to_array().clone(), | ||
ColumnarValue::Array(arr) => arr.clone(), | ||
}; | ||
pub fn array_replace(args: &[ArrayRef]) -> Result<ArrayRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why array_replace
and array_remove
are needed:
I think it will be easier for users to modify arrays using the full table of values (rather than looking up indexes):
Quantity | Search | Fill | Replacement | Remove |
---|---|---|---|---|
Uniqueness | array_position(array, [i]) |
array_resize(x, 1) |
array_replace(array, from, to) |
array_remove(array, element) |
Plurality | array_positions(array)[:i] |
array_resize(x, n) |
array_replace(array, from, to, max) |
array_remove(array, element, max) |
Wholeness | array_positions(array) |
array_resize(x, array_length(array)) |
array_replaces(array, from, to) |
array_removes(array, element) |
Example:
Example with stones
Let's say in some country there is a vote for the approval of some reform. Every citizen can vote for and against. The color of the stones will be used as the subject of the vote, namely black (against) and white (for). From the point of view of the array, this reconstruction will look like this, we have an array consisting of 0 (black color) and 1 (white color). However, a situation may arise the elections. For example, 2 white balls and 3 black balls were dishonestly counted in the vote. Having noticed the falsification, the state decides to remove or replace these stones with NULL.
Now let's try to solve these problems using different databases:
Initial list: [1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0]
My decision
# Remove
SELECT array_remove(array_remove([1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0], 1, 3), 0, 2);
----
[1, 1, 1, 0, 1, 1, 0]
# Replace
SELECT array_replace(array_replace([1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0], 1, NULL, 3), 0, NULL, 2);
----
[NULL, NULL, NULL, 1, 1, 1, NULL, NULL, 0, 1, 1, 0]
Other databases
There are no solutions using Turing incomplete declarative language (classic SQL) for databases like DuckDB, PostgreSQL.
StackOverFlow:
- https://stackoverflow.com/questions/54992228/remove-one-non-unique-value-from-an-array
- https://stackoverflow.com/questions/59734581/how-to-remove-specific-value-from-array
- https://stackoverflow.com/questions/58535121/how-to-remove-first-occurence-of-an-element-in-array-in-clickhouse
- https://stackoverflow.com/questions/63287790/set-array-value-at-specific-index-in-clickhouse
What do you think of these features, @alamb and @jayzhan211? Should they be implemented or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think array_replace(array, from, to, max)
should be array_replaces
since not replace only one. array_remove(array, element, max)
should be array_remove(array, elements, max)
since we might remove elements, the same as array_removes(array, elements)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace
and remove
are helpful functions that are nice to have.
Ready for the review, @alamb and @jayzhan211! |
I plan to review this carefully tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @izveigor -- this is looking really nice 👌
I have some suggestions for the API:
- use
_all
instead ofs
to replace/remove all matches - make a
_n
version rather than an optional parameter for replace/remove N
All in all, very nice work here. Thank you
| array_positions(array, element) | Searches for an element in the array, returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]` | | ||
| array_prepend(array, element) | Prepends an element to the beginning of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]` | | ||
| array_remove(array, element) | Removes one element from the array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1, 4]` | | ||
| array_removes(array, element) | Removes all elements from the array equal to the given value. `array_removes([1, 2, 2, 3, 1, 4], 2) -> [1, 3, 1, 4]` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given there is a function named array_has_all
that matches only if all arguments are present, it seems to me that more consistent names would be:
array_removes
-->array_remove_all
array_replaces
-->array_replace_all
| array_position(array, element) | Searches for an element in the array, returns first occurrence. `array_position([1, 2, 2, 3, 4], 2) -> 2` | | ||
| array_positions(array, element) | Searches for an element in the array, returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]` | | ||
| array_prepend(array, element) | Prepends an element to the beginning of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]` | | ||
| array_remove(array, element) | Removes one element from the array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1, 4]` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could explicitly say "Removes the first element from the array equal to the given value":
| array_remove(array, element) | Removes one element from the array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1, 4]` | | |
| array_remove(array, element) | Removes the first element from the array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1, 4]` | |
| array_prepend(array, element) | Prepends an element to the beginning of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]` | | ||
| array_remove(array, element) | Removes one element from the array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1, 4]` | | ||
| array_removes(array, element) | Removes all elements from the array equal to the given value. `array_removes([1, 2, 2, 3, 1, 4], 2) -> [1, 3, 1, 4]` | | ||
| array_replace(array, from, to) | Replaces one occurrence of the specified element with another specified element. `array_replace([1, 2, 2, 3, 1, 4], 2, 5) -> [1, 5, 2, 3, 1, 4]` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| array_replace(array, from, to) | Replaces one occurrence of the specified element with another specified element. `array_replace([1, 2, 2, 3, 1, 4], 2, 5) -> [1, 5, 2, 3, 1, 4]` | | |
| array_replace(array, from, to) | Replaces the first occurrence of the specified element with another specified element. `array_replace([1, 2, 2, 3, 1, 4], 2, 5) -> [1, 5, 2, 3, 1, 4]` | |
@@ -1777,24 +1779,72 @@ _Alias of [array_prepend](#array_prepend)._ | |||
|
|||
### `array_remove` | |||
|
|||
Removes all elements equal to the given value from the array. | |||
Removes one element from the array equal to the given value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removes one element from the array equal to the given value. | |
Removes the first element from the array equal to the given value. |
|
||
# array_replace scalar function #2 (with optional argument) | ||
query ??? | ||
select array_replace(make_array(1, 2, 3, 4), 2, 3, 2), array_replace(make_array(1, 4, 4, 5, 4, 6, 7), 4, 0, 2), array_replace(make_array(1, 2, 3), 4, 0, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of these features, @alamb and @jayzhan211? Should they be implemented or not?
I found the fact that array_replace
has 4 arguments very confusing initially. All "replace" APIs I can think of (e.g. Rust's replace
) take only three arguments: the value, the pattern to search for and the replacement value.
If we want to implement "replace the first N" values, I think we should add a new function like array_replace_n
(modeled after replacen
)
It would be great to share the implementation as you have done in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. Thanks for the great idea, @alamb!
as_list_array(&array).expect("failed to initialize function array_remove"); | ||
|
||
assert_eq!(result.len(), 1); | ||
let data = vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should really be testing with NULLs as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will deal with this issue later (in a separate PR)
array_replace
, array_replaces
, array_remove
and array_removes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @izveigor
Which issue does this PR close?
Closes #7072
Closes #7073
Closes #7074
Closes #7075
Closes #7092
Closes #7093
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
Yes