Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support SQL array replacement and removement functions #7057

Merged
merged 13 commits into from
Jul 27, 2023
Merged

feat: support SQL array replacement and removement functions #7057

merged 13 commits into from
Jul 27, 2023

Conversation

izveigor
Copy link
Contributor

@izveigor izveigor commented Jul 22, 2023

Which issue does this PR close?

Closes #7072
Closes #7073
Closes #7074
Closes #7075
Closes #7092
Closes #7093

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

@izveigor izveigor marked this pull request as draft July 22, 2023 19:29
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jul 22, 2023
ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
ColumnarValue::Array(arr) => arr.clone(),
};
pub fn array_replace(args: &[ArrayRef]) -> Result<ArrayRef> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why array_replace and array_remove are needed:

I think it will be easier for users to modify arrays using the full table of values (rather than looking up indexes):

Quantity Search Fill Replacement Remove
Uniqueness array_position(array, [i]) array_resize(x, 1) array_replace(array, from, to) array_remove(array, element)
Plurality array_positions(array)[:i] array_resize(x, n) array_replace(array, from, to, max) array_remove(array, element, max)
Wholeness array_positions(array) array_resize(x, array_length(array)) array_replaces(array, from, to) array_removes(array, element)

Example:

Example with stones

Let's say in some country there is a vote for the approval of some reform. Every citizen can vote for and against. The color of the stones will be used as the subject of the vote, namely black (against) and white (for). From the point of view of the array, this reconstruction will look like this, we have an array consisting of 0 (black color) and 1 (white color). However, a situation may arise the elections. For example, 2 white balls and 3 black balls were dishonestly counted in the vote. Having noticed the falsification, the state decides to remove or replace these stones with NULL.

Now let's try to solve these problems using different databases:

Initial list: [1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0]

My decision

# Remove
SELECT array_remove(array_remove([1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0], 1, 3), 0, 2);
----
[1, 1, 1, 0, 1, 1, 0]

# Replace
SELECT array_replace(array_replace([1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0], 1, NULL, 3), 0, NULL, 2);
----
[NULL, NULL, NULL, 1, 1, 1, NULL, NULL, 0, 1, 1, 0]

Other databases

There are no solutions using Turing incomplete declarative language (classic SQL) for databases like DuckDB, PostgreSQL.

StackOverFlow:

What do you think of these features, @alamb and @jayzhan211? Should they be implemented or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think array_replace(array, from, to, max) should be array_replaces since not replace only one. array_remove(array, element, max) should be array_remove(array, elements, max) since we might remove elements, the same as array_removes(array, elements)

Copy link
Contributor

@jayzhan211 jayzhan211 Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace and remove are helpful functions that are nice to have.

@izveigor
Copy link
Contributor Author

Ready for the review, @alamb and @jayzhan211!

@izveigor izveigor marked this pull request as ready for review July 24, 2023 18:30
@alamb
Copy link
Contributor

alamb commented Jul 24, 2023

I plan to review this carefully tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @izveigor -- this is looking really nice 👌

I have some suggestions for the API:

  1. use _all instead of s to replace/remove all matches
  2. make a _n version rather than an optional parameter for replace/remove N

All in all, very nice work here. Thank you

| array_positions(array, element) | Searches for an element in the array, returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]` |
| array_prepend(array, element) | Prepends an element to the beginning of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]` |
| array_remove(array, element) | Removes one element from the array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1, 4]` |
| array_removes(array, element) | Removes all elements from the array equal to the given value. `array_removes([1, 2, 2, 3, 1, 4], 2) -> [1, 3, 1, 4]` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given there is a function named array_has_all that matches only if all arguments are present, it seems to me that more consistent names would be:

  • array_removes --> array_remove_all
  • array_replaces --> array_replace_all

| array_position(array, element) | Searches for an element in the array, returns first occurrence. `array_position([1, 2, 2, 3, 4], 2) -> 2` |
| array_positions(array, element) | Searches for an element in the array, returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]` |
| array_prepend(array, element) | Prepends an element to the beginning of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]` |
| array_remove(array, element) | Removes one element from the array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1, 4]` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could explicitly say "Removes the first element from the array equal to the given value":

Suggested change
| array_remove(array, element) | Removes one element from the array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1, 4]` |
| array_remove(array, element) | Removes the first element from the array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1, 4]` |

| array_prepend(array, element) | Prepends an element to the beginning of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]` |
| array_remove(array, element) | Removes one element from the array equal to the given value. `array_remove([1, 2, 2, 3, 1, 4], 2) -> [1, 2, 3, 1, 4]` |
| array_removes(array, element) | Removes all elements from the array equal to the given value. `array_removes([1, 2, 2, 3, 1, 4], 2) -> [1, 3, 1, 4]` |
| array_replace(array, from, to) | Replaces one occurrence of the specified element with another specified element. `array_replace([1, 2, 2, 3, 1, 4], 2, 5) -> [1, 5, 2, 3, 1, 4]` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| array_replace(array, from, to) | Replaces one occurrence of the specified element with another specified element. `array_replace([1, 2, 2, 3, 1, 4], 2, 5) -> [1, 5, 2, 3, 1, 4]` |
| array_replace(array, from, to) | Replaces the first occurrence of the specified element with another specified element. `array_replace([1, 2, 2, 3, 1, 4], 2, 5) -> [1, 5, 2, 3, 1, 4]` |

@@ -1777,24 +1779,72 @@ _Alias of [array_prepend](#array_prepend)._

### `array_remove`

Removes all elements equal to the given value from the array.
Removes one element from the array equal to the given value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Removes one element from the array equal to the given value.
Removes the first element from the array equal to the given value.


# array_replace scalar function #2 (with optional argument)
query ???
select array_replace(make_array(1, 2, 3, 4), 2, 3, 2), array_replace(make_array(1, 4, 4, 5, 4, 6, 7), 4, 0, 2), array_replace(make_array(1, 2, 3), 4, 0, 3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of these features, @alamb and @jayzhan211? Should they be implemented or not?

I found the fact that array_replace has 4 arguments very confusing initially. All "replace" APIs I can think of (e.g. Rust's replace) take only three arguments: the value, the pattern to search for and the replacement value.

If we want to implement "replace the first N" values, I think we should add a new function like array_replace_n (modeled after replacen)

It would be great to share the implementation as you have done in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. Thanks for the great idea, @alamb!

as_list_array(&array).expect("failed to initialize function array_remove");

assert_eq!(result.len(), 1);
let data = vec![
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should really be testing with NULLs as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will deal with this issue later (in a separate PR)

@izveigor izveigor changed the title feat: support SQL functions: array_replace, array_replaces, array_remove and array_removes feat: support SQL array replacement and removement functions Jul 25, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @izveigor

@alamb alamb merged commit bd1d82f into apache:main Jul 27, 2023
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
3 participants