-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Column support for array_to_string #6940
Conversation
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
I think this is the last function that needs column support. |
query error DataFusion error: SQL error: ParserError\("Expected an SQL statement, found: caused"\) | ||
caused by | ||
Error during planning: Cannot automatically convert List\(Field \{ name: "item", data_type: List\(Field \{ name: "item", data_type: List\(Field \{ name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\) to List\(Field \{ name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\) | ||
query ??? | ||
select array_fill(11, make_array(1, 2, 3)), array_fill(3, make_array(2, 3)), array_fill(2, make_array(2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array_fill
will not be used in future versions (due to the inability to properly set the data type)
It will be replaced with array_repeat
.
query error DataFusion error: SQL error: ParserError\("Expected an SQL statement, found: caused"\) | ||
caused by | ||
Error during planning: Cannot automatically convert List\(Field \{ name: "item", data_type: List\(Field \{ name: "item", data_type: List\(Field \{ name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\) to List\(Field \{ name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\) | ||
query ?? | ||
select trim_array([[1, 2], [3, 4], [5, 6]], 2), trim_array(array_fill(4, [3, 4, 2]), 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trim_array
too (See: #6855 (reply in thread)) (it will be replaced with array_slice
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 and @izveigor
I think there are some improvements / issues with the code in this PR. However, I think it is a step forward for datafusion and I think we can merge it and keep iterating on master.
I defer to your preference. Let me know what you want to do
let s = s.strip_suffix(delimeter).unwrap().to_string(); | ||
res.push(Some(s)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite sure if it is sure that delimiter
is always present. If it is not, this code will panic.
Perhaps it could be something like the following to aavoid the unwrap?
let s = s.strip_suffix(delimeter).unwrap().to_string(); | |
res.push(Some(s)); | |
let s = s.strip_suffix(delimeter).unwrap_or(s).to_string(); | |
res.push(Some(s)); |
|
||
match arr.data_type() { | ||
DataType::List(_) | DataType::LargeList(_) | DataType::FixedSizeList(_, _) => { | ||
let list_array = arr.as_list::<i32>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is going to work (will panic
) for LargeList
as it should be i64
-- we would probably have to make a generic version of this function.
However, for now I think it looks reasonable and a step forward to me
I think those are good suggestions but as you said we could improve them
iteratively. I think we can merge this PR for now! 🙂
…On Sat, Jul 15, 2023, 4:58 AM Andrew Lamb ***@***.***> wrote:
***@***.**** approved this pull request.
Thank you @jayzhan211 <https://github.com/jayzhan211> and @izveigor
<https://github.com/izveigor>
I think there are some improvements / issues with the code in this PR.
However, I think it is a step forward for datafusion and I think we can
merge it and keep iterating on master.
I defer to your preference. Let me know what you want to do
------------------------------
In datafusion/physical-expr/src/array_expressions.rs
<#6940 (comment)>
:
> + let s = s.strip_suffix(delimeter).unwrap().to_string();
+ res.push(Some(s));
I am not quite sure if it is sure that delimiter is always present. If it
is not, this code will panic.
Perhaps it could be something like the following to aavoid the unwrap?
⬇️ Suggested change
- let s = s.strip_suffix(delimeter).unwrap().to_string();
- res.push(Some(s));
+ let s = s.strip_suffix(delimeter).unwrap_or(s).to_string();
+ res.push(Some(s));
------------------------------
In datafusion/physical-expr/src/array_expressions.rs
<#6940 (comment)>
:
> @@ -1195,21 +1195,56 @@ pub fn array_to_string(args: &[ArrayRef]) -> Result<ArrayRef> {
}
let mut arg = String::from("");
- let mut res = compute_array_to_string(
- &mut arg,
- arr.clone(),
- delimeter.clone(),
- null_string,
- with_null_string,
- )?
- .clone();
- match res.as_str() {
- "" => Ok(Arc::new(StringArray::from(vec![Some(res)]))),
+ let mut res: Vec<Option<String>> = Vec::new();
+
+ match arr.data_type() {
+ DataType::List(_) | DataType::LargeList(_) | DataType::FixedSizeList(_, _) => {
+ let list_array = arr.as_list::<i32>();
I don't think this is going to work (will panic) for LargeList as it
should be i64 -- we would probably have to make a generic version of this
function.
However, for now I think it looks reasonable and a step forward to me
—
Reply to this email directly, view it on GitHub
<#6940 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADZCLRZDJ4OSGW3BQA6Y27TXQGXITANCNFSM6AAAAAA2IIUBZI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any serious mistakes here and I think we can merge.
However, the final word in this PR can only be given by @alamb.
Thanks @izveigor and @jayzhan211 |
Which issue does this PR close?
Ref #6804
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
Note
Revert array.slt changed by #6595