Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column support for array_to_string #6940

Merged
merged 4 commits into from
Jul 16, 2023

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Ref #6804

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Note

Revert array.slt changed by #6595

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jul 13, 2023
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added the physical-expr Physical Expressions label Jul 13, 2023
@jayzhan211 jayzhan211 marked this pull request as ready for review July 13, 2023 11:33
@izveigor
Copy link
Contributor

izveigor commented Jul 14, 2023

I think this is the last function that needs column support.
The remaining functions will already be carried out in accordance with the new concept: #6855
Thank you for taking on this issue, @jayzhan211!
P.S. Tomorrow I will create new tickets according to the new concept.
cc @alamb

query error DataFusion error: SQL error: ParserError\("Expected an SQL statement, found: caused"\)
caused by
Error during planning: Cannot automatically convert List\(Field \{ name: "item", data_type: List\(Field \{ name: "item", data_type: List\(Field \{ name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\) to List\(Field \{ name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\)
query ???
select array_fill(11, make_array(1, 2, 3)), array_fill(3, make_array(2, 3)), array_fill(2, make_array(2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_fill will not be used in future versions (due to the inability to properly set the data type)
It will be replaced with array_repeat.

query error DataFusion error: SQL error: ParserError\("Expected an SQL statement, found: caused"\)
caused by
Error during planning: Cannot automatically convert List\(Field \{ name: "item", data_type: List\(Field \{ name: "item", data_type: List\(Field \{ name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\) to List\(Field \{ name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: \{\} \}\)
query ??
select trim_array([[1, 2], [3, 4], [5, 6]], 2), trim_array(array_fill(4, [3, 4, 2]), 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trim_array too (See: #6855 (reply in thread)) (it will be replaced with array_slice).

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 and @izveigor

I think there are some improvements / issues with the code in this PR. However, I think it is a step forward for datafusion and I think we can merge it and keep iterating on master.

I defer to your preference. Let me know what you want to do

Comment on lines +1239 to +1240
let s = s.strip_suffix(delimeter).unwrap().to_string();
res.push(Some(s));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure if it is sure that delimiter is always present. If it is not, this code will panic.

Perhaps it could be something like the following to aavoid the unwrap?

Suggested change
let s = s.strip_suffix(delimeter).unwrap().to_string();
res.push(Some(s));
let s = s.strip_suffix(delimeter).unwrap_or(s).to_string();
res.push(Some(s));


match arr.data_type() {
DataType::List(_) | DataType::LargeList(_) | DataType::FixedSizeList(_, _) => {
let list_array = arr.as_list::<i32>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is going to work (will panic) for LargeList as it should be i64 -- we would probably have to make a generic version of this function.

However, for now I think it looks reasonable and a step forward to me

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Jul 14, 2023 via email

Copy link
Contributor

@izveigor izveigor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any serious mistakes here and I think we can merge.
However, the final word in this PR can only be given by @alamb.

@alamb
Copy link
Contributor

alamb commented Jul 16, 2023

Thanks @izveigor and @jayzhan211

@alamb alamb merged commit 49583bd into apache:main Jul 16, 2023
@izveigor izveigor mentioned this pull request Jul 16, 2023
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants