Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ArrayAgg schema mismatch issue #8055

Merged
merged 10 commits into from
Nov 9, 2023

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Nov 5, 2023

Which issue does this PR close?

Closes #8032.

Rationale for this change

The main issue for #8032 is as what the errors said we have different schema for result.schema which is logical-expr schema and results.collect()[0].schema which is physical-expr schema.

We should have the same is_nullable for both logical-expr and physical-expr

We can see that the logical-expr of List is set with expr.nullable(schema)
https://github.com/apache/arrow-datafusion/blob/3469c4e09a3d32381949dd0c0f626f406c00c6ad/datafusion/expr/src/expr_schema.rs#L289-L305
and logical-expr of List element is set with the return type of Aggregatefunciton::ArrayAgg
https://github.com/apache/arrow-datafusion/blob/3469c4e09a3d32381949dd0c0f626f406c00c6ad/datafusion/expr/src/aggregate_function.rs#L286-L290

What changes are included in this PR?

Fix schema is_nullable

Are these changes tested?

I did not test end to end like #8032 , just make sure the schema is the same.

Are there any user-facing changes?

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate labels Nov 5, 2023
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 marked this pull request as draft November 5, 2023 03:56
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 marked this pull request as ready for review November 5, 2023 07:56
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- it would be nice if there was some way to avoid having to plumb the is_expr_nullable flag through, but I don't think there is

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 for the contribution and working on this area of the code

datafusion/physical-expr/src/aggregate/array_agg.rs Outdated Show resolved Hide resolved
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @jayzhan211

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

array_agg with pyarrow errors with ArrowInvalid: Schema at index 0 was different
2 participants