Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix panic in array_agg(distinct) query #10526

Merged
merged 1 commit into from
May 15, 2024

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Closes #10486 .

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels May 15, 2024
@jayzhan211 jayzhan211 changed the title Support merge batch for distinct array aggregate function Fix panic in array_agg(distinct) query May 15, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review May 15, 2024 14:05
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the quick work @jayzhan211 -- very nice

states[0]
.as_list::<i32>()
.iter()
.flatten()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the flatten do? Discard Option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

3 1

# It has only AggregateExec with FinalPartitioned mode, so `merge_batch` is used
# If the plan is changed, whether the `merge_batch` is used should be verified to ensure the test coverage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 for the explanatory comments about why this explain is here

@@ -198,6 +198,73 @@ statement error This feature is not implemented: LIMIT not supported in ARRAY_AG
SELECT array_agg(c13 LIMIT 1) FROM aggregate_test_100


# Test distinct aggregate function with merge batch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the test without the changes in this PR and it panic'd as expected:

cargo test --test sqllogictests
...
thread 'tokio-runtime-worker' panicked at datafusion/physical-expr/src/aggregate/array_agg_distinct.rs:158:9:
assertion `left == right` failed: state array should only include 1 row!
  left: 5
 right: 1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@@ -153,12 +153,11 @@ impl Accumulator for DistinctArrayAggAccumulator {
return Ok(());
}

let array = &states[0];

assert_eq!(array.len(), 1, "state array should only include 1 row!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay that we just delete the assertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, because it is no longer true, this PR is dealing with the array that has len more than one

@jayzhan211
Copy link
Contributor Author

Thanks @alamb !

@jayzhan211 jayzhan211 merged commit 626c6bc into apache:main May 15, 2024
24 checks passed
@jayzhan211 jayzhan211 deleted the distinct-arrayagg branch May 18, 2024 07:19
glebpom pushed a commit to glebpom/datafusion that referenced this pull request May 22, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
DDtKey pushed a commit to DDtKey/arrow-datafusion that referenced this pull request May 22, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
(cherry picked from commit 626c6bc)
glebpom pushed a commit to glebpom/datafusion that referenced this pull request May 22, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
DDtKey pushed a commit to DDtKey/arrow-datafusion that referenced this pull request May 22, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
(cherry picked from commit 626c6bc)
maxburke pushed a commit to urbanlogiq/arrow-datafusion that referenced this pull request Jun 28, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Regression] Query using ARRAY_AGG(DISTINCT) causes panic
3 participants