[Regression] Query using ARRAY_AGG(DISTINCT) causes panic #10486

Closed
Tracked by #10517
bellwether-softworks opened this issue May 13, 2024 · 5 comments · Fixed by #10526
Labels: bug, regression

Comments

@bellwether-softworks

Describe the bug

Beginning in v37.0.0, a previously-working query is found to result in a panic:

panicked at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-physical-expr-37.1.0/src/aggregate/array_agg_distinct.rs:158:99:
assertion `left == right` failed: state array should only include 1 row!
  left: 4
 right: 1

To Reproduce

The following query is known to trigger the panic, alongside the accompanying parquet payloads contained in the attached .zip file.

SELECT
        asm.floor,
        CONCAT(bi.name, '_', bi.asm_range_lower, ' - ', bi.asm_range_upper) AS name_with_range,
        ARRAY_AGG(DISTINCT s.code) AS codes,
        COUNT(DISTINCT asm.id) AS assembly_count,
        COUNT(DISTINCT s.id) AS item_qty
    FROM 'batch_items.parquet' bi
        INNER JOIN 'target_items.parquet' s
            ON s.id = bi.target_id
        INNER JOIN 'assemblies.parquet' asm
            ON s.assembly_id = asm.id
    GROUP BY
        asm.floor_ordinal,
        asm.floor,
        bi.batch_name,
        bi.name,
        bi.asm_range_lower,
        bi.asm_range_upper
    ORDER BY
        asm.floor_ordinal,
        bi.batch_name,
        bi.name,
        bi.asm_range_lower;

failing-query-assets.zip

Expected behavior

No response

Additional context

I've confirmed that the issue does not occur in v36.0.0 or earlier, and is present from v37.0.0 through v38.0.0. The issue also doesn't occur when I omit ARRAY_AGG(DISTINCT ...) from my queries.

@bellwether-softworks bellwether-softworks added the bug Something isn't working label May 13, 2024
@jayzhan211
Contributor

jayzhan211 commented May 13, 2024

I added the assertion because I didn't know whether any case has len > 1.

It would be nice if you had a simpler example to add to the test!

@bellwether-softworks
Author

@jayzhan211 I appreciate your concern regarding the complex example case; I attempted to create a simpler contrived example, but was unable to trigger the panic doing so. I don't currently know the exact conditions that are triggering the problem.

@jayzhan211
Contributor

> @jayzhan211 I appreciate your concern regarding the complex example case; I attempted to create a simpler contrived example, but was unable to trigger the panic doing so. I don't currently know the exact conditions that are triggering the problem.

That's fine. I think that's why I couldn't easily find a test case that triggers this panic; probably only a complex aggregate query with multiple states meets the requirement.

I think we can add your example as a test case, and then find out whether we should fix the multi-state issue in merge_batch or somewhere earlier, before merge_batch is entered.
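
To make the merge_batch option concrete, here is a minimal sketch of a distinct accumulator whose merge step accepts any number of state rows instead of asserting that there is exactly one. It is only an illustration: `DistinctAccumulator` is a hypothetical standard-library stand-in, the `Vec<i64>` state rows are an assumption, and none of this is the actual DataFusion code or the fix that was eventually merged.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for a DISTINCT array accumulator.
/// This is NOT DataFusion's implementation; it only mirrors the idea.
#[derive(Default)]
struct DistinctAccumulator {
    values: BTreeSet<i64>,
}

impl DistinctAccumulator {
    /// Fold raw input values into the distinct set.
    fn update_batch(&mut self, values: &[i64]) {
        self.values.extend(values.iter().copied());
    }

    /// Merge partial states. Each element of `states` is one state row,
    /// i.e. the distinct values gathered by one partial aggregation.
    /// Every row is folded in, so nothing assumes there is exactly one.
    fn merge_batch(&mut self, states: &[Vec<i64>]) {
        for row in states {
            self.update_batch(row);
        }
    }

    /// Return the distinct values in ascending order.
    fn evaluate(&self) -> Vec<i64> {
        self.values.iter().copied().collect()
    }
}

fn main() {
    let mut acc = DistinctAccumulator::default();
    // Four state rows, matching the `left: 4` in the panic message.
    let states = vec![vec![1, 5], vec![5], vec![1], vec![]];
    acc.merge_batch(&states);
    println!("{:?}", acc.evaluate()); // prints [1, 5]
}
```

Handling every row inside the merge step is the more defensive of the two options, since it does not rely on whatever runs before merge_batch to collapse the states first.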

@jayzhan211 jayzhan211 added the regression Something that used to work no longer does label May 13, 2024
@riosw

riosw commented May 13, 2024

I was playing around with this issue and found a more minimal example:

WITH A AS (
    SELECT
        1 AS id, 1 AS foo
    UNION ALL
    SELECT
        1, 5
)
SELECT
    ARRAY_AGG(DISTINCT a.foo),
    SUM(DISTINCT 1) 
FROM
    A a
GROUP BY
    a.id;

Interestingly, though, using SUM(1) instead of SUM(DISTINCT 1) did not panic.

@alamb alamb mentioned this issue May 15, 2024
@alamb alamb changed the title Query using ARRAY_AGG(DISTINCT) causes panic [Regression] Query using ARRAY_AGG(DISTINCT) causes panic May 15, 2024
@alamb
Contributor

alamb commented May 15, 2024

Added to #10517
