Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive unnest #11062

Merged
merged 13 commits into from
Jul 2, 2024
Merged

Recursive unnest #11062

merged 13 commits into from
Jul 2, 2024

Conversation

duongcongtoai
Copy link
Contributor

Which issue does this PR close?

Closes #10660.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@duongcongtoai duongcongtoai changed the title 10660 recursive unnest Recursive unnest Jun 22, 2024
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Jun 22, 2024
@duongcongtoai duongcongtoai marked this pull request as ready for review June 25, 2024 21:10
let transformed_exprs = transform(&expr, arg)?;
Ok(Transformed::yes(transformed_exprs[0].clone()))
// root_expr.push(transformed_exprs[0].clone());
Ok(Transformed::new(transformed_exprs[0].clone(),true,TreeNodeRecursion::Stop))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use swap_remove(0) to avoid clone

input: &LogicalPlan,
unnest_placeholder_columns: &mut Vec<String>,
inner_projection_exprs: &mut Vec<Expr>,
original_expr: Expr,
original_expr: &Expr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid clone in L297 too.

pub(crate) fn recursive_transform_unnest(
/// If along the path from root to bottom, there are multiple unnest expressions, the transformation
/// is done only for the bottom expression
pub(crate) fn transform_bottom_unnest(
Copy link
Contributor

@jayzhan211 jayzhan211 Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would transform List / Struct separately help readability here?

something like

transform_bottom_unnest() {
 // common logic
 fn transform();

 list:
     transform_list // list specific logic
 struct:
     transform_struct // struct specific logic (only unnest with the top level)
}

@alamb
Copy link
Contributor

alamb commented Jun 28, 2024

I plan to review this tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @duongcongtoai -- I think this PR is quite close. In my opinion it only needs a few more tests and we should be able to merge it.

Thanks @jayzhan211 for the review

@@ -291,51 +290,68 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
}

/// Try converting Expr(Unnest(Expr)) to Projection/Unnest/Projection
/// FIXME: if within select_exprs, one unnest expr is referenced multiple times
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we file a ticket to track this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this FIXME to sqllogictest, ticket created: #11198

@@ -33,8 +33,8 @@ AS VALUES
statement ok
CREATE TABLE nested_unnest_table
AS VALUES
(struct('a', 'b', struct('c')), (struct('a', 'b', [10,20])), [struct('a', 'b')]),
(struct('d', 'e', struct('f')), (struct('x', 'y', [30,40, 50])), null)
(struct(['a'], 'b', struct('c')), (struct('a', 'b', [[10,20]])), [struct(['a'], 'b')]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to please leave the original test the same (so we can verify the existing behavior is unchanged) and add new tests for the newly added features?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i revert the change on this table and created a new one

// unnest struct is only possible here
// The transformation looks like
// - unnest(struct_col) will be transformed into unnest(struct_col).field1, unnest(struct_col).field2
if let Expr::Unnest(Unnest { expr: ref arg }) = transformed_expr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having to clone the expr to special case the root, could you instead check the root initially?

Something like

if let Expr::Unnest(Unnest { expr })  = original_expr {
  let expr = expr.transform_up(...);
  Ok(transform(transformex_expe)
}

And similarly for the others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that approach, but it will add more logic to the code, something like:

    if let Expr::Unnest(Unnest { expr: ref arg }) = original_expr {
        let Transformed {
            data: transformed_expr,
            transformed,
            tnr: _,
        } = arg.clone().transform_up(|expr: Expr| {
...
        })?;
        // no transformation eligible in the children
        // only then the root unnest can be transformed
        if !transformed {
            return transform(&transformed_expr, arg);
        } else {
            return Ok(vec![transformed_expr]);
        }
    }

    // from this onward, unnest(struct) is invalid because
    // we are sure that root_expr is not unnest expr
    let Transformed {
        data: transformed_expr,
        transformed,
        tnr: _,
    } = original_expr.clone().transform_up(|expr: Expr| {
...
    if !transformed {
        if matches!(&transformed_expr, Expr::Column(_)) {
            inner_projection_exprs.push(transformed_expr.clone());
            Ok(vec![transformed_expr])
        } else {
            // We need to evaluate the expr in the inner projection,
            // outer projection just select its name
            let column_name = transformed_expr.display_name()?;
            inner_projection_exprs.push(transformed_expr);
            Ok(vec![Expr::Column(Column::from_name(column_name))])
        }
    } else {
        Ok(vec![transformed_expr])
    }

    })?;

For the clone() part, i don't know how to avoid it because transform_up will take the ownership of the expr
Please let me know your opinion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you have is good -- thank you for trying this


## double unnest list
query I?
select unnest(unnest(column2['c2'])), column2 from nested_unnest_table;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also please add tests for:

  1. Multiple unnest: select unnest(col1), unnest(col1), select unnest(col1), unnest(col12)
  2. Unnest + expression: select unnest(col1)+1
  3. Unnest on a non structured column (e.g. unnest('foo'))
  4. Triple unnest unnest(unnest(unnnest(..)))

(it is ok if they error, but I think it would be useful to have the additional test coverage that demonstrates how sophisticated the implementation is)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added more tests

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @duongcongtoai

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @duongcongtoai and @alamb

@jayzhan211 jayzhan211 merged commit 58f79e1 into apache:main Jul 2, 2024
23 checks passed
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* chore: fix map children of unnest

* adjust test

* remove debug

* chore: move test to unnest.slt

* chore: rename

* add some comment

* compile err

* more comment

* chore: address comment

* more coverage

* one more scenario
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support recursive unnest
3 participants