Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling with_column twice generates an error when the second call uses a window function #12425

Closed
Tracked by #11902
Michael-J-Ward opened this issue Sep 10, 2024 · 0 comments · Fixed by #12431
Closed
Tracked by #11902
Labels
bug Something isn't working

Comments

@Michael-J-Ward
Copy link
Contributor

Michael-J-Ward commented Sep 10, 2024

Describe the bug

Calling with_column twice generates an error when the second column is a window expression.

df
.with_column("foo", <normal_expr>)
.with_column("bar, <window_expr>)

Because "foo" does not have a qualifier, the second call to with_column ends up aliasing it as well.

let mut fields: Vec<Expr> = plan
.schema()
.iter()
.map(|(qualifier, field)| {
if field.name() == name {
col_exists = true;
new_column.clone()
} else if window_func && qualifier.is_none() {
col(Column::from((qualifier, field))).alias(name)
} else {
col(Column::from((qualifier, field)))
}
})
.collect();

Error: Plan("Projections require unique expression names but the expression \"s AS r\" at position 3 and \"row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS r\" at position 4 have the same name. Consider aliasing (\"AS\") one of them.")

To Reproduce

Update test_window_function_with_column to first call with_column with any expression.

For example:

    // Test issue: https://github.com/apache/datafusion/issues/11982
    // Window function was creating unwanted projection when using with_column() method.
    #[tokio::test]
    async fn test_window_function_with_column() -> Result<()> {
        let df = test_table().await?.select_columns(&["c1", "c2", "c3"])?;
        let ctx = SessionContext::new();
        let df_impl = DataFrame::new(ctx.state(), df.plan.clone());
        let func = row_number().alias("row_num");

        // This first `with_column` results in a column without a `qualifier` 
        let df_impl = df_impl.with_column("s", col("c2") + col("c3"))?;

        // This second `with_column` then assigns `"r"` alias to the above column and the window function
        // Should create an additional column with alias 'r' that has window func results
        let df = df_impl.with_column("r", func)?.limit(0, Some(2))?;
        assert_eq!(4, df.schema().fields().len());

        let df_results = df.clone().collect().await?;
        assert_batches_sorted_eq!(
            [
                "+----+----+-----+---+",
                "| c1 | c2 | c3  | r |",
                "+----+----+-----+---+",
                "| c  | 2  | 1   | 1 |",
                "| d  | 5  | -40 | 2 |",
                "+----+----+-----+---+",
            ],
            &df_results
        );

        Ok(())
    }

Expected behavior

I would expect the second call to succeed and the final dataframe to have columns c1, c2, c3, s, r

Additional context

#12000 introduced that conditional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
1 participant