Fix projection name with DataFrame::with_column and window functions #12000

Merged
merged 9 commits into from
Aug 17, 2024


devanbenz (Contributor) commented Aug 15, 2024

Signed-off-by: Devan <devandbenz@gmail.com>

Which issue does this PR close?

Closes #11982

Rationale for this change

Previously, calling DataFrame::with_column with a window expression produced an unnecessary extra projection in the output: the window function's auto-named column appeared alongside the requested column r.

+---+-----------------------------------------------------------------------+---+
| a | ROW_NUMBER() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING | r |
+---+-----------------------------------------------------------------------+---+
| 1 | 1                                                                     | 1 |
| 2 | 2                                                                     | 2 |
| 3 | 3                                                                     | 3 |
| 4 | 4                                                                     | 4 |
| 5 | 5                                                                     | 5 |
+---+-----------------------------------------------------------------------+---+

I also tested manually with a few other Expr variants, such as a literal, to check whether filtering on the qualifier would cause issues:

    let func = Expr::Literal(ScalarValue::Int32(Some(10)));
    df.with_column("r", func)?.show().await?;

outputs:

+---+----+
| a | r  |
+---+----+
| 5 | 10 |
| 4 | 10 |
| 3 | 10 |
| 2 | 10 |
| 1 | 10 |
+---+----+
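The qualifier-based filtering mentioned above can be modeled with a small, std-only sketch. It is a hypothetical simplification, not DataFusion's actual with_column code: `Field` and `with_column_names` are illustrative names, and the sketch assumes (as "filtering the qualifier" suggests) that the only unqualified field in the window plan's schema is the window function's own output column.

```rust
/// Hypothetical stand-in for a schema field: an optional table qualifier
/// plus a column name. This is NOT DataFusion's DFSchema type.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
}

impl Field {
    fn new(qualifier: Option<&str>, name: &str) -> Self {
        Field {
            qualifier: qualifier.map(str::to_string),
            name: name.to_string(),
        }
    }
}

/// Hypothetical model of the column list `with_column` projects over `input`.
/// `drop_unqualified` models the fix applied when the new expression is a
/// window function: unqualified fields (assumed to be the window node's own
/// output) are left out of the projection.
fn with_column_names(input: &[Field], new_name: &str, drop_unqualified: bool) -> Vec<String> {
    let mut out = Vec::new();
    let mut replaced = false;
    for field in input {
        if field.name == new_name {
            // A column that already has the requested name is replaced in place.
            out.push(new_name.to_string());
            replaced = true;
        } else if drop_unqualified && field.qualifier.is_none() {
            // Skip the window node's auto-named output column; keeping it is
            // what produced the extra column in the original output.
            continue;
        } else {
            out.push(field.name.clone());
        }
    }
    if !replaced {
        // Otherwise the new column is appended at the end.
        out.push(new_name.to_string());
    }
    out
}
```

With the filter off, a window input yields three names (a, the ROW_NUMBER() column, r), mirroring the buggy table; with it on, only a and r remain. A non-window input such as the literal case has no unqualified window column, so it is unaffected either way.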

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

…sage

Signed-off-by: Devan <devandbenz@gmail.com>
github-actions bot added the core (Core DataFusion crate) label Aug 15, 2024
devanbenz marked this pull request as draft August 15, 2024 03:43
devanbenz marked this pull request as ready for review August 15, 2024 18:17
alamb changed the title from "fix/11982: resolves projection issue found in with_column window fn usage" to "Fix projection name with DataFrame::with_column and window functions" Aug 16, 2024
alamb (Contributor) left a comment


Looks good to me -- thank you @devanbenz

cc @timsaucer, thank you for the report. Perhaps you have time to review the change as well.

Comment on lines 2884 to 2890
    let func = Expr::WindowFunction(WindowFunction::new(
        WindowFunctionDefinition::BuiltInWindowFunction(
            BuiltInWindowFunction::RowNumber,
        ),
        vec![],
    ))
    .alias("row_num");
Contributor


I think you can use the expr fn here and make this more concise:

Suggested change
    - let func = Expr::WindowFunction(WindowFunction::new(
    -     WindowFunctionDefinition::BuiltInWindowFunction(
    -         BuiltInWindowFunction::RowNumber,
    -     ),
    -     vec![],
    - ))
    - .alias("row_num");
    + let func = row_number().alias("row_num");

Contributor


I also ran this test without the code changes and it fails like this:


thread 'dataframe::tests::test_window_function_with_column' panicked at datafusion/core/src/dataframe/mod.rs:2882:9:
assertion `left == right` failed
  left: 4
 right: 5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/std/src/panicking.rs:652:5
...

And the output was:

[
    "+----+----+-----+-----------------------------------------------------------------------+---+",
    "| c1 | c2 | c3  | ROW_NUMBER() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING | r |",
    "+----+----+-----+-----------------------------------------------------------------------+---+",
    "| c  | 2  | 1   | 1                                                                     | 1 |",
    "| d  | 5  | -40 | 2                                                                     | 2 |",
    "+----+----+-----+-----------------------------------------------------------------------+---+",
]
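To connect the failing assertion to this output, here is a hypothetical std-only helper (not part of DataFusion or this PR's test) that reads the column names from the header row of a pretty-printed table like the one above. It assumes `|`-delimited cells, border lines starting with `+`, and that no column name contains `|`. Applied to that output it finds five columns, the fourth being the ROW_NUMBER() column, which lines up with the 5 in the failed assertion.

```rust
/// Hypothetical helper: returns the trimmed column names from the header row
/// of an ASCII table whose border lines start with '+' and whose rows start
/// with '|'. Assumes no column name contains a '|' character.
fn header_names(table: &[&str]) -> Vec<String> {
    table
        .iter()
        // The header is the first line that is a row rather than a border.
        .find(|line| line.trim_start().starts_with('|'))
        .map(|header| {
            // Drop the outer pipes, then split the remaining cells on '|'.
            let inner = header.trim().trim_start_matches('|').trim_end_matches('|');
            inner
                .split('|')
                .map(|cell| cell.trim().to_string())
                .collect::<Vec<String>>()
        })
        // No row line at all means no header, so return an empty list.
        .unwrap_or_default()
}
```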

Contributor Author


Sounds good, I went ahead and used the more concise method call. Thanks!

timsaucer (Contributor) left a comment


LGTM. I also tested locally and it produces the expected results. Thank you for taking care of this!

alamb merged commit 48416e5 into apache:main Aug 17, 2024
24 checks passed
alamb (Contributor) commented Aug 17, 2024

Thanks again @devanbenz and @timsaucer -- getting better every day!

Labels: core (Core DataFusion crate)
Linked issue: Window functions create unwanted projection
3 participants