Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mark appropriate window functions as nullable: false in logical plan #7638

Conversation

matthewgapp
Copy link
Contributor

@matthewgapp matthewgapp commented Sep 24, 2023

add nullability information to window function that is used to determine expression and thus column nullability when constructing window plan

Which issue does this PR close?

Closes #7636

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

…ne experession and thus column nullablity when constructing window plan
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Sep 24, 2023
@@ -114,19 +114,35 @@ pub enum BuiltInWindowFunction {

impl BuiltInWindowFunction {
fn name(&self) -> &str {
use BuiltInWindowFunction::*;
use BuiltInWindowFunction as F;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flyby fix. Using wildcard here means that if a variant were deleted from the enum, the bottom-most variant in the match statement would become a variable catch-all.

use BuiltInWindowFunction as F;
match self {
F::RowNumber | F::Ntile | F::Rank | F::CumeDist => false,
// the rest are assumed to be nullable
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if there is a way that we can derive this from the definitions of the physical plans or elsewhere? Kinda sucks to have this information hardcoded this information in two places (right now, it exists also in the field method definition on the BuildInWindowFunctionExpr trait (example)

@@ -3315,3 +3315,23 @@ SELECT
window1 AS (ORDER BY C3)
ORDER BY C3
LIMIT 5

# Create table with window functions that have nullable: false columns should result in correct schema during cross join
Copy link
Contributor Author

@matthewgapp matthewgapp Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests fail on main but pass on this branch as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I couldn't include a test for the NTILE window function because it failed. I filed a separate bug here

Copy link
Contributor Author

@matthewgapp matthewgapp Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that starting with version 47.0.0 of Arrow, the tests referenced will no longer fail (without the fixes provided in this PR) because Arrow has stopped checking for schema nullability. For more context, visit issue comment.

Therefore, I plan to introduce tests that explicitly check for nullable: false in the logical schema where it is expected, such as in columns created via row_number(), rank(), and similar functions. I will address this tonight.

@matthewgapp matthewgapp marked this pull request as ready for review September 25, 2023 02:18
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @matthewgapp -- I started to look at this PR today, and it looks in general good to me, but I want to look into a few more things and I ran out of time; I plan to review it more carefully tomorrow

@alamb
Copy link
Contributor

alamb commented Sep 25, 2023

Related comment: #7636 (comment)

@alamb
Copy link
Contributor

alamb commented Sep 27, 2023

See #7636 (comment)

@matthewgapp matthewgapp changed the title add nullability information to window function mark appropriate window functions as nullable: false in logical plan Sep 27, 2023
@alamb
Copy link
Contributor

alamb commented Oct 1, 2023

Marking as draft as I think #7694 may be fixing the root cause instead

Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Apr 29, 2024
@github-actions github-actions bot closed this May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) Stale PR has not had any activity for some time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CREATE TABLE DDL does not save correct schema, resulting in mismatched plan vs execution (record batch) schema
2 participants