Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consistent logical & physical NTILE return types #8270

Merged
merged 1 commit into from
Nov 21, 2023

Conversation

korowa
Copy link
Contributor

@korowa korowa commented Nov 19, 2023

Which issue does this PR close?

Closes #7639.

Rationale for this change

At this moment, while logical plan building, return type of ntile considered to be UInt32, and physical expression of ntile returns UInt64 -- this leads to incompatible schemas of created memory table and record batches on insertion.

FWIW: another schema incompatibility left (non breaking in case of CTAS, though) -- nullability of output fields, should be fixed in #7638

What changes are included in this PR?

Return type of ntile logical expression is UInt64 now

Are these changes tested?

Unit test for return type & minimal reproducer from the issue in sqllogictests

Are there any user-facing changes?

Return type of ntile logical expression is UInt64 now

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Nov 19, 2023
Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good.
It might be good to fix some other bugs in ntile. Do you think a separate issue should be opened for these or is it something maybe worth addressing in this PR?

DataFusion CLI v33.0.0
❯ create table t1 (a int);
0 rows in set. Query took 0.005 seconds.

❯ insert into t1 values (1),(2),(3);
+-------+
| count |
+-------+
| 3     |
+-------+
1 row in set. Query took 0.006 seconds.

-- Do these results make sense?  All other databases return ntile values 1,2,3.
-- Tested at https://dbfiddle.uk/select ntile(9223377) OVER(ORDER BY a) from t1;
+--------------------------------------------------------------------------------------------------------+
| NTILE(Int64(9223377)) ORDER BY [t1.a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
+--------------------------------------------------------------------------------------------------------+
| 1                                                                                                      |
| 3074460                                                                                                |
| 6148919                                                                                                |
+--------------------------------------------------------------------------------------------------------+

-- This should return a regular error instead of an internal errorselect ntile(9223372036854775809) OVER(ORDER BY a) from t1;
Internal error: Cannot convert UInt64(9223372036854775809) to i64.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

-- This should not panic and crash the datafusion cli
❯ select ntile(-922337203685477580) OVER(ORDER BY a) from t1;
thread 'main' panicked at /home/ms/git/arrow-datafusion/datafusion/physical-expr/src/window/ntile.rs:100:23:
attempt to multiply with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
/home/ms/git/arrow-datafusion/datafusion-cli (ntile_output_type ✔) ᐅ 

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @korowa

@alamb
Copy link
Contributor

alamb commented Nov 20, 2023

The fix looks good. It might be good to fix some other bugs in ntile. Do you think a separate issue should be opened for these or is it something maybe worth addressing in this PR?

DataFusion CLI v33.0.0
❯ create table t1 (a int);
0 rows in set. Query took 0.005 seconds.

❯ insert into t1 values (1),(2),(3);
+-------+
| count |
+-------+
| 3     |
+-------+
1 row in set. Query took 0.006 seconds.

-- Do these results make sense?  All other databases return ntile values 1,2,3.
-- Tested at https://dbfiddle.uk/select ntile(9223377) OVER(ORDER BY a) from t1;
+--------------------------------------------------------------------------------------------------------+
| NTILE(Int64(9223377)) ORDER BY [t1.a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
+--------------------------------------------------------------------------------------------------------+
| 1                                                                                                      |
| 3074460                                                                                                |
| 6148919                                                                                                |
+--------------------------------------------------------------------------------------------------------+

-- This should return a regular error instead of an internal errorselect ntile(9223372036854775809) OVER(ORDER BY a) from t1;
Internal error: Cannot convert UInt64(9223372036854775809) to i64.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

-- This should not panic and crash the datafusion cli
❯ select ntile(-922337203685477580) OVER(ORDER BY a) from t1;
thread 'main' panicked at /home/ms/git/arrow-datafusion/datafusion/physical-expr/src/window/ntile.rs:100:23:
attempt to multiply with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
/home/ms/git/arrow-datafusion/datafusion-cli (ntile_output_type ✔) ᐅ 

Thanks @msirek . I recommend we file a ticket and fix these issues in a follow on PR. My rationale is that this PR is strictly better than main (even though you have identified other areas where it can be better).

@msirek
Copy link
Contributor

msirek commented Nov 20, 2023

I recommend we file a ticket and fix these issues in a follow on PR.

Thanks @alamb, I've opened #8284

@korowa
Copy link
Contributor Author

korowa commented Nov 20, 2023

@msirek , @alamb thank you for reviewing.

I agree regarding creating a separate issue (and thanks for reproducers!)

@alamb alamb merged commit 47b4972 into apache:main Nov 21, 2023
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CREATE TABLE fails with NTILE window function select query
3 participants