Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support extract on intervals #11501

Closed
wants to merge 1 commit into from
Closed

Support extract on intervals #11501

wants to merge 1 commit into from

Conversation

nrc
Copy link
Contributor

@nrc nrc commented Jul 16, 2024

This is an early draft PR to demonstrate an Arrow change (see below for details).

Which issue does this PR close?

Closes #6327.

What changes are included in this PR?

This PR simply allows Interval types to be passed through to Arrow. It will require an Arrow upgrade to work since it depends on apache/arrow-rs#6071 (currently Arrow is patched for the CLI in this PR, this will only work if you have modified Arrow in a sibling folder to DataFusion, I'll remove this and update the PR when there is an Arrow release).

Are these changes tested?

Tested in Arrow. Tests to come here.

Are there any user-facing changes?

The extract function will now work with intervals.

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jul 16, 2024
@nrc
Copy link
Contributor Author

nrc commented Aug 1, 2024

apache/arrow-rs#6071 has landed. It just missed the 52.2 release of Arrow, so hopefully will be in 53. Once that is released (scheduled for September), I'll resurrect and rebase this PR and add some tests.

@@ -81,6 +82,9 @@ impl DatePartFunc {
Exact(vec![Utf8, Time32(Millisecond)]),
Exact(vec![Utf8, Time64(Microsecond)]),
Exact(vec![Utf8, Time64(Nanosecond)]),
Exact(vec![Utf8, Interval(YearMonth)]),
Exact(vec![Utf8, Interval(DayTime)]),
Exact(vec![Utf8, Interval(MonthDayNano)]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrc can we add

                    Exact(vec![Utf8, Duration(Second)]),
                    Exact(vec![Utf8, Duration(Millisecond)]),
                    Exact(vec![Utf8, Duration(Microsecond)]),
                    Exact(vec![Utf8, Duration(Nanosecond)]),
                    Exact(vec![Utf8View, Duration(Second)]),
                    Exact(vec![Utf8View, Duration(Millisecond)]),
                    Exact(vec![Utf8View, Duration(Microsecond)]),
                    Exact(vec![Utf8View, Duration(Nanosecond)]),

here, then convert durations to Intervals — presumably MonthDayNano is easiest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are added to this PR, Arrow PR coming soon

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions github-actions bot added functions and removed logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Aug 14, 2024
Signed-off-by: Nick Cameron <nrc@ncameron.org>
@github-actions github-actions bot added core Core DataFusion crate common Related to common crate labels Aug 23, 2024
@nrc
Copy link
Contributor Author

nrc commented Aug 23, 2024

After some testing, I found a couple of bugs due to the seconds function making assumptions about date_part which don't hold for Interval/Duration. Calling date_part to get the nanos from an Interval or Duration will often return Null due to overflow, which will cause date_part/extract to return null, even if we should have ignored the nanos and just used the whole seconds (e.g, extracting seconds from a 1000000 second duration should return 1000000, not null). Furthermore, extract(second from interval '1 second') would return 2 because the one second was queried as seconds and nanoseconds.

Both those bugs are fixed, but because the formatting of Intervals has changed with the Arrow update, it is difficult to write tests, so those will come after the proper Arrow update.

@nrc
Copy link
Contributor Author

nrc commented Sep 17, 2024

Closed in favour of #12514

@nrc nrc closed this Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate core Core DataFusion crate functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extract from interval type failed
2 participants