Floating point order-by columns for RANGE window functions #13512

mythrocks · 2023-06-05T20:21:23Z

Description

This commit adds support for FLOAT32 and FLOAT64 order-by columns for RANGE-based window functions.

Background

Up until this commit, order-by columns for RANGE window functions were allowed to be integral numerics, timestamps, or strings (for unbounded/current rows).

With this commit, window functions can run on floating point value ranges. For example, a window can now be defined with a floating point delta, such as all rows whose values exceed the current row's value by no more than 3.14f.

This is in the same vein as the support for DECIMAL (#11645) and STRING (#13143).
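To illustrate what a RANGE window over a floating point order-by column means, here is a minimal host-side sketch. It is not the libcudf implementation and does not use the libcudf API: `range_window_extents` is a hypothetical helper, and it assumes a single group, an ascending order-by column, and no `NaN` values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, NOT part of libcudf. For each row of an ascending,
// NaN-free order-by column, return the half-open index range [begin, end)
// of rows whose value v satisfies: cur - preceding <= v <= cur + following.
std::vector<std::pair<std::size_t, std::size_t>> range_window_extents(
  std::vector<float> const& sorted_oby, float preceding, float following)
{
  // Precondition (assumption of this sketch): input is sorted ascending.
  assert(std::is_sorted(sorted_oby.begin(), sorted_oby.end()));

  std::vector<std::pair<std::size_t, std::size_t>> extents;
  extents.reserve(sorted_oby.size());
  for (float cur : sorted_oby) {
    // First row that is not below the lower bound of the window.
    auto first = std::lower_bound(sorted_oby.begin(), sorted_oby.end(), cur - preceding);
    // One past the last row that is not above the upper bound of the window.
    auto last = std::upper_bound(sorted_oby.begin(), sorted_oby.end(), cur + following);
    extents.emplace_back(static_cast<std::size_t>(first - sorted_oby.begin()),
                         static_cast<std::size_t>(last - sorted_oby.begin()));
  }
  return extents;
}
```

For the column `{1.0f, 2.5f, 4.0f, 4.5f, 9.0f}` with `preceding = 0.0f` and `following = 3.14f`, the first row's window covers the rows holding 1.0, 2.5 and 4.0, since 4.5 exceeds 1.0 by more than 3.14.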

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

Signed-off-by: MithunR <mythrocks@gmail.com>


JNI follow-up for rapidsai#13512, which added support for float/double columns as order-by for range window functions. This commit adds JNI support, to be able to do the same from Java.


mythrocks · 2023-06-20T22:43:50Z

/merge

mythrocks · 2023-06-20T22:44:13Z

Thank you for reviewing, chaps. I've merged this change.

…ons (#13595)

This is a JNI follow-up to #13512. This adds support in the Java CUDF API to use floating point types (`FLOAT32`/`FLOAT64`) as the order-by column for range-based window functions.

There are no API changes; only the implementation was modified to permit floating-point types for the operation. A test was added to ratify behaviour.

Authors:
- MithunR (https://github.com/mythrocks)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Robert (Bobby) Evans (https://github.com/revans2)

URL: #13595


This is a follow-up to #13512 (which added support for floating point order-by columns in window functions), and #13606 (which fixed how negative values are handled for floating point order-by).

This commit fixes how `NaN` and `+/- Infinity` values are handled for floating point.

Prior to this commit, the calculations for range window extents depended on the behaviour of `thrust::less<float>` and `thrust::greater<float>`, as well as addition/subtraction on `+/- Infinity`. This produced some unexpected results:
1. `thrust::less`/`greater` on `NaN` does not produce strict ordering.
2. Addition/Subtraction on the numerical values of `Infinity` could produce finite values that interfere with window extent calculations. Ideally, the results should have remained infinite.

This commit adds custom comparators with `NaN` awareness, to better handle columns that contain `NaN`s. It also fixes range calculations where `Infinity` is involved.

Tests have been added to cover ASC/DESC order sorting on `FLOAT`, with `NaN` and `Infinity` values.

Authors:
- MithunR (https://github.com/mythrocks)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Mike Wilson (https://github.com/hyperbolic2346)
- https://github.com/nvdbaranec

URL: #13635
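The ordering problem described in point 1 can be sketched in plain C++. The comparators below are hypothetical and are not the ones added in #13635. They assume one possible convention: `NaN` sorts after every other value, including `+Infinity`, and all `NaN`s are equivalent to each other.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical comparator, NOT the one from #13635.
// Assumption: NaN is ordered after every other value, including +Infinity.
struct nan_aware_less {
  bool operator()(float lhs, float rhs) const
  {
    if (std::isnan(lhs)) { return false; }  // NaN is never less than anything
    if (std::isnan(rhs)) { return true; }   // every non-NaN value precedes NaN
    return lhs < rhs;
  }
};

// Mirror image of nan_aware_less, for descending order.
struct nan_aware_greater {
  bool operator()(float lhs, float rhs) const { return nan_aware_less{}(rhs, lhs); }
};

// Small demonstration of the difference from the built-in operator<.
void nan_ordering_demo()
{
  float const nan = std::numeric_limits<float>::quiet_NaN();
  float const inf = std::numeric_limits<float>::infinity();

  // Built-in comparison: NaN is neither less nor greater than 1.0f, so it
  // looks "equivalent" to every value and sorted-range searches break down.
  assert(!(nan < 1.0f) && !(1.0f < nan));

  // NaN-aware comparison: NaN has a definite position after +Infinity.
  assert(nan_aware_less{}(1.0f, nan));
  assert(nan_aware_less{}(inf, nan));
  assert(!nan_aware_less{}(nan, nan));
  assert(nan_aware_greater{}(nan, inf));
}
```

Where `NaN` is placed (first or last) is a design choice of this sketch. What matters for window extent searches is that the comparator gives every value, `NaN` included, one consistent position.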

mythrocks added 7 commits June 1, 2023 15:27

WIP: Code in place. Refactoring tests.

375b3f7

Working tests for floating point types.

4ae8299

Signed-off-by: MithunR <mythrocks@gmail.com>

Refactored test: Common call to grouped_rolling.

1ba40f8

Formatting.

dd43cf4

Signed-off-by: MithunR <mythrocks@gmail.com>

Better name for the floating point tests.

b567ad4

Switched from enable_if to constexpr.

d6eb434

Minor code simplification.

5014929

mythrocks added the feature request (New feature or request) and non-breaking (Non-breaking change) labels Jun 5, 2023

mythrocks requested a review from a team as a code owner June 5, 2023 20:21

mythrocks self-assigned this Jun 5, 2023

mythrocks requested review from divyegala and nvdbaranec June 5, 2023 20:21

mythrocks marked this pull request as draft June 5, 2023 20:21

github-actions bot added the libcudf (Affects libcudf (C++/CUDA) code.) label Jun 5, 2023

Merge remote-tracking branch 'origin/branch-23.08' into range-window-…

3b3e8e0

…float-oby

mythrocks marked this pull request as ready for review June 8, 2023 16:48

Merge remote-tracking branch 'origin/branch-23.08' into range-window-…

f13e77b

…float-oby

davidwendt reviewed Jun 8, 2023

cpp/src/rolling/range_window_bounds.cpp (outdated, resolved)

Switch back from if constexpr to enable_if.

190edd5

mythrocks force-pushed the range-window-float-oby branch from 1e3d28f to 190edd5 Compare June 8, 2023 22:27

Fixed CUDF_FAIL message.

c340309

ttnghia reviewed Jun 8, 2023

cpp/src/rolling/range_window_bounds.cpp (outdated, resolved)

ttnghia reviewed Jun 8, 2023

cpp/src/rolling/range_window_bounds.cpp (outdated, resolved)

mythrocks added 2 commits June 9, 2023 11:58

Remove invalid enable-if for float.

48135ec

Merge remote-tracking branch 'origin/branch-23.08' into range-window-…

1c3e5e1

…float-oby

ttnghia approved these changes Jun 13, 2023


sameerz mentioned this pull request Jun 13, 2023

[FEA] Window Expression orderBy column is not supported in a window range function, found DoubleType NVIDIA/spark-rapids#7801

Closed

davidwendt reviewed Jun 13, 2023

cpp/src/rolling/range_window_bounds.cpp (outdated, resolved)

mythrocks and others added 3 commits June 14, 2023 11:06

Update cpp/src/rolling/range_window_bounds.cpp

554e3ef

Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>

Fix range-bounds overloads.

7ded2a7

Merge remote-tracking branch 'origin/branch-23.08' into range-window-…

77230ac

…float-oby

mythrocks requested a review from davidwendt June 20, 2023 17:29

davidwendt reviewed Jun 20, 2023

cpp/src/rolling/detail/range_window_bounds.hpp (outdated, resolved)

davidwendt reviewed Jun 20, 2023

cpp/src/rolling/range_window_bounds.cpp (outdated, resolved)

Consolidated to cudf::is_numeric.

9599cb3

mythrocks requested a review from davidwendt June 20, 2023 17:58

davidwendt reviewed Jun 20, 2023

cpp/src/rolling/detail/range_window_bounds.hpp (outdated, resolved)

davidwendt approved these changes Jun 20, 2023


Fixed straggler for is_numeric() check.

d93735f

rapids-bot bot merged commit 5a7e3c7 into rapidsai:branch-23.08 Jun 20, 2023

mythrocks mentioned this pull request Jun 20, 2023

Java support: Floating point order-by columns for RANGE window functions #13595

Merged

3 tasks

mythrocks mentioned this pull request Jun 28, 2023

Fix inf/NaN comparisons for FLOAT orderby in window functions #13635

Merged

3 tasks
