Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Floating point order-by columns for RANGE window functions #13512

Merged
merged 18 commits into from
Jun 20, 2023

Conversation

mythrocks
Copy link
Contributor

@mythrocks mythrocks commented Jun 5, 2023

Description

This commit adds support for FLOAT32 and FLOAT64 order-by columns for RANGE-based window functions.

Background

Up until this commit, order-by columns for RANGE window functions were allowed to be integral numerics, timestamps, or strings (for unbounded/current rows).

With this commit, window functions will be permitted to run on floating point value ranges. E.g. This supports windows defined with floating point deltas, like rows with values exceeding the current row by no more than 3.14f.

This is in the same vein as the support for DECIMAL (#11645) and STRING (#13143).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mythrocks mythrocks added feature request New feature or request non-breaking Non-breaking change labels Jun 5, 2023
@mythrocks mythrocks requested a review from a team as a code owner June 5, 2023 20:21
@mythrocks mythrocks self-assigned this Jun 5, 2023
@mythrocks mythrocks marked this pull request as draft June 5, 2023 20:21
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jun 5, 2023
@mythrocks mythrocks marked this pull request as ready for review June 8, 2023 16:48
mythrocks added a commit to mythrocks/cudf that referenced this pull request Jun 8, 2023
JNI follow-up for rapidsai#13512, which added support for float/double columns
as order-by for range window functions.

This commit adds JNI support, to be able to do the same from Java.
@mythrocks
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 5a7e3c7 into rapidsai:branch-23.08 Jun 20, 2023
@mythrocks
Copy link
Contributor Author

Thank you for reviewing, chaps. I've merged this change.

rapids-bot bot pushed a commit that referenced this pull request Jun 21, 2023
…ons (#13595)

This is a JNI follow-up to #13512. This adds support in the Java CUDF API to use floating point types (`FLOAT32`/`FLOAT64`) as the order-by column for range-based window functions.

There are no API changes; only the implementation was modified to permit floating-point types for the operation. A test was added to ratify behaviour.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: #13595
mythrocks added a commit to mythrocks/cudf that referenced this pull request Jun 28, 2023
This is a follow-up to rapidsai#13512 (which added support for floating point
order-by columns in window functions), and rapidsai#13606 (which fixed how
negative values are handled for floating point order-by).

This commit fixes how `NaN` and `+/- Infinity` values are handled
for floating point.

Prior to this commit, the calculations for range window extents depended
on the behaviour of `thrust::less<float>` and `thrust::greater<float>`,
as well as addition/subtraction on `+/- Infinity`. This produced some
unexpected results:
1. `thrust::less`/`greater` on `NaN` does not produce strict ordering.
2. Addition/Subtraction on the numerical values of `Infinity` could
   produce finite values that interfere with window extent calculations.
   Ideally, the results should have remained infinite.

This commit adds custom comparators with `NaN` awareness, to better
handle columns that contain `NaN`s. It also fixes range calculations
where `Infinity` is involved.

Tests have been added to cover ASC/DESC order sorting on `FLOAT`,
with `NaN` and `Infinity` values.
rapids-bot bot pushed a commit that referenced this pull request Jul 5, 2023
This is a follow-up to #13512 (which added support for floating point order-by columns in window functions), and #13606 (which fixed how negative values are handled for floating point order-by).

This commit fixes how `NaN` and `+/- Infinity` values are handled for floating point.

Prior to this commit, the calculations for range window extents depended on the behaviour of `thrust::less<float>` and `thrust::greater<float>`, as well as addition/subtraction on `+/- Infinity`. This produced some unexpected results:
1. `thrust::less`/`greater` on `NaN` does not produce strict ordering.
2. Addition/Subtraction on the numerical values of `Infinity` could produce finite values that interfere with window extent calculations. Ideally, the results should have remained infinite.

This commit adds custom comparators with `NaN` awareness, to better handle columns that contain `NaN`s. It also fixes range calculations where `Infinity` is involved.

Tests have been added to cover ASC/DESC order sorting on `FLOAT`, with `NaN` and `Infinity` values.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - https://github.com/nvdbaranec

URL: #13635
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants