[FEA] Improve accuracy when casting floating-point numbers to decimal #11079

ttnghia · 2022-06-08T13:58:11Z

Currently, decimals are constructed from floating-point numbers by:

Shifting (multiplying/dividing by the scale value)
Rounding the result toward zero to an integer for internal storage

Ref:

cudf/cpp/include/cudf/fixed_point/fixed_point.hpp, line 230 in fae2221:

    CUDF_HOST_DEVICE inline explicit fixed_point(T const& value, scale_type const& scale)

As such, a number like 0.28 with scale=-1 is shifted to 2.8, rounded toward zero to 2, and stored as 2! Of course the more accurate rounded number should be 3. We should do better rounding in this process.
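For illustration, here is a minimal C++ sketch of the shift-then-truncate steps described above, applied to the 0.28 / scale=-1 example. `to_rep_truncating` and `power_of_ten` are hypothetical helpers, not the actual cudf implementation, and only non-positive scales are handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: 10^n as a double. Exact for small n, because each
// intermediate product is a small integer that a double represents exactly.
double power_of_ten(int n)
{
  double p = 1.0;
  for (int i = 0; i < n; ++i) { p *= 10.0; }
  return p;
}

// Hypothetical helper (not the cudf API): find `rep` with value ~= rep * 10^scale
// by shifting, then converting to an integer, which truncates toward zero.
std::int64_t to_rep_truncating(double value, int scale)
{
  assert(scale <= 0);  // this sketch only handles non-positive scales
  double const shifted = value * power_of_ten(-scale);  // 0.28 at scale -1 -> about 2.8
  return static_cast<std::int64_t>(shifted);             // fractional part discarded: 2
}
```

With this sketch, `to_rep_truncating(0.28, -1)` gives 2 and `to_rep_truncating(-0.28, -1)` gives -2, i.e. 0.2 and -0.2.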

Such rounding has been causing issues for users, in particular in spark-rapids: NVIDIA/spark-rapids#5765


jrhemstad · 2022-06-08T14:21:39Z

Of course the more accurate rounded number should be 3.

Why?

float -> integer rounding in C++ always rounds towards zero.

https://godbolt.org/z/4aPMj5zzd
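The contents of the godbolt link aren't reproduced here; as a minimal sketch of the standard behaviour being referred to: a direct floating-to-integer conversion discards the fractional part (truncation toward zero, [conv.fpint]), while `std::lround` rounds to the nearest integer with halfway cases away from zero. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Direct conversion: the fractional part is discarded, so the result is
// truncated toward zero. Non-finite or out-of-range inputs are undefined behaviour.
int truncate_to_int(double x)
{
  assert(std::isfinite(x));
  return static_cast<int>(x);
}

// Explicit rounding: nearest integer, halfway cases rounded away from zero.
long round_to_long(double x)
{
  return std::lround(x);
}
```

Note that truncation is symmetric around zero: 2.8 becomes 2 and -2.8 becomes -2, whereas rounding gives 3 and -3.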

ttnghia · 2022-06-08T14:24:50Z

That is rounding. Here we are casting instead; rounding is just the mechanism we use to implement the cast.

ttnghia · 2022-06-08T16:42:17Z

Oh, here we are casting from floating-point to fixed-point, so the conversion should be float-to-float rounding followed by a cast to int, not a direct float-to-int cast.

float-to-float rounding is round-to-nearest according to the C++ spec.
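To make the proposal concrete, here is a minimal sketch of "round first, then convert to an integer" for the 0.28 / scale=-1 case. The helpers are hypothetical and this is not what cudf does. `std::llround` rounds halfway cases away from zero; `std::nearbyint` under the default rounding mode would round them to even instead, so the choice of rounding function is itself a design decision.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: 10^n as a double, exact for small n.
double power_of_ten(int n)
{
  double p = 1.0;
  for (int i = 0; i < n; ++i) { p *= 10.0; }
  return p;
}

// Hypothetical alternative to truncation: shift by the scale, round the
// shifted value to the nearest integer, and only then store it.
std::int64_t to_rep_rounding(double value, int scale)
{
  assert(scale <= 0);  // this sketch only handles non-positive scales
  double const shifted = value * power_of_ten(-scale);  // 0.28 at scale -1 -> about 2.8
  return std::llround(shifted);                          // nearest integer: 3
}
```

Under this sketch, 0.28 at scale -1 is stored as 3 (0.3) and -0.28 as -3.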

codereport · 2022-06-08T19:59:37Z

(reposting this message from slack)

There is a long history behind this and truncation is definitely the desired and intentional behaviour.

Here are the main motivations for truncation:

Consistency with CNL. CNL is being proposed for the standard library and consistency with standard types is a goal of libcudf. Note that (IIRC) every other C++ fixed-point library has made the same choice.
Relying on the behaviour of integer types in C++ greatly simplifies certain binary operations like division for fixed-point
We can provide (and have provided) rounding functionality through the cudf::round API, so on the Spark side you are able to get whatever behaviour you want
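As a rough illustration of doing rounding as a separate, explicit step, here is a plain-integer sketch: keep an extra digit (e.g. 28 at scale -2, representing 0.28), then round half-up to the target scale. `round_rep_half_up` is a hypothetical helper; it is not the cudf::round implementation or signature.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round `rep` (value = rep * 10^from_scale) to the coarser
// scale `to_scale`, rounding halves away from zero. Overflow at the extremes
// of int64 is ignored in this sketch.
std::int64_t round_rep_half_up(std::int64_t rep, int from_scale, int to_scale)
{
  assert(to_scale >= from_scale);
  std::int64_t divisor = 1;
  for (int i = from_scale; i < to_scale; ++i) { divisor *= 10; }

  std::int64_t const quotient  = rep / divisor;  // integer division truncates toward zero
  std::int64_t const remainder = rep % divisor;  // same sign as rep (or zero)
  std::int64_t const abs_rem   = remainder < 0 ? -remainder : remainder;

  // Bump the magnitude by one when the dropped digits are at least half a unit.
  if (2 * abs_rem >= divisor) { return rep < 0 ? quotient - 1 : quotient + 1; }
  return quotient;
}
```

For example, 28 at scale -2 rounds to 3 at scale -1, and 24 rounds to 2. Keeping construction truncating and making rounding an explicit step leaves the rounding mode up to the caller.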

Note that at one point, we actually did have rounding functionality baked into floating-point construction for fixed-point. However, it wasn't there to address the desire to be more "accurate" that @ttnghia is pointing out; it was an attempt to address inherent issues in floating point (such as 1.001 not being representable in floating point, meaning that if you construct a fixed-point with scale -3, you end up with 1, missing the .001). @harrism and I had many discussions about this and both agreed that "ideally" fixed-point should be able to avoid this at the end of the day. However, it ended up presenting too many issues and complications while still not fixing all cases. And note, trying to be "better" was not consistent with what CNL and other C++ fixed-point libraries do.

So at a certain point we decided to just accept the inherent flaws of floating point and follow CNL. This was done in the following small PR: #6544. If you take a look at the unit tests, the before and after make it very clear how the behaviour changed.

Furthermore, a bunch of "opposition research" was done at one point into what other C++ fixed-point libraries do. Our goal is not to be similar to Julia or Go or Python, but to be a generic C++ backend that provides all the tools to do whatever you want in the front-end language (like cuDF or Spark-RAPIDS).
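The 1.001 example can be checked with a small sketch: the double nearest to 1.001 is slightly below it, so shifting by 10^3 (for scale -3) lands just under 1001, and truncation then yields 1000, i.e. 1.000. Rounding the shifted value gives 1001 in this particular case. The helper names are hypothetical, and this only illustrates the floating-point effect; it says nothing about the other cases that made rounding problematic.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: shift a double for storage at scale -3. 1000.0 is exact,
// so the only rounding happens in the multiplication itself.
double shift_for_scale_minus_3(double value)
{
  return value * 1000.0;
}

// Truncating conversion: the fractional part is discarded.
std::int64_t truncate_rep(double shifted)
{
  assert(std::isfinite(shifted));
  return static_cast<std::int64_t>(shifted);
}

// Round-to-nearest conversion (halfway cases away from zero).
std::int64_t round_rep(double shifted)
{
  return std::llround(shifted);
}
```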

ttnghia · 2022-06-08T20:02:09Z

Thanks @codereport. That explains why we made this choice, and I'm convinced by it, so I'm going to close the issue.

ttnghia added the feature request and Needs Triage labels Jun 8, 2022

ttnghia mentioned this issue Jun 8, 2022 in NVIDIA/spark-rapids#5766, "Fix the overflow of container type when casting floats to decimal" (merged)

ttnghia closed this as completed Jun 8, 2022

bdice removed the Needs Triage label Mar 4, 2024
