`<charconv>`: Fix `from_chars()` `float` tiebreaking #2366

StephanTLavavej · 2021-11-30T11:40:49Z

from_chars() is complicated, but this root cause was simple, in isolation. It involved our only call to bignum division:

STL/stl/inc/charconv

Lines 1582 to 1584 in 3c2fd04

    
           uint64_t _Fractional_mantissa = _Divide(_Fractional_numerator, _Fractional_denominator);
 
           _Has_zero_tail = _Has_zero_tail && _Fractional_numerator._Myused == 0;

This is trying to check if the remainder is zero:

STL/stl/inc/charconv

Lines 881 to 885 in 3c2fd04

    
           // This high-precision integer division implementation was translated from the implementation of
 
           // System.Numerics.BigIntegerBuilder.ModDivCore in the .NET Framework sources.
 
           // It computes both quotient and remainder: the remainder is stored in the _Numerator argument,
 
           // and the least significant 64 bits of the quotient are returned from the function.
 
           _NODISCARD inline uint64_t _Divide(_Big_integer_flt& _Numerator, const _Big_integer_flt& _Denominator) noexcept {

And it was assuming that _Big_integer_flt is "trimmed", i.e. that "represents zero" is equivalent to _Myused == 0 (indicating no active elements).

However, one codepath in _Divide can result in a _Big_integer_flt where _Myused is 1 and _Mydata[0] is 0:

STL/stl/inc/charconv

Lines 933 to 936 in 3c2fd04

    
           _Numerator._Mydata[1] = static_cast<uint32_t>(_Uu >> 32);
 
           _Numerator._Mydata[0] = static_cast<uint32_t>(_Uu);
 
           _Numerator._Myused    = _Numerator._Mydata[1] > 0 ? 2u : 1u;
 
           return _Quotient;

In the bug scenario, this leads to _Has_zero_tail becoming mistakenly false, so when tiebreaker scenarios want to round to Even and Even happens to be Down, they instead round Up (as a non-zero tail would indicate that we're just above the exact midpoint).

The fix is to perform trimming in this codepath. I checked all of _Divide's other paths to return, and they all appear to be properly trimmed.

This bug was present in the UCRT's original sources (i.e. in VS 2015 strtof), carefully preserved during my distillation process. I'll send the fix to them, and we'll have to figure out how to get the compiler fixed too.

Here's my exhaustive test:

C:\Temp>type purr.cpp

#include <algorithm>
#include <cassert>
#include <charconv>
#include <cmath>
#include <cstdio>
#include <execution>
#include <iterator>
#include <mutex>
#include <random>
#include <system_error>
#include <utility>
#include <vector>
using namespace std;

struct TestCase {
    double initial_dbl;
    double final_dbl;
    double step_dbl;

    float initial_flt;
    float step_flt;
};

vector<TestCase> make_test_cases() {
    vector<TestCase> ret;

    const TestCase subnormal{0x0.000001p-127, 0x0.ffffffp-127, 0x0.000002p-127, 0x0.000000p-127f, 0x0.000004p-127f};
    ret.push_back(subnormal);

    for (TestCase tc{0x1.000001p-127, 0x1.ffffffp-127, 0x0.000002p-127, 0x1.000000p-127f, 0x0.000004p-127f};
         tc.initial_dbl < 0x1p128;
         tc.initial_dbl *= 2, tc.final_dbl *= 2, tc.step_dbl *= 2, tc.initial_flt *= 2, tc.step_flt *= 2) {
        ret.push_back(tc);
    }

    return ret;
}

struct SharedState {
    mutex mut;
    vector<pair<double, float>> bugs;
    vector<pair<double, float>> examples;
    mt19937 urng{1729};
};

int main() {
    const auto test_cases = make_test_cases();

    SharedState state;

    for_each(execution::par, test_cases.begin(), test_cases.end(), [&state](const TestCase& tc) {
        char dec_buf[400];

        float expected = tc.initial_flt;
        bool round_up  = false;

        vector<pair<double, float>> local_bugs;
        vector<pair<double, float>> local_examples;

        for (double dbl = tc.initial_dbl; dbl <= tc.final_dbl; dbl += tc.step_dbl) {
            if (expected != 0.0f && !isinf(expected)) {
                const auto to_result = to_chars(dec_buf, end(dec_buf), dbl, chars_format::general, 1000);
                assert(to_result.ec == errc{});

                float flt        = 0.0f;
                auto from_result = from_chars(dec_buf, to_result.ptr, flt);
                assert(from_result.ec == errc{});

                if (flt == expected) {
                    local_examples.emplace_back(dbl, expected);
                } else {
                    local_bugs.emplace_back(dbl, expected);
                }
            }

            // hexit 1 rounds down to 0
            // hexit 3 rounds  up  to 4
            // hexit 5 rounds down to 4
            // hexit 7 rounds  up  to 8
            // hexit 9 rounds down to 8
            // hexit b rounds  up  to c
            // hexit d rounds down to c
            // hexit f rounds  up  to 0

            round_up = !round_up;

            if (round_up) {
                expected += tc.step_flt;
            }
        }

        lock_guard guard{state.mut};
        state.bugs.insert(state.bugs.end(), local_bugs.begin(), local_bugs.end());
        ranges::sample(local_examples, back_insert_iterator{state.examples}, 1, state.urng);
    });

    ranges::sort(state.bugs);
    ranges::sort(state.examples);

    if (state.bugs.empty()) {
        printf("No bugs!\n");
    } else {
        printf("state.bugs.size(): %zu\n", state.bugs.size());

        vector<pair<double, float>> sample_bugs{state.bugs.front(), state.bugs.back()};
        ranges::sample(state.bugs, back_insert_iterator{sample_bugs}, 20, state.urng); // might sample front/back
        ranges::sort(sample_bugs);
        for (const auto& e : sample_bugs) {
            printf("Sample bug: %.6a should round to %.6a\n", e.first, e.second);
        }
    }

    for (const auto& e : state.examples) {
        printf("Example: %.6a should round to %.6a\n", e.first, e.second);
    }
}

C:\Temp>cl /EHsc /nologo /W4 /MT /O2 /std:c++latest purr.cpp && purr
purr.cpp
state.bugs.size(): 7634944
Sample bug: 0x1.000005p+15 should round to 0x1.000004p+15
Sample bug: 0x1.016d75p+15 should round to 0x1.016d74p+15
[...]
Sample bug: 0x1.dd5e49p+16 should round to 0x1.dd5e48p+16
Sample bug: 0x1.fffffdp+16 should round to 0x1.fffffcp+16
Example: 0x1.e67dccp-129 should round to 0x1.e67dd0p-129
Example: 0x1.5b6b41p-127 should round to 0x1.5b6b40p-127
[...]
Example: 0x1.a19fe7p+126 should round to 0x1.a19fe8p+126
Example: 0x1.f0b0b5p+127 should round to 0x1.f0b0b4p+127

C:\Temp>cl /EHsc /nologo /W4 /MT /O2 /std:c++latest /I D:\GitHub\STL\stl\inc purr.cpp && purr
purr.cpp
No bugs!
Example: 0x1.4fad5ap-128 should round to 0x1.4fad58p-128
Example: 0x1.99d311p-127 should round to 0x1.99d310p-127
[...]
Example: 0x1.c64aedp+126 should round to 0x1.c64aecp+126
Example: 0x1.2cbcd3p+127 should round to 0x1.2cbcd4p+127

I've taken the exhaustive test's output of sample bugs and unaffected examples, and added test cases accordingly. I believe that this is more than enough, and that we don't need to add randomized coverage of this space. I mechanically transformed the output into test cases, with manual adjustment for the subnormal example (since it's printed as 0x1.MEOW as a double, but we want to see 0x0.PURR to match the float representation). I also verified that the final test cases fail without the fix, and pass with the fix.

Finally, this removes _Make_big_integer_flt_u32(), _Make_big_integer_flt_u64(), and _Make_big_integer_flt_power_of_two() which were unused. Thanks to @statementreply for noticing them.

AlexGuteniev · 2021-12-02T01:23:50Z

STL/stl/inc/charconv

Lines 881 to 885 in 3c2fd04

    
           // This high-precision integer division implementation was translated from the implementation of
 
           // System.Numerics.BigIntegerBuilder.ModDivCore in the .NET Framework sources.
 
           // It computes both quotient and remainder: the remainder is stored in the _Numerator argument,
 
           // and the least significant 64 bits of the quotient are returned from the function.
 
           _NODISCARD inline uint64_t _Divide(_Big_integer_flt& _Numerator, const _Big_integer_flt& _Denominator) noexcept {

Is .NET affected by the issue?

tests/std/tests/P0067R5_charconv/float_from_chars_test_cases.hpp

statementreply · 2021-12-02T04:09:47Z

STL/stl/inc/charconv

Lines 439 to 463 in 369619f

    
           _NODISCARD inline _Big_integer_flt _Make_big_integer_flt_u32(const uint32_t _Value) noexcept {
 
               _Big_integer_flt _Xval{};
 
               _Xval._Mydata[0] = _Value;
 
               _Xval._Myused    = 1;
 
               return _Xval;
 
           }
 
           _NODISCARD inline _Big_integer_flt _Make_big_integer_flt_u64(const uint64_t _Value) noexcept {
 
               _Big_integer_flt _Xval{};
 
               _Xval._Mydata[0] = static_cast<uint32_t>(_Value);
 
               _Xval._Mydata[1] = static_cast<uint32_t>(_Value >> 32);
 
               _Xval._Myused    = _Xval._Mydata[1] == 0 ? 1u : 2u;
 
               return _Xval;
 
           }
 
           _NODISCARD inline _Big_integer_flt _Make_big_integer_flt_power_of_two(const uint32_t _Power) noexcept {
 
               const uint32_t _Element_index = _Power / _Big_integer_flt::_Element_bits;
 
               const uint32_t _Bit_index     = _Power % _Big_integer_flt::_Element_bits;
 
               _Big_integer_flt _Xval{};
 
               _CSTD memset(_Xval._Mydata, 0, _Element_index * sizeof(uint32_t));
 
               _Xval._Mydata[_Element_index] = 1u << _Bit_index;
 
               _Xval._Myused                 = _Element_index + 1;
 
               return _Xval;
 
           }

_Make_big_integer_flt_u32 and _Make_big_integer_flt_u64 don't properly set _Myused to 0 when _Value is 0.

These functions are not used anywhere in the repo. Should we fix the bug or just remove them?

CaseyCarter · 2021-12-02T17:32:29Z

_Make_big_integer_flt_u32 and _Make_big_integer_flt_u64 don't properly set _Myused to 0 when _Value is 0.
These functions are not used anywhere in the repo. Should we fix the bug or just remove them?

I'd prefer 🔥, with "fix and #if 0" a distant second place.

StephanTLavavej · 2021-12-04T02:18:49Z

@AlexGuteniev I don't think so, as this was specific to how we were testing for "is the remainder zero", and our _Big_integer_flt is a highly reduced version of the original. However, I don't know where the original sources are, so I can't say for sure.

@statementreply Great catch! They were originally used in the fallback codepath to print large integers exactly - I constructed a _Big_integer_flt and then divided by powers of 10 to extract digits. They became unused after internal MSVC-PR-140796 "Use long division to optimize chars_format::fixed." on 2018-09-11. In the modern codebase, we continue to use long division for float, but Ryu Printf for double. I'll remove them.

chrisd-work · 2021-12-07T06:56:03Z

I was the original poster of the bug on developer community. I have a set of tests that can test each individual floats rounding boundary. With this change, I could no longer find any issues.

StephanTLavavej · 2022-02-22T21:26:19Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

Fix GH 2363 "<charconv>: from_chars() incorrectly tiebreaks floats".

369619f

StephanTLavavej added the bug Something isn't working label Nov 30, 2021

StephanTLavavej requested a review from a team as a code owner November 30, 2021 11:40

CaseyCarter self-assigned this Dec 1, 2021

StephanTLavavej assigned mnatsuhara Dec 1, 2021

CaseyCarter approved these changes Dec 2, 2021

View reviewed changes

tests/std/tests/P0067R5_charconv/float_from_chars_test_cases.hpp Outdated Show resolved Hide resolved

CaseyCarter removed their assignment Dec 2, 2021

StephanTLavavej added 2 commits December 3, 2021 18:22

Remove newline.

11c6564

Remove unused functions.

c74fbe7

CaseyCarter approved these changes Dec 4, 2021

View reviewed changes

AlexGuteniev mentioned this pull request Feb 4, 2022

Move <ranges> and <format> into C++20 #2518

Merged

6 tasks

StephanTLavavej added the high priority Important! label Feb 8, 2022

mnatsuhara approved these changes Feb 22, 2022

View reviewed changes

mnatsuhara removed their assignment Feb 22, 2022

StephanTLavavej self-assigned this Feb 22, 2022

StephanTLavavej merged commit 89d50c5 into microsoft:main Feb 23, 2022

StephanTLavavej deleted the tiebreak_to_even branch February 23, 2022 00:12

StephanTLavavej mentioned this pull request Mar 29, 2022

<charconv>: _Divide can trim _Big_integer_flt more efficiently #2623

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`<charconv>`: Fix `from_chars()` `float` tiebreaking #2366

`<charconv>`: Fix `from_chars()` `float` tiebreaking #2366

StephanTLavavej commented Nov 30, 2021 •

edited

Loading

AlexGuteniev commented Dec 2, 2021

statementreply commented Dec 2, 2021

CaseyCarter commented Dec 2, 2021

StephanTLavavej commented Dec 4, 2021

chrisd-work commented Dec 7, 2021

StephanTLavavej commented Feb 22, 2022

	uint64_t _Fractional_mantissa = _Divide(_Fractional_numerator, _Fractional_denominator);

	_Has_zero_tail = _Has_zero_tail && _Fractional_numerator._Myused == 0;

	// This high-precision integer division implementation was translated from the implementation of
	// System.Numerics.BigIntegerBuilder.ModDivCore in the .NET Framework sources.
	// It computes both quotient and remainder: the remainder is stored in the _Numerator argument,
	// and the least significant 64 bits of the quotient are returned from the function.
	_NODISCARD inline uint64_t _Divide(_Big_integer_flt& _Numerator, const _Big_integer_flt& _Denominator) noexcept {

	_Numerator._Mydata[1] = static_cast<uint32_t>(_Uu >> 32);
	_Numerator._Mydata[0] = static_cast<uint32_t>(_Uu);
	_Numerator._Myused = _Numerator._Mydata[1] > 0 ? 2u : 1u;
	return _Quotient;

<charconv>: Fix from_chars() float tiebreaking #2366

<charconv>: Fix from_chars() float tiebreaking #2366

Conversation

StephanTLavavej commented Nov 30, 2021 • edited Loading

AlexGuteniev commented Dec 2, 2021

statementreply commented Dec 2, 2021

CaseyCarter commented Dec 2, 2021

StephanTLavavej commented Dec 4, 2021

chrisd-work commented Dec 7, 2021

StephanTLavavej commented Feb 22, 2022

`<charconv>`: Fix `from_chars()` `float` tiebreaking #2366

`<charconv>`: Fix `from_chars()` `float` tiebreaking #2366

StephanTLavavej commented Nov 30, 2021 •

edited

Loading