Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.fmt.parseFloat does not parse 9007199254740993.0 correctly #11169

Closed
Tracked by #89
andrewrk opened this issue Mar 15, 2022 · 4 comments · Fixed by #11566
Closed
Tracked by #89

std.fmt.parseFloat does not parse 9007199254740993.0 correctly #11169

andrewrk opened this issue Mar 15, 2022 · 4 comments · Fixed by #11566
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. frontend Tokenization, parsing, AstGen, Sema, and Liveness. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@andrewrk
Copy link
Member

Implementation is here: https://github.com/ziglang/zig/blob/master/lib/std/fmt/parse_float.zig

--- a/lib/std/fmt/parse_float.zig
+++ b/lib/std/fmt/parse_float.zig
@@ -425,4 +425,6 @@ test "fmt.parseFloat" {
             try expect(approxEqAbs(T, try parseFloat(T, "2.71828182845904523536"), @as(T, 2.718281828459045), epsilon));
         }
     }
+
+    try expectEqual(try parseFloat(f128, "9007199254740993.0"), 9007199254740993.0);
 }
$ ./zig test ../lib/std/fmt/parse_float.zig 
Test [1/1] test "parseFloat"... expected 9.007199254740994e+15, found 9.007199254740992e+15
Test [1/1] test "parseFloat"... FAIL (TestExpectedEqual)

This is causing a behavior test failure in the self-hosted compiler, which relies on std.fmt.parseFloat:

try expect(9007199254740992.0 + 1.0 == 9007199254740993.0);

@andrewrk andrewrk added bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. standard library This issue involves writing Zig code for the standard library. labels Mar 15, 2022
@andrewrk andrewrk added this to the 0.10.0 milestone Mar 15, 2022
@andrewrk andrewrk added the frontend Tokenization, parsing, AstGen, Sema, and Liveness. label Mar 15, 2022
@curtiswilkinson
Copy link
Contributor

I've done a little bit of digging:

This code was ported from https://github.com/grzegorz-kraszewski/stringtofloat

When I run that in C on our example float string I'm getting the 4 at the end that we're seeing with the zig port (vs the 2 from stage1, which I believe uses a port of musl for float parsing?).

int main() {
  printf("%fn\n", str2dbl("9007199254740993.0"));
  return 0;
};

Screenshot 2022-03-15

This could potentially be what the important comment in the C reference implementation is talking about: https://github.com/grzegorz-kraszewski/stringtofloat/blob/master/str2dbl.c#L7

/* The code works in "round towards zero" mode. This is different from  */
/* GCC standard library strtod(), which uses "round half to even" rule. */
/* Therefore it cannot be used as a direct drop-in replacement, as in   */
/* some cases results will be different on the least significant bit of */
/* mantissa. Read more in the README.md file.

@andrewrk
Copy link
Member Author

Here's the best path forward in my opinion:

  • ditch grzegorz-kraszewski/stringtofloat, forget about it
  • port musl's f128 parsing code. I already did half the work here: parse_f128.c, it just needs to be ported to zig

@danielchasehooper
Copy link
Contributor

I'm working on this now. Unable to assign it to myself.

@matu3ba
Copy link
Contributor

matu3ba commented Mar 17, 2022

@danielchasehooper I have a wip though I should have used more vim macros for porting to get it over finish line:
https://gist.github.com/matu3ba/040ef3ad23f53213ad277999eb847586

Feel free to continue or rip out what you need.

tiehuis added a commit to tiehuis/zig that referenced this issue May 2, 2022
The previous float-parsing method was lacking in a lot of areas. This
commit introduces a state-of-the art implementation that is both
accurate and fast to std.

Code is derived from working repo https://github.com/tiehuis/zig-parsefloat.
This includes more test-cases and performance numbers that are present
in this commit.

* Accuracy

The primary testing regime has been using test-data found at
https://github.com/tiehuis/parse-number-fxx-test-data. This is a fork of
upstream with support for f128 test-cases added. This data has been
verified against other independent implementations and represents
accurate round-to-even IEEE-754 floating point semantics.

* Performance

Compared to the existing parseFloat implementation there is ~5-10x
performance improvement using the above corpus. (f128 parsing excluded
in below measurements).

** Old

    $ time ./test_all_fxx_data
    3520298/5296694 succeeded (1776396 fail)

    ________________________________________________________
    Executed in   28.68 secs    fish           external
       usr time   28.48 secs    0.00 micros   28.48 secs
       sys time    0.08 secs  694.00 micros    0.08 secs

** This Implementation

    $ time ./test_all_fxx_data
    5296693/5296694 succeeded (1 fail)

    ________________________________________________________
    Executed in    4.54 secs    fish           external
       usr time    4.37 secs  515.00 micros    4.37 secs
       sys time    0.10 secs  171.00 micros    0.10 secs

Further performance numbers can be seen using the
https://github.com/tiehuis/simple_fastfloat_benchmark/ repository, which
compares against some other well-known string-to-float conversion
functions. A breakdown can be found here:

https://github.com/tiehuis/zig-parsefloat/blob/0d9f020f1a37ca88bf889703b397c1c41779f090/PERFORMANCE.md#commit-b15406a0d2e18b50a4b62fceb5a6a3bb60ca5706

In summary, we are within 20% of the C++ reference implementation and
have about ~600-700MB/s throughput on a Intel I5-6500 3.5Ghz.

* F128 Support

Finally, f128 is now completely supported with full accuracy. This does
use a slower path which is possible to improve in future.

* Behavioural Changes

There are a few behavioural changes to note.

 - `parseHexFloat` is now redundant and these are now supported directly
   in `parseFloat`.
 - We implement round-to-even in all parsing routines. This is as
   specified by IEEE-754. Previous code used different rounding
   mechanisms (standard was round-to-zero, hex-parsing looked to use
   round-up) so there may be subtle differences.

Closes ziglang#2207.
Fixes ziglang#11169.
tiehuis added a commit to tiehuis/zig that referenced this issue May 2, 2022
The previous float-parsing method was lacking in a lot of areas. This
commit introduces a state-of-the art implementation that is both
accurate and fast to std.

Code is derived from working repo https://github.com/tiehuis/zig-parsefloat.
This includes more test-cases and performance numbers that are present
in this commit.

* Accuracy

The primary testing regime has been using test-data found at
https://github.com/tiehuis/parse-number-fxx-test-data. This is a fork of
upstream with support for f128 test-cases added. This data has been
verified against other independent implementations and represents
accurate round-to-even IEEE-754 floating point semantics.

* Performance

Compared to the existing parseFloat implementation there is ~5-10x
performance improvement using the above corpus. (f128 parsing excluded
in below measurements).

** Old

    $ time ./test_all_fxx_data
    3520298/5296694 succeeded (1776396 fail)

    ________________________________________________________
    Executed in   28.68 secs    fish           external
       usr time   28.48 secs    0.00 micros   28.48 secs
       sys time    0.08 secs  694.00 micros    0.08 secs

** This Implementation

    $ time ./test_all_fxx_data
    5296693/5296694 succeeded (1 fail)

    ________________________________________________________
    Executed in    4.54 secs    fish           external
       usr time    4.37 secs  515.00 micros    4.37 secs
       sys time    0.10 secs  171.00 micros    0.10 secs

Further performance numbers can be seen using the
https://github.com/tiehuis/simple_fastfloat_benchmark/ repository, which
compares against some other well-known string-to-float conversion
functions. A breakdown can be found here:

https://github.com/tiehuis/zig-parsefloat/blob/0d9f020f1a37ca88bf889703b397c1c41779f090/PERFORMANCE.md#commit-b15406a0d2e18b50a4b62fceb5a6a3bb60ca5706

In summary, we are within 20% of the C++ reference implementation and
have about ~600-700MB/s throughput on a Intel I5-6500 3.5Ghz.

* F128 Support

Finally, f128 is now completely supported with full accuracy. This does
use a slower path which is possible to improve in future.

* Behavioural Changes

There are a few behavioural changes to note.

 - `parseHexFloat` is now redundant and these are now supported directly
   in `parseFloat`.
 - We implement round-to-even in all parsing routines. This is as
   specified by IEEE-754. Previous code used different rounding
   mechanisms (standard was round-to-zero, hex-parsing looked to use
   round-up) so there may be subtle differences.

Closes ziglang#2207.
Fixes ziglang#11169.
tiehuis added a commit to tiehuis/zig that referenced this issue May 3, 2022
The previous float-parsing method was lacking in a lot of areas. This
commit introduces a state-of-the art implementation that is both
accurate and fast to std.

Code is derived from working repo https://github.com/tiehuis/zig-parsefloat.
This includes more test-cases and performance numbers that are present
in this commit.

* Accuracy

The primary testing regime has been using test-data found at
https://github.com/tiehuis/parse-number-fxx-test-data. This is a fork of
upstream with support for f128 test-cases added. This data has been
verified against other independent implementations and represents
accurate round-to-even IEEE-754 floating point semantics.

* Performance

Compared to the existing parseFloat implementation there is ~5-10x
performance improvement using the above corpus. (f128 parsing excluded
in below measurements).

** Old

    $ time ./test_all_fxx_data
    3520298/5296694 succeeded (1776396 fail)

    ________________________________________________________
    Executed in   28.68 secs    fish           external
       usr time   28.48 secs    0.00 micros   28.48 secs
       sys time    0.08 secs  694.00 micros    0.08 secs

** This Implementation

    $ time ./test_all_fxx_data
    5296693/5296694 succeeded (1 fail)

    ________________________________________________________
    Executed in    4.54 secs    fish           external
       usr time    4.37 secs  515.00 micros    4.37 secs
       sys time    0.10 secs  171.00 micros    0.10 secs

Further performance numbers can be seen using the
https://github.com/tiehuis/simple_fastfloat_benchmark/ repository, which
compares against some other well-known string-to-float conversion
functions. A breakdown can be found here:

https://github.com/tiehuis/zig-parsefloat/blob/0d9f020f1a37ca88bf889703b397c1c41779f090/PERFORMANCE.md#commit-b15406a0d2e18b50a4b62fceb5a6a3bb60ca5706

In summary, we are within 20% of the C++ reference implementation and
have about ~600-700MB/s throughput on a Intel I5-6500 3.5Ghz.

* F128 Support

Finally, f128 is now completely supported with full accuracy. This does
use a slower path which is possible to improve in future.

* Behavioural Changes

There are a few behavioural changes to note.

 - `parseHexFloat` is now redundant and these are now supported directly
   in `parseFloat`.
 - We implement round-to-even in all parsing routines. This is as
   specified by IEEE-754. Previous code used different rounding
   mechanisms (standard was round-to-zero, hex-parsing looked to use
   round-up) so there may be subtle differences.

Closes ziglang#2207.
Fixes ziglang#11169.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. frontend Tokenization, parsing, AstGen, Sema, and Liveness. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
4 participants