Don't use -ffast-math or other unsafe math optimizations #24855

hpvb · 2019-01-09T00:36:48Z

Godot supports many different compilers and for production releases we
have to support 3 currently: GCC8, Clang6, and MSVC2017. These compilers
all do slightly different things with -ffast-math and it is causing
issues now. See #24841, #24540, #10758, #10070. And probably other
complaints about physics differences between release and release_debug
builds.

I've done some performance comparisons on Linux x86_64. All tests are
ran 20 times. And bunnymark really was 2% faster, that was not a copy/paste error :)

Bunnymark: (higher is better)
(bunnies)    min    max  stdev average
fast-math   7332   7597    71     7432
this pr        7379   7779   108     7621 (102%)

FPBench (gdscript port http://fpbench.org/) (lower is better)
(ms)
fast-math  15441  16127   192    15764
this pr       15671  16855   326    16001  (99%)

Float_add (adding floats in a tight loop) (lower is better)
(sec)
fast-math   5.49   5.78  0.07     5.65
this pr        5.65   5.90  0.06     5.76  (98%)

Float_div (dividing floats in a tight loop) (lower is better)
(sec)
fast-math  11.70  12.36  0.18    11.99
this pr       11.92  12.32  0.12    12.12  (99%)

Float_mul (multiplying floats in a tight loop) (lower is better)
(sec)
fast-math  11.72  12.17  0.12    11.93
this pr       12.01  12.62  0.17    12.26  (97%)

I have also looked at FPS numbers for tps-demo, 3d platformer, 2d
platformer, and sponza and could not find any measurable difference.

I believe that given the issues and oft-reported (physics) glitches on
release builds I believe that the couple of percent of tight-loop
floating point performance regression is well worth it.

This fixes #24540 and fixes #24841

hpvb · 2019-01-09T00:38:12Z

Special shout out to @timoschwarzer for supplying the tight-loop fp benchmark code.

Godot supports many different compilers and for production releases we have to support 3 currently: GCC8, Clang6, and MSVC2017. These compilers all do slightly different things with -ffast-math and it is causing issues now. See godotengine#24841, godotengine#24540, godotengine#10758, godotengine#10070. And probably other complaints about physics differences between release and release_debug builds. I've done some performance comparisons on Linux x86_64. All tests are ran 20 times. Bunnymark: (higher is better) (bunnies) min max stdev average fast-math 7332 7597 71 7432 this pr 7379 7779 108 7621 (102%) FPBench (gdscript port http://fpbench.org/) (lower is better) (ms) fast-math 15441 16127 192 15764 this pr 15671 16855 326 16001 (99%) Float_add (adding floats in a tight loop) (lower is better) (sec) fast-math 5.49 5.78 0.07 5.65 this pr 5.65 5.90 0.06 5.76 (98%) Float_div (dividing floats in a tight loop) (lower is better) (sec) fast-math 11.70 12.36 0.18 11.99 this pr 11.92 12.32 0.12 12.12 (99%) Float_mul (multiplying floats in a tight loop) (lower is better) (sec) fast-math 11.72 12.17 0.12 11.93 this pr 12.01 12.62 0.17 12.26 (97%) I have also looked at FPS numbers for tps-demo, 3d platformer, 2d platformer, and sponza and could not find any measurable difference. I believe that given the issues and oft-reported (physics) glitches on release builds I believe that the couple of percent of tight-loop floating point performance regression is well worth it. This fixes godotengine#24540 and fixes godotengine#24841

timoschwarzer · 2019-01-09T09:28:37Z

Let's make a - a = 0 again! 🎉

akien-mga · 2019-01-09T09:49:48Z

Thanks!

timoschwarzer · 2019-01-09T10:09:33Z

btw I had to delete the according *.pyc files for the change to become effective.

akien-mga · 2019-07-03T13:54:15Z

Cherry-picked for 3.0.7.

ArdaE · 2021-11-30T07:49:54Z

This change likely has a significant negative impact on Godot's performance for some users, at arbitrary times. The performance tests you've done don't tell the whole story: Removing -ffast-math reintroduces full support for denormals, and denormals can have a huge performance impact on many processors (including modern ones from Intel). Whether this impacts a project or not would depend on the numbers involved, which in turn depends on many factors including the locations and orientations of game objects at runtime.

Denormals occur when numbers get very close to 0; so depending on what code/scenes you're testing, you may have no denormals or you may have lots of them. When a user's code results in numbers very close to 0, especially in frequently used things like 3D or 2D transforms, this can easily tank Godot's performance. If some users have been complaining about arbitrary slow downs, the removal of this flag may have been the reason why. See https://en.wikipedia.org/wiki/Subnormal_number for more information on denormals.

I strongly recommend either reintroducing the fast-math flag, or if that's really not desired, to manually disable support just for the denormals. The latter may be desirable as fast-math does more than just disable denormals. Either way, considering Godot is a game engine, I don't think it should need the type of precision that denormals add in the rare cases where they play a role. If/when higher precision is desired, going with doubles would go a much longer way.

Here's a great StackOverflow answer on the topic: https://stackoverflow.com/a/9314926/1262685
(In the question this answers, the poster inadvertently causes numbers to get very close to 0, and is rightfully puzzled why adding 0.1 to the numbers makes his/her code about 10 times faster).

Calinou · 2021-11-30T17:39:27Z

Remember that in future Godot releases, we'd like to make the physics engine as deterministic as possible (though still not fully deterministic). The main reason for doing so is that it's the only way to have client-side prediction for server-side physics without causing visible errors to occur every so often. Floating-point operations need to be as consistent as possible across platforms and compilers to allow for this.

If some users have been complaining about arbitrary slow downs, the removal of this flag may have been the reason why.

I haven't seen too many complaints of this style lately, especially since the BVH was merged in Godot 3.3 (which fixed performance issues present in the old octree culling).

ArdaE · 2021-11-30T17:55:49Z

... we'd like to make the physics engine as deterministic as possible. The main reason for doing so is that it's the only way to have client-side prediction for server-side physics without causing visible errors to occur every so often. Floating-point operations need to be as consistent as possible across platforms and compilers to allow for this.

Disabling subnormal numbers (denormals) shouldn't cause any inconsistencies, as they -- by definition -- should occur in the exact same spots in all platforms that use the same floating point number format (i.e. IEEE 754) and be flushed to 0 in the same way, as there's only one way to flush to +0 (assuming all platforms have such a flush to +0 option). It's probably other optimizations that -ffast_math enables that was the source of any observed inconsistencies.

PS. In case there's no way to flush/round denormals to + or - 0 consistently on all platforms, then it would be very beneficial to have a "fast" build of Godot for users who don't plan on using server-side physics with client-side prediction.

hpvb · 2023-01-02T22:41:07Z

I looked into options of disabling just denormals with gcc/msvc/clang but there's no such option. However, it appears that Chromium has the same issue and came up with the following solution: https://chromium.googlesource.com/chromium/blink/+/refs/heads/main/Source/platform/audio/DenormalDisabler.h

We can just import this into Godot, but do we have any testcases to validate this is a problem? Or at least some kind of synthetic benchmark that forces denormals? I've read up on how these are implemented in CPUs and I agree with @ArdaE that it is very likely a problem for a lot of users and hard to detect.

I really, really, don't want to re-enable -ffast-math ever again but this might work? But I do want to be able to generate some test cases.

hpvb added bug topic:buildsystem cherrypick labels Jan 9, 2019

hpvb added this to the 3.1 milestone Jan 9, 2019

hpvb force-pushed the disable-ffast-math branch 2 times, most recently from c0173de to 07ef013 Compare January 9, 2019 00:57

hpvb force-pushed the disable-ffast-math branch from 07ef013 to e5b335d Compare January 9, 2019 01:07

akien-mga merged commit 08fe57d into godotengine:master Jan 9, 2019

akien-mga mentioned this pull request Jan 19, 2019

Vector.length() of some tiny numbers results in -inf when compiled with Clang #16117

Closed

akien-mga removed the cherrypick:3.0 label Jul 3, 2019

akien-mga mentioned this pull request Sep 23, 2022

Review use of -fomit-frame-pointer and -ftree-vectorize flags for optimized builds #66296

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't use -ffast-math or other unsafe math optimizations #24855

Don't use -ffast-math or other unsafe math optimizations #24855

hpvb commented Jan 9, 2019 •

edited

Loading

hpvb commented Jan 9, 2019

timoschwarzer commented Jan 9, 2019

akien-mga commented Jan 9, 2019

timoschwarzer commented Jan 9, 2019

akien-mga commented Jul 3, 2019

ArdaE commented Nov 30, 2021 •

edited

Loading

Calinou commented Nov 30, 2021 •

edited

Loading

ArdaE commented Nov 30, 2021 •

edited

Loading

hpvb commented Jan 2, 2023

Don't use -ffast-math or other unsafe math optimizations #24855

Don't use -ffast-math or other unsafe math optimizations #24855

Conversation

hpvb commented Jan 9, 2019 • edited Loading

hpvb commented Jan 9, 2019

timoschwarzer commented Jan 9, 2019

akien-mga commented Jan 9, 2019

timoschwarzer commented Jan 9, 2019

akien-mga commented Jul 3, 2019

ArdaE commented Nov 30, 2021 • edited Loading

Calinou commented Nov 30, 2021 • edited Loading

ArdaE commented Nov 30, 2021 • edited Loading

hpvb commented Jan 2, 2023

hpvb commented Jan 9, 2019 •

edited

Loading

ArdaE commented Nov 30, 2021 •

edited

Loading

Calinou commented Nov 30, 2021 •

edited

Loading

ArdaE commented Nov 30, 2021 •

edited

Loading