Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use -ffast-math or other unsafe math optimizations #24855

Merged
merged 1 commit into from
Jan 9, 2019

Conversation

hpvb
Copy link
Member

@hpvb hpvb commented Jan 9, 2019

Godot supports many different compilers and for production releases we
have to support 3 currently: GCC8, Clang6, and MSVC2017. These compilers
all do slightly different things with -ffast-math and it is causing
issues now. See #24841, #24540, #10758, #10070. And probably other
complaints about physics differences between release and release_debug
builds.

I've done some performance comparisons on Linux x86_64. All tests are
ran 20 times. And bunnymark really was 2% faster, that was not a copy/paste error :)

Bunnymark: (higher is better)
(bunnies)    min    max  stdev average
fast-math   7332   7597    71     7432
this pr        7379   7779   108     7621 (102%)

FPBench (gdscript port http://fpbench.org/) (lower is better)
(ms)
fast-math  15441  16127   192    15764
this pr       15671  16855   326    16001  (99%)

Float_add (adding floats in a tight loop) (lower is better)
(sec)
fast-math   5.49   5.78  0.07     5.65
this pr        5.65   5.90  0.06     5.76  (98%)

Float_div (dividing floats in a tight loop) (lower is better)
(sec)
fast-math  11.70  12.36  0.18    11.99
this pr       11.92  12.32  0.12    12.12  (99%)

Float_mul (multiplying floats in a tight loop) (lower is better)
(sec)
fast-math  11.72  12.17  0.12    11.93
this pr       12.01  12.62  0.17    12.26  (97%)

I have also looked at FPS numbers for tps-demo, 3d platformer, 2d
platformer, and sponza and could not find any measurable difference.

I believe that given the issues and oft-reported (physics) glitches on
release builds I believe that the couple of percent of tight-loop
floating point performance regression is well worth it.

This fixes #24540 and fixes #24841

@hpvb
Copy link
Member Author

hpvb commented Jan 9, 2019

Special shout out to @timoschwarzer for supplying the tight-loop fp benchmark code.

@hpvb hpvb force-pushed the disable-ffast-math branch 2 times, most recently from c0173de to 07ef013 Compare January 9, 2019 00:57
Godot supports many different compilers and for production releases we
have to support 3 currently: GCC8, Clang6, and MSVC2017. These compilers
all do slightly different things with -ffast-math and it is causing
issues now. See godotengine#24841, godotengine#24540, godotengine#10758, godotengine#10070. And probably other
complaints about physics differences between release and release_debug
builds.

I've done some performance comparisons on Linux x86_64. All tests are
ran 20 times.

Bunnymark: (higher is better)
(bunnies)    min    max  stdev average
fast-math   7332   7597    71     7432
this pr     7379   7779   108     7621 (102%)

FPBench (gdscript port http://fpbench.org/) (lower is better)
(ms)
fast-math  15441  16127   192    15764
this pr    15671  16855   326    16001  (99%)

Float_add (adding floats in a tight loop) (lower is better)
(sec)
fast-math   5.49   5.78  0.07     5.65
this pr     5.65   5.90  0.06     5.76  (98%)

Float_div (dividing floats in a tight loop) (lower is better)
(sec)
fast-math  11.70  12.36  0.18    11.99
this pr    11.92  12.32  0.12    12.12  (99%)

Float_mul (multiplying floats in a tight loop) (lower is better)
(sec)
fast-math  11.72  12.17  0.12    11.93
this pr    12.01  12.62  0.17    12.26  (97%)

I have also looked at FPS numbers for tps-demo, 3d platformer, 2d
platformer, and sponza and could not find any measurable difference.

I believe that given the issues and oft-reported (physics) glitches on
release builds I believe that the couple of percent of tight-loop
floating point performance regression is well worth it.

This fixes godotengine#24540 and fixes godotengine#24841
@timoschwarzer
Copy link
Contributor

Let's make a - a = 0 again! 🎉

@akien-mga akien-mga merged commit 08fe57d into godotengine:master Jan 9, 2019
@akien-mga
Copy link
Member

Thanks!

@timoschwarzer
Copy link
Contributor

btw I had to delete the according *.pyc files for the change to become effective.

@akien-mga
Copy link
Member

Cherry-picked for 3.0.7.

@ArdaE
Copy link
Contributor

ArdaE commented Nov 30, 2021

This change likely has a significant negative impact on Godot's performance for some users, at arbitrary times. The performance tests you've done don't tell the whole story: Removing -ffast-math reintroduces full support for denormals, and denormals can have a huge performance impact on many processors (including modern ones from Intel). Whether this impacts a project or not would depend on the numbers involved, which in turn depends on many factors including the locations and orientations of game objects at runtime.

Denormals occur when numbers get very close to 0; so depending on what code/scenes you're testing, you may have no denormals or you may have lots of them. When a user's code results in numbers very close to 0, especially in frequently used things like 3D or 2D transforms, this can easily tank Godot's performance. If some users have been complaining about arbitrary slow downs, the removal of this flag may have been the reason why. See https://en.wikipedia.org/wiki/Subnormal_number for more information on denormals.

I strongly recommend either reintroducing the fast-math flag, or if that's really not desired, to manually disable support just for the denormals. The latter may be desirable as fast-math does more than just disable denormals. Either way, considering Godot is a game engine, I don't think it should need the type of precision that denormals add in the rare cases where they play a role. If/when higher precision is desired, going with doubles would go a much longer way.

Here's a great StackOverflow answer on the topic: https://stackoverflow.com/a/9314926/1262685
(In the question this answers, the poster inadvertently causes numbers to get very close to 0, and is rightfully puzzled why adding 0.1 to the numbers makes his/her code about 10 times faster).

@Calinou
Copy link
Member

Calinou commented Nov 30, 2021

Remember that in future Godot releases, we'd like to make the physics engine as deterministic as possible (though still not fully deterministic). The main reason for doing so is that it's the only way to have client-side prediction for server-side physics without causing visible errors to occur every so often. Floating-point operations need to be as consistent as possible across platforms and compilers to allow for this.

If some users have been complaining about arbitrary slow downs, the removal of this flag may have been the reason why.

I haven't seen too many complaints of this style lately, especially since the BVH was merged in Godot 3.3 (which fixed performance issues present in the old octree culling).

@ArdaE
Copy link
Contributor

ArdaE commented Nov 30, 2021

... we'd like to make the physics engine as deterministic as possible. The main reason for doing so is that it's the only way to have client-side prediction for server-side physics without causing visible errors to occur every so often. Floating-point operations need to be as consistent as possible across platforms and compilers to allow for this.

Disabling subnormal numbers (denormals) shouldn't cause any inconsistencies, as they -- by definition -- should occur in the exact same spots in all platforms that use the same floating point number format (i.e. IEEE 754) and be flushed to 0 in the same way, as there's only one way to flush to +0 (assuming all platforms have such a flush to +0 option). It's probably other optimizations that -ffast_math enables that was the source of any observed inconsistencies.

PS. In case there's no way to flush/round denormals to + or - 0 consistently on all platforms, then it would be very beneficial to have a "fast" build of Godot for users who don't plan on using server-side physics with client-side prediction.

@hpvb
Copy link
Member Author

hpvb commented Jan 2, 2023

I looked into options of disabling just denormals with gcc/msvc/clang but there's no such option. However, it appears that Chromium has the same issue and came up with the following solution: https://chromium.googlesource.com/chromium/blink/+/refs/heads/main/Source/platform/audio/DenormalDisabler.h

We can just import this into Godot, but do we have any testcases to validate this is a problem? Or at least some kind of synthetic benchmark that forces denormals? I've read up on how these are implemented in CPUs and I agree with @ArdaE that it is very likely a problem for a lot of users and hard to detect.

I really, really, don't want to re-enable -ffast-math ever again but this might work? But I do want to be able to generate some test cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants