-
-
Notifications
You must be signed in to change notification settings - Fork 21.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't use -ffast-math or other unsafe math optimizations #24855
Conversation
Special shout out to @timoschwarzer for supplying the tight-loop fp benchmark code. |
c0173de
to
07ef013
Compare
Godot supports many different compilers and for production releases we have to support 3 currently: GCC8, Clang6, and MSVC2017. These compilers all do slightly different things with -ffast-math and it is causing issues now. See godotengine#24841, godotengine#24540, godotengine#10758, godotengine#10070. And probably other complaints about physics differences between release and release_debug builds. I've done some performance comparisons on Linux x86_64. All tests are ran 20 times. Bunnymark: (higher is better) (bunnies) min max stdev average fast-math 7332 7597 71 7432 this pr 7379 7779 108 7621 (102%) FPBench (gdscript port http://fpbench.org/) (lower is better) (ms) fast-math 15441 16127 192 15764 this pr 15671 16855 326 16001 (99%) Float_add (adding floats in a tight loop) (lower is better) (sec) fast-math 5.49 5.78 0.07 5.65 this pr 5.65 5.90 0.06 5.76 (98%) Float_div (dividing floats in a tight loop) (lower is better) (sec) fast-math 11.70 12.36 0.18 11.99 this pr 11.92 12.32 0.12 12.12 (99%) Float_mul (multiplying floats in a tight loop) (lower is better) (sec) fast-math 11.72 12.17 0.12 11.93 this pr 12.01 12.62 0.17 12.26 (97%) I have also looked at FPS numbers for tps-demo, 3d platformer, 2d platformer, and sponza and could not find any measurable difference. I believe that given the issues and oft-reported (physics) glitches on release builds I believe that the couple of percent of tight-loop floating point performance regression is well worth it. This fixes godotengine#24540 and fixes godotengine#24841
07ef013
to
e5b335d
Compare
Let's make a - a = 0 again! 🎉 |
Thanks! |
btw I had to delete the according *.pyc files for the change to become effective. |
Cherry-picked for 3.0.7. |
This change likely has a significant negative impact on Godot's performance for some users, at arbitrary times. The performance tests you've done don't tell the whole story: Removing Denormals occur when numbers get very close to 0; so depending on what code/scenes you're testing, you may have no denormals or you may have lots of them. When a user's code results in numbers very close to 0, especially in frequently used things like 3D or 2D transforms, this can easily tank Godot's performance. If some users have been complaining about arbitrary slow downs, the removal of this flag may have been the reason why. See https://en.wikipedia.org/wiki/Subnormal_number for more information on denormals. I strongly recommend either reintroducing the Here's a great StackOverflow answer on the topic: https://stackoverflow.com/a/9314926/1262685 |
Remember that in future Godot releases, we'd like to make the physics engine as deterministic as possible (though still not fully deterministic). The main reason for doing so is that it's the only way to have client-side prediction for server-side physics without causing visible errors to occur every so often. Floating-point operations need to be as consistent as possible across platforms and compilers to allow for this.
I haven't seen too many complaints of this style lately, especially since the BVH was merged in Godot 3.3 (which fixed performance issues present in the old octree culling). |
Disabling subnormal numbers (denormals) shouldn't cause any inconsistencies, as they -- by definition -- should occur in the exact same spots in all platforms that use the same floating point number format (i.e. IEEE 754) and be flushed to 0 in the same way, as there's only one way to flush to +0 (assuming all platforms have such a flush to +0 option). It's probably other optimizations that PS. In case there's no way to flush/round denormals to + or - 0 consistently on all platforms, then it would be very beneficial to have a "fast" build of Godot for users who don't plan on using server-side physics with client-side prediction. |
I looked into options of disabling just denormals with gcc/msvc/clang but there's no such option. However, it appears that Chromium has the same issue and came up with the following solution: https://chromium.googlesource.com/chromium/blink/+/refs/heads/main/Source/platform/audio/DenormalDisabler.h We can just import this into Godot, but do we have any testcases to validate this is a problem? Or at least some kind of synthetic benchmark that forces denormals? I've read up on how these are implemented in CPUs and I agree with @ArdaE that it is very likely a problem for a lot of users and hard to detect. I really, really, don't want to re-enable -ffast-math ever again but this might work? But I do want to be able to generate some test cases. |
Godot supports many different compilers and for production releases we
have to support 3 currently: GCC8, Clang6, and MSVC2017. These compilers
all do slightly different things with -ffast-math and it is causing
issues now. See #24841, #24540, #10758, #10070. And probably other
complaints about physics differences between release and release_debug
builds.
I've done some performance comparisons on Linux x86_64. All tests are
ran 20 times. And bunnymark really was 2% faster, that was not a copy/paste error :)
I have also looked at FPS numbers for tps-demo, 3d platformer, 2d
platformer, and sponza and could not find any measurable difference.
I believe that given the issues and oft-reported (physics) glitches on
release builds I believe that the couple of percent of tight-loop
floating point performance regression is well worth it.
This fixes #24540 and fixes #24841