Evaluation of LTO configuration for all targets, and its impact on build time, build size, and performance #96851
[Per-platform posts for Android, iOS, Linux, macOS, Web, and Windows. Each lists its toolchains; the macOS and Windows posts also carry "Release template" and "Debug template" tables with footnotes. The table contents were not captured.]
Something to bear in mind with LTO: SCU builds will likely get the lion's share of the benefit without needing LTO. This is because they push a bunch of files into the same translation unit, which means the compiler can optimize across .cpp files (which, afaik, is what LTO offers, in a more convoluted way). So far we haven't used them in production, but it's worth mentioning as an alternative (no idea how their size or performance compares in release builds).
Using SCU builds for fully optimized release builds can require a lot of RAM (I've measured 22 GB for the build process alone on Linux x86_64), so this is something to keep in mind. That said, the release build server has plenty of RAM to spare.
Performance impact: ??? (didn't test). Godot's SCU build doesn't create one big file from all the source files; it glues files into bigger files, but still produces many files rather than a single one. Not to mention that lots of files are still built as usual even in an SCU build. LTO, on the other hand, is performed on the final executable (on "all" the files at once). So in general SCU can't compete with LTO. SCU possibly has some performance impact, but I think it is negligible, though I didn't test performance.
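To make the "many bigger files, not one big file" point concrete, here is a minimal Python sketch of chunked unity files. It is only an illustration: `make_scu_chunks` and `max_per_chunk` are hypothetical names, and this is not Godot's actual SCU generator.

```python
def make_scu_chunks(sources, max_per_chunk):
    """Return the text of one unity file per chunk of source paths.

    Hypothetical helper for illustration only; it does not mirror
    Godot's real SCU tooling.
    """
    if max_per_chunk < 1:
        raise ValueError("max_per_chunk must be at least 1")
    ordered = sorted(sources)  # stable order keeps builds reproducible
    chunks = [
        ordered[i:i + max_per_chunk]
        for i in range(0, len(ordered), max_per_chunk)
    ]
    # Each unity file simply #includes its members, so the compiler sees
    # them as a single translation unit and can optimize across them.
    return [
        "".join(f'#include "{src}"\n' for src in chunk)
        for chunk in chunks
    ]


if __name__ == "__main__":
    for index, text in enumerate(make_scu_chunks(["c.cpp", "a.cpp", "b.cpp"], 2)):
        print(f"// unity file {index}")
        print(text)
```

Cross-file optimization here stops at the boundary of each chunk, whereas LTO works across everything that goes into the final link.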
I have a low-end device, so I only have 4 threads. RAM usage depends greatly on the number of parallel threads. I saw peaks at 6 GB with SCU (about 1.4 GB of that is Firefox). If SCU is only going to be used for release builds provided to users (I mean end users who compile a custom template for their game), I think it is bearable enough to use fewer threads in order to use less RAM. So if SCU can really have an impact, it would be reasonable to mention SCU as an optimization tool.
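A rough way to trade threads for RAM, as a Python sketch. It assumes peak memory grows roughly linearly with the number of parallel jobs, which is a simplification and not something measured in this thread; `jobs_for_ram` and its parameters are hypothetical names, and the per-job figure would have to be measured on your own machine.

```python
def jobs_for_ram(cpu_threads, ram_budget_gb, ram_per_job_gb):
    """Pick a parallel job count that fits a RAM budget.

    Assumption (not from the thread): each job peaks at roughly
    ram_per_job_gb, so total usage scales linearly with job count.
    """
    if ram_per_job_gb <= 0:
        raise ValueError("ram_per_job_gb must be positive")
    by_ram = int(ram_budget_gb // ram_per_job_gb)
    # Never go below one job, never above the available threads.
    return max(1, min(cpu_threads, by_ram))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    print(jobs_for_ram(cpu_threads=16, ram_budget_gb=8, ram_per_job_gb=2))
```

The result could then be passed to SCons through its `-j` option.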
For years we've operated under the assumption that LTO (Link Time Optimization) is a net positive for production builds as it would:
The drawback is much longer build times, hence why it's only used for production builds/official releases.
Now findings in #96785 suggest that the reduction in build size is only true for GCC's LTO, and not for LLVM LTO (whether "full" LTO `-flto` or ThinLTO `-flto=thin`). With LLVM LTO there's a significant size increase of up to +15% on the platforms we've tested so far (Web, Android, Linux). For the Web (currently using LTO for official builds) and Android (not using it for now) this is significant.
So it's time we do a thorough review of build flags for all targets and compilers and make sure we're actually using the best configuration possible for official builds.
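For anyone contributing numbers, a small Python sketch of how a figure like "+15%" can be computed from two binary sizes. `size_change_percent` is a hypothetical helper, and the sizes in the usage example are made up for illustration, not measurements from any platform.

```python
def size_change_percent(baseline_bytes, candidate_bytes):
    """Relative size change of candidate vs. baseline, in percent.

    Positive means the candidate (e.g. an LTO build) is larger than
    the baseline (e.g. a non-LTO build).
    """
    if baseline_bytes <= 0:
        raise ValueError("baseline_bytes must be positive")
    return (candidate_bytes - baseline_bytes) * 100 / baseline_bytes


if __name__ == "__main__":
    # Made-up sizes for illustration only.
    no_lto = 40_000_000
    with_lto = 46_000_000
    print(f"{size_change_percent(no_lto, with_lto):+.1f}%")
```

In practice the two sizes could come from `os.path.getsize` on the built binaries, measured the same way (stripped or unstripped) for both configurations.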
I'll post successive replies for each Godot target platform so we can use these posts (maintainers are welcome to edit my posts) to keep track of metrics and findings for each platform individually. If that turns out to be too unwieldy we can split this issue into one issue per platform, but I expect we'll find closely related behavior across platforms that share a compiler toolchain (GCC, LLVM, MSVC).
@godotengine/buildsystem @godotengine/android @godotengine/ios @godotengine/linux-bsd @godotengine/macos @godotengine/web @godotengine/windows