Evaluation of LTO configuration for all targets, and its impact on build time, build size, and performance #96851

Open
akien-mga opened this issue Sep 11, 2024 · 10 comments


@akien-mga
Member

akien-mga commented Sep 11, 2024

For years we've operated under the assumption that LTO (Link Time Optimization) is a net positive for production builds as it would:

  • Increase performance (notably up to 20% in the GDScript VM with GCC LTO)
  • Reduce build size

The drawback is much longer build times, hence why it's only used for production builds/official releases.

Now findings in #96785 suggest that the reduction in build size is only true for GCC's LTO, and not for LLVM LTO (whether "full" LTO -flto or ThinLTO -flto=thin). With LLVM LTO there's a significant size increase for platforms we tested so far (Web, Android, Linux) of up to +15%. For the Web (currently using LTO for official builds) and Android (not using it for now) this is significant.
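To make the modes being compared concrete, here is a minimal sketch of how an `lto` option could map to compiler flags per toolchain family. The `-flto` and `-flto=thin` flags are the ones named above; the `LTO_FLAGS` table and the `lto_flags` helper are hypothetical names for illustration and are not taken from Godot's build scripts. The sketch assumes GCC offers no ThinLTO mode, since ThinLTO is an LLVM feature.

```python
# Hypothetical mapping from (toolchain family, LTO mode) to compiler/linker flags.
# Illustrative only: this is not Godot's SConstruct logic.
LTO_FLAGS = {
    ("llvm", "none"): [],
    ("llvm", "thin"): ["-flto=thin"],  # LLVM ThinLTO
    ("llvm", "full"): ["-flto"],       # LLVM "full" LTO
    ("gcc", "none"): [],
    ("gcc", "full"): ["-flto"],        # GCC LTO
}


def lto_flags(toolchain, mode):
    """Return the flags for a toolchain/mode pair, or raise if unsupported."""
    key = (toolchain, mode)
    if key not in LTO_FLAGS:
        raise ValueError(f"unsupported LTO mode {mode!r} for {toolchain!r}")
    return list(LTO_FLAGS[key])


print(lto_flags("llvm", "thin"))
```

Keeping the mapping in one table makes it easy to see which combinations are being measured per platform.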

So it's time we do a thorough review of build flags for all targets and compilers and make sure we're actually using the best configuration possible for official builds.

I'll post successive replies for each Godot target platform so we can use these posts (maintainers are welcome to edit my posts) to keep track of metrics and findings for each platform individually. If that turns out to be too unwieldy we can split this issue into one issue per platform, but I expect we'll find closely related behavior across platforms that share a compiler toolchain (GCC, LLVM, MSVC).

@godotengine/buildsystem @godotengine/android @godotengine/ios @godotengine/linux-bsd @godotengine/macos @godotengine/web @godotengine/windows

@akien-mga
Member Author

akien-mga commented Sep 11, 2024

Android

Toolchains:

  • Android NDK (LLVM)

@akien-mga
Member Author

akien-mga commented Sep 11, 2024

iOS

Toolchains:

  • Xcode (LLVM)

@akien-mga
Member Author

akien-mga commented Sep 11, 2024

Linux

Toolchains:

  • GCC (official builds toolchain)
  • LLVM

@akien-mga
Member Author

akien-mga commented Sep 11, 2024

macOS

Toolchains:

  • Xcode (LLVM)
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.6.0

Release template

scons target=template_release arch=arm64 platform=macos production=yes lto=*
| LTO | Build Time | Peak memory usage | Executable size |
|------|-------|----------|------------|
| none | 7:10 | sub 1G | 68.082.840 |
| thin | 9:45 | ~2.5G | 74.674.240 |
| full | 19:26 | ~12G [1] | 66.936.680 |

Debug template

scons target=template_debug arch=arm64 platform=macos production=yes lto=*
| LTO | Build Time | Peak memory usage | Executable size |
|------|-----------|--------|------------|
| none | 9:51 | sub 1G | 71.861.672 |
| thin | 13:28 | ~2.5G | 84.408.856 |
| full | 42:52 [2] | ~18G | 74.752.334 |

Footnotes

  1. Mostly around 6G with a spike at the end of linking.

  2. A lot of swap usage, so time is not directly comparable.
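As a worked example of reading these numbers, the sketch below computes the size change and build-time ratio of each LTO mode relative to `lto=none`, using the macOS release template figures above. The helper names `to_seconds` and `percent_change` are made up for this illustration.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '7:10' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_change(value, baseline):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# macOS release template figures copied from the table above.
release = {
    "none": ("7:10", 68_082_840),
    "thin": ("9:45", 74_674_240),
    "full": ("19:26", 66_936_680),
}

base_time, base_size = release["none"]
for mode, (build_time, size) in release.items():
    size_delta = percent_change(size, base_size)
    time_ratio = to_seconds(build_time) / to_seconds(base_time)
    print(f"{mode}: size {size_delta:+.1f}%, build time x{time_ratio:.2f}")
```

For this table, ThinLTO comes out roughly 9.7% larger than the non-LTO build, while full LTO is roughly 1.7% smaller at about 2.7 times the build time.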

@akien-mga
Member Author

akien-mga commented Sep 11, 2024

Web

Toolchains:

  • Emscripten (LLVM)

@akien-mga
Member Author

akien-mga commented Sep 11, 2024

Windows

Toolchains:

  • MSVC cl.exe (MSVC)

  • MSVC clang-cl.exe (LLVM)

Release template

scons target=template_release production=yes use_llvm=yes lto=*
| LTO | Build Time | Executable size |
|------|----------|------------|
| none | 04:09.81 | 58,270,720 |
| thin | 04:46.66 | 68,056,064 |
| full | N/A [1] | N/A |

Debug template

scons target=template_debug production=yes use_llvm=yes lto=*
| LTO | Build Time | Executable size |
|------|----------|------------|
| none | 04:12.88 | 73,480,704 |
| thin | 05:03.34 | 86,334,976 |
| full | N/A | N/A |

  • mingw-gcc (GCC) (official builds toolchain for x86_64 / x86_32)

  • llvm-mingw (LLVM) (official builds toolchain for arm64)

Release template

scons target=template_release production=yes use_llvm=yes use_mingw=yes lto=*
| LTO | Build Time | Executable size |
|------|----------|------------|
| none | 04:36.49 | 63,627,776 |
| thin | 04:55.96 | 73,898,496 |
| full | 14:38.60 | 70,736,896 |

Debug template

scons target=template_debug production=yes use_llvm=yes use_mingw=yes lto=*
| LTO | Build Time | Executable size |
|------|----------|------------|
| none | 04:37.85 | 68,219,392 |
| thin | 05:14.51 | 79,650,304 |
| full | 15:46.82 | 76,373,504 |

Footnotes

  1. Attempted to build for ~20 minutes before erroring out.

@lawnjelly
Member

Something to bear in mind with LTO:

SCU builds will likely get the lion's share of the benefit, without needing LTO. This is because they push a bunch of files into the same translation unit, which means that the compiler can optimize across .cpp files (which afaik is what LTO offers, in a more convoluted way).

We haven't used them in production so far, but it's worth mentioning as an alternative (no idea how their size compares in release, or their performance).
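For readers unfamiliar with the idea, here is a minimal sketch of what "pushing a bunch of files into the same translation unit" means: several `.cpp` files are pulled into one generated file via `#include`, and only the generated file is compiled. The function name, the `scu_N.gen.cpp` naming, and the chunk size are assumptions made for illustration and do not reflect Godot's actual SCU generator.

```python
def make_scu_chunks(sources, files_per_unit):
    """Group source paths into unity files; return {unity_name: file_text}."""
    if files_per_unit < 1:
        raise ValueError("files_per_unit must be at least 1")
    units = {}
    for start in range(0, len(sources), files_per_unit):
        chunk = sources[start:start + files_per_unit]
        # Hypothetical naming scheme for the generated unity file.
        name = f"scu_{start // files_per_unit}.gen.cpp"
        # Each unity file simply includes the original sources in order.
        units[name] = "".join(f'#include "{path}"\n' for path in chunk)
    return units


units = make_scu_chunks(["a.cpp", "b.cpp", "c.cpp"], files_per_unit=2)
for name, text in units.items():
    print(name)
    print(text)
```

The compiler then sees the code of all files in a chunk at once, which allows cross-file inlining within that chunk but not across chunks.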

@Calinou
Member

Calinou commented Sep 16, 2024

We haven't used them in production so far, but it's worth mentioning as an alternative (no idea how their size compares in release, or their performance).

Using SCU builds for fully optimized release builds can need a lot of RAM (I've measured 22 GB for the build process alone on Linux x86_64), so this is something to keep in mind. That said, the release build server has plenty of RAM to spare.

@dustdfg
Contributor

dustdfg commented Oct 31, 2024

SCU builds will likely get the lion's share of the benefit, without needing LTO. This is because they push a bunch of files into the same translation unit, which means that the compiler can optimize across .cpp files (which afaik is what LTO offers, in a more convoluted way).

scons target="editor" use_llvm="yes" lto="none"
I've just run two builds, one with SCU and one without, both with LLVM and without LTO.

  • SCU build: 121,314,392 bytes
  • Non-SCU build: 120,724,664 bytes

Performance impact: unknown (I didn't test it)
Size difference: ~590 KB (589,728 bytes; the SCU build is about 0.5% larger)

Godot's SCU mode doesn't create one big file from all the sources. It glues files together into bigger files, but the result is still many files, not a single one. On top of that, a lot of files are still built as usual even in an SCU build.

LTO, on the other hand, is performed on the final executable (on "all" the files at once), so in general SCU can't compete with LTO. SCU may have some performance impact, but I think it is negligible, though I didn't test performance.

@dustdfg
Contributor

dustdfg commented Oct 31, 2024

We haven't used them in production so far, but it's worth mentioning as an alternative (no idea how their size compares in release, or their performance).

Using SCU builds for fully optimized release builds can need a lot of RAM (I've measured 22 GB for the build process alone on Linux x86_64), so this is something to keep in mind. That said, the release build server has plenty of RAM to spare.

I have a low-end device with only 4 threads. RAM usage depends greatly on the number of parallel threads. I saw peaks at 6 GB with SCU (about 1.4 GB of that was Firefox).

If SCU is only going to be used for release builds that are provided to users (including an end user who compiles a custom template for their game), I think it is bearable enough to use fewer threads in order to use less RAM. So if SCU really has an impact, it would be reasonable to mention SCU as an optimization tool.
