Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for single compilation unit builds #77170

Merged
merged 1 commit into from
Jun 6, 2023

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented May 17, 2023

AKA Unity / Jumbo builds.

Adds support for simple SCU build. This speeds up compilation by compiling multiple cpp files within a single translation unit.

Master version of #69405 #78113

This is an optional extra build that can be selected from the Scons command line.

  • scu_build=none (default)
  • scu_build=dev (DEV_ENABLED only)
  • scu_build=all (all builds use SCU)

Regular builds work as normal.

FAQ

  • SCU builds are TOTALLY OPTIONAL.
    If you wish you can totally ignore the existence of this build, and forget they exist.
  • SCU builds are primarily for contributors, to speed up iteration time during development by reducing compilation time. This is particularly useful for people who would prefer to be productive.

Introduction

Especially relevant for those that have not used this type of build before. SCU builds (aka "unity" or "jumbo" builds) are a widely used technique for speeding up compilation. https://en.wikipedia.org/wiki/Single_compilation_unit

In a vanilla build one cpp file is fed to the compiler each time (one translation unit). In suboptimal setups, each cpp can end up including a large chain of header files, and might easily result in tens of thousands of lines of code to compile, for a cpp that may only be small.

If we consider that the majority of time is spent compiling headers that have a "once" directive (or include guards) then we can speed up compilation dramatically by compiling several cpps in the same translation unit. We can do this by supplying a master cpp file (called a "scu" file here) which is simply a list of the multiple cpp files we wish to include in the translation unit.

Thus for instance:
scu_drivers_gles2.gen.cpp contains:

#include "../rasterizer_canvas_base_gles2.cpp"
#include "../rasterizer_canvas_gles2.cpp"
#include "../rasterizer_gles2.cpp"
#include "../rasterizer_scene_gles2.cpp"
#include "../rasterizer_storage_gles2.cpp"
#include "../shader_compiler_gles2.cpp"
#include "../shader_gles2.cpp"

Any headers are compiled once and shared between all the cpps in the scu file.

This typically greatly speeds up compilation.

In return there are several downsides:

SCU builds rely on not polluting the global namespace

If variables / functions within the cpps in the translation unit are duplicates, you will get build conflicts. These must be resolved (or better avoided in the first place), or else excluded from the accelerated build (and use old style compilation).

In general, polluting the global namespace is now regarded as a faux pas in modern programming, so this helps, but you will still find old school code that abuses global namespace, and relies on single cpp as a primitive form of scoping. Many of our third party libraries are not suitable without modification.

Touching any file within a scu translation unit will cause the whole translation unit to recompile

Vanilla builds touching a single cpp are often cheaper. But generally touching a header (which is the rate limiting step) will be much cheaper in SCU builds.

Description

The PR contains 3 elements:

  1. New functionality in scu_builders.py which is called from SConstruct (if the scu_build=dev / all Scons option is set).

This code builds all the necessary scu_*.gen.cpp files in corresponding scu subfolders that are necessary for the build to work. scu_builders.py should rarely need modifying, and only is concerned with keeping the scu build working, the regular builds are unaffected.

Each folder that is converted is stored in a list (_scu_folders) and passed to methods.py, so each folder can be checked during compilation in stage (2) to decide whether to build using SCU files or regular files.

  1. The add_source_files() command in SCsub files is now a wrapper which will attempt to call add_source_files_scu() which will compile SCU files instead of regular source files if these exist (i.e. we are in a SCU build, and the folder has been converted in scu_builders.py). If this fails, it reverts to the old add_source_files() functionality.

add_source_files() now returns True if it compiles using SCU files (or False for regular build). This return value allows the possibility of combining several folders into the same scu translation unit (which is more efficient). If this is desired it should be done by adding the extra folders in the master list in scu_builders.py.

Most SCsub files are unmodified, the wrapper will use the SCU files automatically.

  1. Modification of a limited amount of the Godot source code where there are naming conflicts that would occur when compiling files in the same translation unit. This is usually due to conflicts within global namespace - static functions defined only within cpp files are often culprits.

Alternatively, these files can be excluded from the scu build in the master list in scu_builders.py, which will avoid the need for changing any source code. However this will result in slower SCU builds.

Instructions

Instructions for users (contributors)

Instructions for users (contributors)

  1. Add scu_build=dev or scu_build=all to your Scons args.
  2. Make sure to compile with a non-scu (regular) build prior to submitting PRs. This is because SCU builds by nature include headers from earlier cpps in the translation unit, therefore won't catch all the includes you will need in a regular build. The CI will catch these errors but it will usually be faster to catch them on a local build on your machine.
    That's it. 😃

When a SCU build is active it automatically runs scu_builders.py to generate the scu folders and scu generated files for different modules, and can auto-sense when you add / remove files from the source tree. It is rerun each time Scons is invoked, because if you are adding or removing files from the monitored folders the scu files must be regenerated (this process is quite fast, and is only invoked during a SCU build).

Instructions for regular build maintainers / contributors adding new folders

Instructions for regular build maintainers / contributors adding new folders

Most SCsub files can be unmodified. Prefer to allow the add_source_files() to do the globbing (via e.g. *.cpp) rather than doing the globbing beforehand and passing a list of files to add_source_files(). This is because the SCU build can only accelerate when the wildcard is passed to add_source_files().

Instructions for SCU build maintenance

Instructions for SCU build maintenance

  1. Periodically a SCU build maintainer (presumably myself at least to start with) will make any small modifications to ensure the SCU build is optimizing as many folders as possible (every few months).
  2. Occasionally PRs are expected to break the SCU build. This will be things like naming conflicts. This won't affect regular builds, but there will be occasional fixup PRs to either:
  • Correct the naming conflict in the source code
  • Exclude the files concerned from the SCU build (in scu_builders.py, these files can be marked as exceptions in the master list)

The master list contains entries like the following:

process_folder(["scene/3d"], ["baked_lightmap"], 32, "cpp")

The arguments are:

  1. List of folders to be included in the translation unit. The first folder will be the folder that a scu subfolder is added to, and will determine the filename of the scu files. In most cases only a single folder will be in the list. Any more entries must be subfolders, relative to the first.
  2. List of exceptions, filenames that will not be included in the scu translation unit. They will be compiled as separate cpps. This will prevent the speedup, but easily allows dealing with files that contain naming conflicts.
  3. Maximum number of cpp files to place in one translation unit. This will default to 0 (unlimited), but can be set explicitly to lower the number, which will decrease the memory use by the compiler process.
  4. Extension. Usually cpp, but can be c for e.g. third party libraries.

Only the first argument is required, the rest have defaults, and are rarely used.

It is highly likely I would maintain the SCU build to start with, but it is fairly simple to do, so any maintainer with an interest can keep it running well.

Limitations

SCU builds can use a lot more memory for each compiler instance, particularly in release builds. The memory use depends on the amount of cpp files included. While this can be varied manually in the scu_builders.py file, it is hard coded to a lower value in release builds (currently 8) to prevent using up all available memory on smaller PCs.

Some benchmarks

I originally converted a lot more of the engine, but this version mainly involves just Godot core for simplicity. As such, full rebuilds aren't as fast as they can be, but touching important headers results in quite a bit faster compilation.

(Intel i5-7500T, 2.7ghz x 4, Linux Mint 20, SSD. i.e. low end CPU)

Full rebuild (DEV_ENABLED)

Normal build 15mins 04
SCU build 7mins 09

Touching node.h (DEV_ENABLED)

Normal build 7min 20
SCU build 2min 23

Notes

  • I'm not an expert in python by a long shot so if we do decide to roll out, this needs some good checking / fixing up of any python / Scons script, and discussion of any problems as it does create and delete files / folders, and that has the potential to be destructive, so needs good peer review.
  • I've been using this locally in 3.x for 6 months, it usually speeds up builds by around 3x.

@lawnjelly lawnjelly added this to the 4.x milestone May 17, 2023
@lawnjelly lawnjelly force-pushed the scu_build_master branch 5 times, most recently from 13b0f74 to 053190d Compare May 22, 2023 06:45
@Calinou
Copy link
Member

Calinou commented May 22, 2023

DEV builds only currently, because the compile instances use a lot more memory. This is ok in debug builds, but in optimized builds clang at least can use more than 2Gb so could run out of memory depending on the PC.

Could use_scu=yes be allowed for non-dev builds? As long as the documentation warns about high memory usage, it's probably fine. I have 64 GB of RAM, so I'm pretty sure it'd do just fine on my PC (or even on most setups with 32 GB of RAM, which is increasingly common among developers).

@lawnjelly lawnjelly changed the title [WIP] [4.x] Single Compilation Unit build. [4.x] Single Compilation Unit build. May 22, 2023
@lawnjelly lawnjelly marked this pull request as ready for review May 22, 2023 07:19
@lawnjelly lawnjelly requested review from a team as code owners May 22, 2023 07:19
@lawnjelly
Copy link
Member Author

Could use_scu=yes be allowed for non-dev builds? I have 64 GB of RAM, so I'm pretty sure it'd do just fine on my PC.

Potentially, but it's a bit risky (at least to start with) unless specifically requested by the user, as lock ups due to lack of memory can be quite nasty and difficult to recover from (I only have 8gb, but -j4 was enough to freeze my linux machine).

I did build in a method to ameliorate the problem actually, when I was trying to get release to work. The RAM use for the compile instance was worse for the largest modules, such as editor, scene/2d and scene/3d. In the scu_builders.py file, where it chunks up the various modules, the process_folder() command third argument is the number of scu files to split the module into. I'm only using this currently for the editor which is split into two scu files instead of the default 1, as it is difficult to justify the extra RAM required for probably a second difference to the compile time.

We could potentially split into more chunks when building release, to get some benefit, so if we were going to allow use in release builds I'd prefer to add some steps to help prevent problems.

But the other issue is that release SCU builds are not directly comparable with regular builds. They may produce a smaller (or larger) binary, and may be a bit faster or slower. SCU builds essentially get you most of "whole program optimization" for free, because the entire module is in one translation unit, and optimization can take place between cpp files. This may produce different results to "whole program optimization".

@lawnjelly
Copy link
Member Author

Feel free to take a different approach to fix those, I'm not very skilled in handling the C++ global namespace cleanly.

Simplest way off the top of my head was just a class with a couple of static functions, there's no need for a singleton as these aren't bound. The editor scale is just a glorified global, and having the functions enclosed in a class scope (EditorScale) seemed a little neater than global functions, it gets rid of the chance of name collisions.

This is nothing to get too worked up about though, it is more a "matter of principle" in this case. Typically we often end up grouping together a bunch of globals, as we did with e.g. core_globals.h. I don't know if there is other editor global state somewhere else that editor scale can be combined with.

The only question was whether using EditorScale::set_scale() is compat breaking for modules etc, if so we can add a function as before, but maybe only the core will be actually setting the scale (rather than reading it).

editor/editor_scale.cpp Outdated Show resolved Hide resolved
editor/editor_scale.cpp Outdated Show resolved Hide resolved
editor/plugins/visual_shader_editor_plugin.cpp Outdated Show resolved Hide resolved
methods.py Outdated Show resolved Hide resolved
methods.py Outdated Show resolved Hide resolved
modules/gltf/SCsub Outdated Show resolved Hide resolved
Comment on lines 10 to 11
# or if not a scu build, the above line only adds the main folder,
# and the subfolders are added separately:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, does add_source_files(..., "*.cpp") and add_source_files_scu(..., "*.cpp") behave differently to expand the wildcard?

If so that's not a very good design IMO, as it's error prone and confusing. "*.cpp" should normally expand only to files in the current folder, not recursively through subfolders.

It's also bug prone - if that's what you're doing here, you're adding the editor/*.cpp files twice, and compiling them in non-editor builds too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a slight thing here incorrect, I have changed it to:

# Either add all the folders in one go if a SCU build...
if not env_gltf.add_source_files_scu(env.modules_sources, "*.cpp"):
    # or if not a scu build, the above line only adds the main folder,
    # and the subfolders are added separately:
    env_gltf.add_source_files(env.modules_sources, "structures/*.cpp")

I admit it could be rather confusing, am trying to think of a better way.
add_source_files_scu() only takes one thing from the parameter, the extension (c or cpp). The actual files that are added are determined by process_folder() in the scu_builders.py file.

The reason it is confusing is because the granularity of SCsub files (usually one per folder) is too high for efficient compilation. Instead in process_folder() we can add the main folder AND sub folders into a single SCU file.

This compiles more efficiently, but on the flip side, it means that if the SCU build is not active, it must manually add the source files (or call another SCsub). This is done by returning false from add_source_files_scu().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beside the confusion, I think this is a blocker for this approach anyway:

It's also bug prone - if that's what you're doing here, you're adding the editor/*.cpp files twice, and compiling them in non-editor builds too.

We need to be able to include code conditionally.

Copy link
Member Author

@lawnjelly lawnjelly May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry i didn't make clear above:

In this PR, in all cases except:

  • modules/gltf
  • modules/gltf/extensions

There is no recursion involved to subfolders, it only compiles the files in the folder. The system has the ability to go into further folders, which is currently only used in this PR for these two cases. If desired, this feature can be removed. It will reduce the gains to compile time (particularly in later modules PRs), but that trade off may be worth it if it makes the system easier to understand.

EDIT: I'll remove these two special cases, as it will be easier to understand, and will make little difference to the build time. This is something that can always be revisited at a later date, once people are more familiar with the build.

add_source_files_scu() is actually a "dumb" command. It essentially just goes into the relevant scu folder and compiles all the files it finds. The actual decision as to which files to be compiled is made in the scu_builders.py, which maps out all this stuff. The add_source_files_scu() command is to maintain compatibility with the non-scu build.

In essence the problem this mechanism solves is that, particularly in third party, folders are used for organisation, but a small number of files in a folder is inefficient. We don't want to be compiling 1 or 2 files at a time, we want to be compiling 30 or 40 files at a time.

you're adding the editor/*.cpp files twice, and compiling them in non-editor builds too.

The function is not recursive so I don't think this is currently occurring (unless I've made a mistake! 😁 )

We need to be able to include code conditionally.

This works just as usual at the level of SCsub. We tend to include / exclude by folders according to build, but if files within a folder are to included conditionally, the easiest option is to remove that folder from the SCU build by removing any process_folder() line, and calling the regular add_source_files() function.

(SCU build only makes sense to use on folders that are low hanging fruit, there's no need to apply it in situations where it is more hassle than it is worth. Indeed that's why I've left out third party stuff here because core is much easier to understand.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling that add_source_files_scu() might not be needed at all then, if everything can be controlled from scu_builders.py. You could just leave the normal add_source_files() call, but have the scu build enable a flag in that method that bypasses adding the sources at all, since scu_builders.py already handles it.

Like scu_builders.py registers some global with the files it already handled, and add_source_files() checks this and skips the matching files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup it's possible, I did try and evaluate this when I first wrote the system, I'll have another go. If env["use_scu"] is set, it could check a hash table or something like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

scene/animation/animation_tree.h Outdated Show resolved Hide resolved
Comment on lines +261 to +267
process_folder(["core"])
process_folder(["core/crypto"])
process_folder(["core/debugger"])
process_folder(["core/extension"])
process_folder(["core/input"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to list all of those manually, when we already call add_source_files_scu directly in each of those SCsubs?

It would be better if we could keep the responsibility to register stuff for the SCU build where the files are added to the build, instead of in a helper script.

Copy link
Member Author

@lawnjelly lawnjelly May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially do that, but my aim was to keep everything as simple as possible in the SCsub files.

Essentially, for those not using SCU builds, my aim was that the impact should be as minimal as possible so that you e.g. don't need to understand SCU builds in order to editor the SCsub files (keep the cognitive load as low as possible). In many cases the only change to the SCsub is using the add_source_files_scu() function. Adding the unnecessary details into the SCsub is likely to put off people modifying it, and could lead to resentment.

Additionally, most of the maintenance of the SCU build can be kept to scu_builders.py (e.g. file exclusions) which keeps the commits / diffs easier to understand.

So we can take this approach, but I'm not sure whether it would offer significant advantages in return for the disadvantages.

Note also that while scu_builders.py is quite simple currently (in this PR), once we add modules etc it will become considerably more complex, and that is a lot to push into the SCsub files.

@lawnjelly lawnjelly force-pushed the scu_build_master branch 2 times, most recently from d3725aa to 5eadf94 Compare May 25, 2023 11:41
@lawnjelly lawnjelly force-pushed the scu_build_master branch 3 times, most recently from 66ae769 to 7a1ba4a Compare June 6, 2023 11:16
@akien-mga
Copy link
Member

akien-mga commented Jun 6, 2023

Tested on Windows 10 with VS 2019, using SCons directly (so not a VS project).

It's working great!

MSVC behaves weirdly when linking incremental builds, I found that it takes exponentially more time each time you build, unless you nuke the files in bin/ before building. My tests below outline this. That's not related to this PR as it happens both with and without SCU.


Predator Helios 500
AMD Ryzen 7 2700 8-Core Processor, 3200 MHz, 16 GB RAM
Windows 10 Build 19044
VS 2019 16.11.26 / MSVC 14.2
Python 3.11.3 / SCons 4.5.2 / calling from git-bash.exe

SCU enabled

py -m SCons dev_build=yes scu_build=all

Scratch build

3m22.175s

Big incremental rebuild (change to scene/main/node.h)

Without deleting bin (first try): 2m14.719s
Without deleting bin (second try): 2m9.989s
After deleting bin (first try): 1m14.995s
After deleting bin (second try): 1m15.605s

Small incremental rebuild (change to main/main.cpp)

0m16.680s

Null incremental rebuild (no change)

0m12.348s

SCU disabled

py -m SCons dev_build=yes scu_build=none

Scratch build

6m42.166s

Big incremental rebuild (change to scene/main/node.h)

Without deleting bin (first try): 9m17.832s (!)
Without deleting bin (second try): 12m56.707s (!!)
After deleting bin: 3m27.323s

Small incremental rebuild (change to main/main.cpp)

0m18.781s

Null incremental rebuild (no change)

0m15.417s

@akien-mga
Copy link
Member

On my Linux system, I had to do this change to be able to use SCU, which seems to behave weirdly when it comes from resolving __file__ compared to what some others are experiencing on other distros:

diff --git a/methods.py b/methods.py
index daffc1054e..2c15c4a43c 100644
--- a/methods.py
+++ b/methods.py
@@ -10,7 +10,7 @@ from pathlib import Path
 from os.path import normpath, basename
 
 # Get the "Godot" folder name ahead of time
-base_folder_path = str(Path(__file__).parent) + "/"
+base_folder_path = str(os.path.abspath(Path(__file__).parent)) + "/"
 base_folder_only = os.path.basename(os.path.normpath(base_folder_path))
 # Listing all the folders we have converted
 # for SCU in scu_builders.py

We initially debugged that this might be specific to using Pyston on my main dev machine running Mageia 9, but I've tested on another running Mageia 8 and not using Pyston, and needed to do the same change for the SCU code to work.

Adds support for simple SCU build (DEV_ENABLED only).
This speeds up compilation by compiling multiple cpp files within a single translation unit.
@lawnjelly
Copy link
Member Author

On my Linux system, I had to do this change to be able to use SCU, which seems to behave weirdly when it comes from resolving file compared to what some others are experiencing on other distros:

Great, I've force pushed with this change. As far as I understand it, python hasn't standardized how __file__ should work very well, and it varies on different versions, some with relative, some absolute paths, so by doing this we standardize on absolute path (which is what we were after).

@akien-mga
Copy link
Member

Testing results on Linux (Mageia 8) on the same Predator Helios 500 as for my Windows 10 test results above.
There I also used mold as linker to speed things up further.

System:    Host: helios Kernel: 5.15.110-desktop-2.mga8 x86_64 bits: 64 Desktop: KDE Plasma 5.20.4 Distro: Mageia 8 mga8 
CPU:       Info: 8-Core model: AMD Ryzen 7 2700 bits: 64 type: MT MCP L2 cache: 4 MiB 
           Speed: 1718 MHz min/max: 1550/3200 MHz Core speeds (MHz): 1: 1718 2: 1844 3: 1750 4: 1716 5: 3774 6: 2456 7: 2274 
           8: 3241 9: 1426 10: 1388 11: 1382 12: 1500 13: 1498 14: 1619 15: 3467 16: 2379 

GCC 10.4.0 / mold 1.11.0
Python 3.8.14 / SCons 4.5.2

SCU enabled

scons dev_build=yes scu_build=all linker=mold

Scratch build: 4m16,567s
node.h rebuild: 1m52,247s
main.cpp rebuild: 0m9,842s
null rebuild: 0m4,867s

SCU disabled

scons dev_build=yes scu_build=none linker=mold

Scratch build: 8m2,311s
node.h rebuild: 4m4,712s
main.cpp rebuild: 0m10,975s
null rebuild: 0m7,099s

@YuriSizov YuriSizov modified the milestones: 4.x, 4.1 Jun 6, 2023
@YuriSizov YuriSizov changed the title [4.x] Single Compilation Unit build. Single Compilation Unit build. Jun 6, 2023
@YuriSizov YuriSizov changed the title Single Compilation Unit build. Add support for single compilation unit builds Jun 6, 2023
@akien-mga
Copy link
Member

And test results on my main Linux laptop, running Mageia 9, with a less powerful Intel i7 8th gen:

System:
  Host: cauldron Kernel: 6.3.5-desktop-1.mga9 arch: x86_64 bits: 64
    Desktop: KDE Plasma v: 5.27.5 Distro: Mageia 9
CPU:
  Info: quad core model: Intel Core i7-8705G bits: 64 type: MT MCP cache:
    L2: 1024 KiB
  Speed (MHz): avg: 3100 min/max: 800/4100 cores: 1: 3101 2: 3100 3: 3100
    4: 3100 5: 3100 6: 3100 7: 3100 8: 3100

GCC 12.3.0
Python 3.10.11 / SCons 4.5.2

SCU enabled

scons scu_build=all dev_build=yes

Scratch build: 9m30,235s
node.h rebuild: 3m30,960s
main.cpp rebuild (GNU ld / bfd): 0m25,138s
main.cpp rebuild (mold): 0m11,572s
null rebuild: 0m03,407s

SCU disabled

scons scu_build=none dev_build=yes

Scratch build: 16m58,469s
node.h rebuild: 8m33,904s
main.cpp rebuild: 0m37,035s
null rebuild: 0m5,306s

Copy link
Member

@akien-mga akien-mga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all good to go. I tested it on Windows and Linux and it's working great, with more than 50% speedup for both scratch builds and incremental rebuilds.

The Python code could use a cleanup pass from an expert (poke @touilleMan ;)) but it's functional, and the SCU specific changes are quite self contained so it doesn't affect users who don't enable the option.

@akien-mga akien-mga merged commit 46c4068 into godotengine:master Jun 6, 2023
@akien-mga
Copy link
Member

Thanks! This will be a great time (and energy!) saver. Looking forward to SCU-ifying more of the remaining folders where it makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants