Benchmarks
All the following benchmarks have been executed on an AMD Ryzen Threadripper 1950X @3.4GHz (32 virtual cores), on a machine with 32 GB RAM. All runtime benchmarks below are single-threaded (Fruit is not multithreaded, even though it's reentrant and thread-safe), so the number of cores doesn't matter for the runtime performance benchmarks. But it does matter for the compile-time benchmarks.
The following compiler options were used: `-std=c++11 -O2 -DNDEBUG`. Benchmarks involving Boost.DI used `-std=c++14` instead, since Boost.DI requires at least C++14.
The benchmarks were executed on Kubuntu Linux 19.10 with the following compilers:
- Clang 10.0.0
- GCC 9.2.1
All values use 95% confidence intervals and have been rounded to 2 significant digits. Each benchmark is repeated at least 3 times, then repeated further until one of the following conditions is met:
- 20 runs of the benchmark
- 2h runtime (for this benchmark alone)
- the bounds of the confidence interval round to the same number (with 2 significant digits' precision).
When you see e.g. 5.3 in a result it means the confidence interval [5.3, 5.3] (i.e., both bounds rounded to that number, when keeping only 2 significant digits of precision). Proper confidence intervals are written as e.g. 5.2-5.5, meaning the confidence interval [5.2, 5.5].
All benchmarks try to simulate what happens in a dummy codebase using dependency injection, to give an overall idea (as opposed to synthetic benchmarks, where a large % difference might be misleading because that part is only a small % of the total run/compile time).
The dummy codebases are defined as follows: 10% of the classes have no dependencies, and 90% have 10 dependencies each (i.e. each of them needs 10 instances of other objects from the injector in order to be constructed). Unless otherwise specified, there are also interfaces (i.e. pure virtual base classes) for all injected classes. So e.g. for codebases with 100 classes, the dependency graph has 10 classes with no dependencies and 90 classes with 10 dependencies each, for a total of 900 edges between classes; and there are 100 interfaces with 100 interface-implementation edges in the injection graph.
In addition to Fruit, the "contestants" for these benchmarks are:
- boost-experimental/di. This will be referred to as Boost.DI below for conciseness (and this is also how the author of that library refers to it), but note that it is not actually part of Boost; the author has pushed for its inclusion for several years, but it has never been accepted into Boost. Using "stars on GitHub" as a rough measure of popularity, Boost.DI is the 2nd most popular DI framework, behind Fruit (graph).
- "Simple DI": a codebase with dependency injection but without using any DI framework. Unlike the others, this codebase uses all concrete classes (instead of using interfaces). All classes are allocated on the stack (instead of e.g. using new). This is included to show the pros/cons of this bare-bones approach compared to other no-DI-framework approaches and to Fruit.
- "Simple DI w/ interfaces": similar to the previous, but using interfaces. All classes are still allocated on the stack. This is included to see the pros/cons of introducing interfaces (compare the values with the ones for "Simple DI").
- "Simple DI w/ interfaces and new/delete": similar to the previous, but using `new` to allocate classes on the heap. This is included to see the pros/cons of allocating on the heap, and the pros/cons of using Fruit instead of not using a DI framework.
The 3 "Simple DI" codebases are meant as successive steps towards the Fruit model, with Fruit as the last step of the progression.
These benchmarks show the time to compile the codebase from scratch, using make with N+1 jobs (where N is the number of virtual cores available). Note that these benchmarks compile with optimizations (-O2), the compile time without optimization would of course be lower.
Since Fruit does most injection checks at compile-time using template metaprogramming, in some sense a part of it "runs" at compile time too. The same applies to Boost.DI.
Compile time (Clang) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 6.6-6.7 s | 16 s | 64 s |
Boost.DI | 17 s | 58 s | 580 s |
Simple DI | 0.93-0.94 s | 1.9 s | 8.3 s |
Simple DI w/ interfaces | 0.97-0.98 s | 1.9 s | 6.9 s |
Simple DI w/ interfaces, new/delete | 2.5 s | 5.8 s | 25 s |
Compile time (GCC) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 5.1 s | 12 s | 48 s |
Boost.DI | 18 s | 110 s | N/A |
Simple DI | 0.73 s | 1.5 s | 6.4 s |
Simple DI w/ interfaces | 0.82 s | 1.8 s | 8.8-8.9 s |
Simple DI w/ interfaces, new/delete | 2.1-2.2 s | 5.2-5.3 s | 33 s |
Key takeaways:
- Adopting Fruit adds about 4-5s of cold compilation time in a codebase with 100 classes and about 40-60s in a codebase with 1000 classes.
- The slowdown will be additive, so while the relative time difference in the table above is huge, that won't be the case in a real codebase that already takes a long time without Fruit. E.g. in a codebase with 250 injected classes that currently takes 5min to compile, you should expect a cold compile time slowdown on the order of 10-15s, not of 10x.
- The compilation time with Boost.DI is much higher than with Fruit (even in a small codebase), and the gap becomes even larger in larger codebases. In medium/large codebases the slowdown could be a non-trivial fraction of the compile time without DI (if not a multiple).
- The data for the combination Boost.DI+GCC+1000 classes is not available because GCC crashes (AFAICT due to running out of memory). As shown in the tables below, Boost.DI doesn't scale in terms of compile-time memory either.
The scenario for these benchmarks is as follows: starting from an already-compiled codebase, we touch 5 random files and then re-run make. This is meant to simulate the compilation cost in an edit-rerun cycle, as part of development. Any high values here slow down engineers working on the project, much more than high cold compile times would (since incremental compilations are much more frequent than cold ones).
Incremental compile time (Clang) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 3.9 s | 4 s | 6.1 s |
Boost.DI | 16 s | 57 s | 570-580 s |
Simple DI | 0.84-0.85 s | 1.8 s | 8 s |
Simple DI w/ interfaces | 0.65-0.66 s | 0.67-0.69 s | 1.9 s |
Simple DI w/ interfaces, new/delete | 2.2 s | 4.6 s | 20 s |
Incremental compile time (GCC) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 2.9 s | 3-3.1 s | 5.2 s |
Boost.DI | 17 s | 110 s | N/A |
Simple DI | 0.67-0.68 s | 1.5 s | 6.3 s |
Simple DI w/ interfaces | 0.58-0.6 s | 0.89-0.91 s | 5.2 s |
Simple DI w/ interfaces, new/delete | 1.9 s | 4.4 s | 29 s |
Key takeaways:
- Switching to Fruit doesn't cause significant increases in the incremental compile time; in fact, in larger codebases the incremental compilation time is lower with Fruit even compared with some "no DI framework" approaches. This is because the increased modularity of the codebase means fewer things need to be re-compiled, and because the sizes of the various compilation units are more balanced, instead of having 1 file (main.cpp) that is very slow to compile compared to the rest.
- As in the previous benchmarks, Boost.DI incremental compile times are huge and increase significantly with the size of the codebase. A 10min incremental compile time overhead for a codebase with 1000 classes would likely cause a significant slowdown in development and waste many engineer-hours.
These benchmarks do a cold build of the codebase, but instead of measuring the compilation time they measure the maximum amount of RAM needed by the various steps of the build (including both compilation and linking). This is an important metric because it determines how much RAM developers working on the project need in order to use the full parallelism allowed by their processor, or how much they need to scale down the parallelism to make the compilation fit in the available RAM.
Compile memory (Clang) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 133 MB | 133 MB | 209 MB |
Boost.DI | 343 MB | 772 MB | 4196 MB |
Simple DI | 89 MB | 94 MB | 114 MB |
Simple DI w/ interfaces | 93 MB | 104 MB | 162 MB |
Simple DI w/ interfaces, new/delete | 181 MB | 324 MB | 1049 MB |
Compile memory (GCC) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 152 MB | 152 MB | 228 MB |
Boost.DI | 572 MB | 1430 MB | 7534 MB |
Simple DI | 70 MB | 85 MB | 162 MB |
Simple DI w/ interfaces | 75-76 MB | 104 MB | 286 MB |
Simple DI w/ interfaces, new/delete | 305 MB | 572 MB | 2193 MB |
Key takeaways:
- Adding Fruit increases the RAM requirements per process by at most 95MB in the worst comparison (codebase of 1000 classes compiled with Clang, comparing Fruit vs "Simple DI"). This will likely be dwarfed by the amount of additional memory needed to compile your actual code.
- "Simple DI w/ interfaces" is roughly on par with Fruit; Fruit takes a bit less memory when using GCC and a bit more when using Clang.
- "Simple DI w/ interfaces, new/delete" requires a lot of RAM; this is because there's 1 compilation unit (main.cpp) that's much larger than the rest.
- Boost.DI takes significantly more memory than Fruit (>3x even in the smallest codebase) and as in previous compile time benchmarks it doesn't scale; the gap becomes larger with the codebase size, using up several GBs in the large codebase.
This is the first of the runtime benchmarks. The scenario is as follows: the main process of the example codebase starts up, creates an injector and injects all classes, then prints "Hello, world!" and terminates.
This is meant to show the overhead of using a DI framework compared to another framework or to the "Simple DI" approaches (i.e. no DI framework).
Startup time (Clang) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 7.3 ms | 6.6 ms | 9.2 ms |
Boost.DI | 5.7-5.8 ms | 6.2 ms | 8.4 ms |
Simple DI | 5.4 ms | 6.8 ms | 7.1 ms |
Simple DI w/ interfaces | 5.4 ms | 6.2 ms | 6.5 ms |
Simple DI w/ interfaces, new/delete | 5.4 ms | 7 ms | 6.3 ms |
Startup time (GCC) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 5.8 ms | 6.5 ms | 9.2 ms |
Boost.DI | 4.5 ms | 5.3 ms | N/A |
Simple DI | 5.3 ms | 5.9 ms | 5.8 ms |
Simple DI w/ interfaces | 5.4 ms | 6.4 ms | 6.9 ms |
Simple DI w/ interfaces, new/delete | 5.4 ms | 6.1 ms | 6.6 ms |
Key takeaways:
- The Fruit startup time overhead is at most 3.4ms in the least favorable comparison (GCC, 1000 classes, comparing with "Simple DI"), which should be dwarfed by the actual startup time of virtually any real-world application of that size.
- The overheads here should be additive, not multiplicative. So if you have a server binary that doesn't currently use Fruit and takes 1s to start, adopting Fruit will add around 2-3ms to that, which would be hardly noticeable (and not e.g. 20%).
- Unlike in the compile-time benchmarks, Boost.DI is actually competitive here, about 1-2ms faster than Fruit. As mentioned in the previous section, unless your binary is extremely fast to start and startup time is critical, the extra 1-2ms should not be an issue.
What if you want to create many injectors during the lifetime of a process? For example, you might want to create 1 injector per request in an RPC/HTTP server, so that you can store request-specific state in the injected objects while still guaranteeing that there's no interference between requests (and without needing locks to guard this data against concurrent access by threads processing different requests).
This is the scenario for this section: after the process has started (paying the startup costs mentioned in the previous section), we repeatedly create an injector, inject all classes with it, and destroy it (serially; there is no parallelism here).
The Fruit codebase in this case uses fruit::NormalizedComponent to pre-compute data at startup. The cost of doing this is similar to (in fact, slightly lower than) the cost of creating an injector, so you can still refer to the startup times in the previous section to get an idea of Fruit's startup-time overhead.
Boost.DI does not offer comparable functionality (at the time of writing), so the example codebase there creates injectors from scratch each time.
Per-request time (Clang) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 2.5-2.6 μs | 8-8.1 μs | 85 μs |
Boost.DI | 47-50 μs | 150 μs | 710-740 μs |
Simple DI | 0.81-0.83 μs | 2.2 μs | 8.2-8.3 μs |
Simple DI w/ interfaces | 1-1.1 μs | 2.8 μs | 13 μs |
Simple DI w/ interfaces, new/delete | 2.2 μs | 5.5 μs | 47 μs |
Per-request time (GCC) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 2.9-3 μs | 12 μs | 99-100 μs |
Boost.DI | 53-55 μs | 140-150 μs | N/A |
Simple DI | 0.44 μs | 1.2 μs | 4.6-4.7 μs |
Simple DI w/ interfaces | 0.65-0.66 μs | 1.7-1.8 μs | 12 μs |
Simple DI w/ interfaces, new/delete | 2-2.1 μs | 5-5.1 μs | 43 μs |
Key takeaways:
- Creating a Fruit injector per request is quite cheap; the worst overhead here is about 0.1ms (when comparing Fruit to "Simple DI" with GCC and 1000 classes).
- This benchmark assumes that all 1000 classes have to be injected for each request; in practice, each request will only need a subset of the classes. You can reduce the cost of using Fruit by using fruit::Provider<> to lazily inject classes. If you use this and you end up injecting only e.g. 100 classes/request on average (even if your entire codebase consists of 1000 classes) you'll probably see a slowdown comparable to the 2-3μs for the codebase with 100 classes.
- The time increases super-linearly with the number of classes (e.g. when going from 250 classes to 1000, the number of classes to inject is 4x but the time increases by more than 4x). This is because the code for the constructors/destructors of 1000 classes no longer fits in the same level of cache as the code for 250 classes would, causing additional cache misses.
- Boost.DI is 10-15x slower than Fruit in all cases.
The following tables show the size of the executable generated in the various codebases.
Executable size (stripped, Clang) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 390 KB | 947 KB | 3710 KB |
Boost.DI | 576 KB | 1464 KB | 5664 KB |
Simple DI | 30 KB | 54 KB | 195 KB |
Simple DI w/ interfaces | 66 KB | 156 KB | 585 KB |
Simple DI w/ interfaces, new/delete | 70 KB | 166 KB | 625 KB |
Executable size (stripped, GCC) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 390 KB | 957 KB | 3808 KB |
Boost.DI | 761 KB | 2050 KB | N/A |
Simple DI | 34 KB | 70 KB | 253 KB |
Simple DI w/ interfaces | 98 KB | 224 KB | 869 KB |
Simple DI w/ interfaces, new/delete | 107 KB | 244 KB | 927 KB |
Executable size (stripped, no exceptions/RTTI, Clang) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 322 KB | 791 KB | 3125 KB |
Boost.DI | 458 KB | 1074 KB | 4492 KB |
Simple DI | 30 KB | 54 KB | 195 KB |
Simple DI w/ interfaces | 59 KB | 126 KB | 478 KB |
Simple DI w/ interfaces, new/delete | 59 KB | 126 KB | 488 KB |
Executable size (stripped, no exceptions/RTTI, GCC) | 100 classes | 250 classes | 1000 classes |
---|---|---|---|
Fruit | 302 KB | 732 KB | 2832 KB |
Boost.DI | 673 KB | 1855 KB | N/A |
Simple DI | 34 KB | 70 KB | 253 KB |
Simple DI w/ interfaces | 70 KB | 166 KB | 654 KB |
Simple DI w/ interfaces, new/delete | 70 KB | 166 KB | 634 KB |
Key takeaways:
- The executable size overhead of using Fruit is around 3-4KB per injected class. In a real codebase, this will likely be dwarfed by the size of the actual code.
- As expected, disabling exceptions and RTTI leads to a decrease in the executable size. If you're concerned about executable size, you should probably do this in your release build, while keeping them on in debug builds (at least RTTI, which allows Fruit to report richer error messages containing type names and function signatures).
- The executable size overhead of Boost.DI is about 30% higher than Fruit with Clang, and about 2-3x with GCC.