Huge performance hit & termination due to high memory usage when using the JUnit reporter #2897

Closed
irieger opened this issue Aug 10, 2024 · 3 comments

irieger commented Aug 10, 2024

Describe the bug
I recently extended my C++ project's unit tests with a JUnit reporter so that GitLab can pick up the resulting XML reports for its test-result display. While the GitLab side works well, I discovered that adding the JUnit reporter increases the runtime of some of the tests massively. What normally takes < 10 s, including all the overhead of managing my environment in the GitLab executor, takes up to around 4 minutes, depending on the platform. The only change is the additional --reporter "JUnit::out=/tmp/catch2-xml-report".

Edit:
Additionally, as I just found out, memory usage also explodes. While I was extending my test cases, my CI test process was killed due to excessive memory usage. I found hints to use GNU time to measure not only time but also memory information. The interesting line is the following, first without the JUnit reporter, second with (this is from my actual project):

Maximum resident set size (kbytes): 1678916
Maximum resident set size (kbytes): 29153956

For the sample project:

... % /usr/bin/time -v ./sample_test                                      [cpp-playground/catch2-speed-sample/build/Release]
Randomness seeded to: 1267125391
===============================================================================
All tests passed (62914560 assertions in 1 test case)

	Command being timed: "./sample_test"
	User time (seconds): 1.06
	System time (seconds): 0.03
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.10
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 249876
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 294
	Voluntary context switches: 1
	Involuntary context switches: 11
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
... % /usr/bin/time -v ./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report" --reporter 'console::out=-::colour-mode=ansi'
Randomness seeded to: 3630092844
===============================================================================
All tests passed (62914560 assertions in 1 test case)

	Command being timed: "./sample_test --reporter JUnit::out=/tmp/catch2-xml-report --reporter console::out=-::colour-mode=ansi"
	User time (seconds): 11.01
	System time (seconds): 7.73
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:18.86
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 29087972
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 565154
	Voluntary context switches: 1
	Involuntary context switches: 250
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Expected behavior
The tests should run with only minimal overhead, at least in the success case. In the end, the report contains more or less the same information as the command-line output (plus each test's name and runtime, but that shouldn't be too costly, right?). In the success case, the resulting file is just 448 characters.

Reproduction steps
I built a small sample project demonstrating the problem. It uses Conan to get Catch2, but I guess Catch2 could easily be included some other way; Conan is just what I'm used to now: https://github.com/irieger/cpp-playground/blob/main/catch2-speed-sample/

The relevant file is catch2-speed-sample/src/catch2_many_assertions.cpp. It "simulates" a use case that is common, at least for me: a large vector holding image data, where an input processed through an algorithm is compared against a reference, often with an accepted tolerance.

This results in many millions of assertions; in this case I simulate a 30-megapixel RGB image. Normally my individual tests are much smaller, but there are quite a few of them, so they also add up to several million assertions in total.

Running this directly (or with a manually configured console reporter) vs. with the JUnit reporter shows the difference. Here is the output of 10 runs for each, showing that the timing is consistent overall.

Console only: a very consistent ~1.2 s total execution time per run. Output is piped to /dev/null because some assertions intentionally fail here:

... % for i in $(seq 10); do time ./sample_test --reporter 'console::out=-::colour-mode=ansi' >/dev/null; done
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null  1.15s user 0.06s system 99% cpu 1.212 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null  1.14s user 0.05s system 99% cpu 1.200 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null  1.13s user 0.06s system 99% cpu 1.201 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null  1.17s user 0.04s system 99% cpu 1.212 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null  1.14s user 0.06s system 99% cpu 1.204 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null  1.18s user 0.03s system 99% cpu 1.215 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null  1.14s user 0.06s system 99% cpu 1.203 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null  1.15s user 0.05s system 99% cpu 1.199 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null  1.18s user 0.04s system 99% cpu 1.218 total
./sample_test --reporter 'console::out=-::colour-mode=ansi' > /dev/null  1.13s user 0.08s system 99% cpu 1.215 total

With the JUnit reporter: ~24.4 s per execution:

.. % for i in $(seq 10); do time ./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"; done             
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"  11.38s user 12.52s system 99% cpu 24.019 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"  11.60s user 13.07s system 99% cpu 24.794 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"  11.90s user 12.85s system 99% cpu 24.884 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"  11.78s user 13.06s system 99% cpu 24.956 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"  11.71s user 12.19s system 99% cpu 24.020 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"  11.48s user 12.88s system 99% cpu 24.484 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"  11.82s user 12.96s system 99% cpu 24.910 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"  11.56s user 12.91s system 99% cpu 24.585 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"  11.73s user 12.99s system 99% cpu 24.851 total
./sample_test --reporter "JUnit::out=/tmp/catch2-xml-report"  11.44s user 12.63s system 99% cpu 24.199 total

If I change the test to succeed, it takes ~1.1 s with the console output and ~18 s with the JUnit reporter.

Platform information:

  • OS: Arch Linux
  • Compiler+version: GCC v14.2.1
  • Catch version: v3.6.0

The measurements and system information above are from my workstation. In my CI I can reproduce the behaviour on Linux, Mac (both arm64 native and x64 cross-compiled, run under Rosetta 2), and Windows. Although the performance hit isn't exactly the same everywhere, in all cases the reporter is several times slower, roughly 10-30x in the end.

Additional context
The reporter is a really nice addition and I was happy to find it. But blocking a CI runner for 240 s instead of 10 s is a huge hit. I hope there is a way to prevent that.

If per-assertion reporting is what costs the performance, maybe an alternative could be a flag that reports only the number of successes/failures in the XML instead of a line for each assertion?

@irieger irieger changed the title Huge performance hit when using the JUnit reporter Huge performance hit & termination due to high memory usage when using the JUnit reporter Aug 11, 2024

irieger commented Aug 11, 2024

I'm currently editing the issue, as continuing to extend my tests just resulted in CI failures due to killed processes.

@horenmar

Discounting any other allocations, that would come out to about 4.6kB per assertion, which is large, but not impossibly so, especially since the peak memory usage is going to be when the underlying storage vector is reallocated, at which point the memory usage is up to 3x the quiescent state (1x existing storage, 2x the new storage).

The issue with the JUnit format is that it cannot be written as the results are processed, but has to be written after all the tests have run... and to do that, it has to store all required metadata. This means storing the metadata of every assertion that happened; looking at how the CumulativeReporterBase stores the data, it needs 480 B per assertion even if no further data are stored there.

Thankfully it already skips expanding the assertion message, because the JUnit reporter does not ask for the message of passing assertions. After looking into it further, it might be possible to have JUnit skip storing passing assertions entirely, as it currently does nothing with them, even when the user specifies -s.

If someone feels like making that optimization, they are free to. But for your personal use case, you should not need that many assertions to compare two vectors; use matchers.

horenmar added a commit that referenced this issue Aug 13, 2024
Given `TEST_CASE` that just runs many passing assertions, e.g.

```
TEST_CASE( "PERFORMANCE_TEST_CASE", "[.]" ) {
    for (size_t i = 0; i < 1000'000'000; ++i) {
        REQUIRE( i == i );
    }
}
```

the new JUnit reporter will finish in 5:47, using up ~7.7 MB of RAM.
The old JUnit reporter would terminate after 0:30 with a bad_alloc,
after using up 14.5 GB of RAM (the system has 16 GB total).

If the total number of assertions is lowered to 10M, the old
implementation uses up ~ 7.1 GB of RAM and finishes in 12 minutes.
The new implementation still needs ~7.7 MB of RAM, and finishes in
4 minutes.

There is a slight downside in that the output is slightly different;
the new implementation will include the `TEST_CASE` level section
in output, even if it does not have its own assertion. In other words,
given a `TEST_CASE` like this

```
TEST_CASE( "JsonWriter", "[JSON][JsonWriter]" ) {
    std::stringstream stream;
    SECTION( "Newly constructed JsonWriter does nothing" ) {
        Catch::JsonValueWriter writer{ stream };
        REQUIRE( stream.str() == "" );
    }

    SECTION( "Calling writeObject will create an empty pair of braces" ) {
        { auto writer = Catch::JsonValueWriter{ stream }.writeObject(); }
        REQUIRE( stream.str() == "{\n}" );
    }
}
```

the new implementation will output 3 `testcase` tags: 2 for the explicit
`SECTION`s with tests, and 1 for the top-level section.

However, this can be worked around if required, and the performance
improvement is such that it is worth changing the current output,
even if it needs to be fixed in the future.

Closes #2897

irieger commented Aug 13, 2024

Hey, thanks for your reply and the general effort with Catch2. The change you merged fixes the problem.

I did some more digging and now understand how the matching of vectors works; I hadn't understood it on the first try. So it was already faster with the patch, but now I have corrected my initial mistake too.
