-
Notifications
You must be signed in to change notification settings - Fork 542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes [SUREFIRE-1516]: Poor performance in reuseForks=false #253
Conversation
… from the host JVM after it sends BYE_ACK. Threads blocking on `read` may not be interruptable until they poll for interrupts (every 350msec), which can introduce significant latency for short-lived processes.
The build fails on macOS. |
@@ -412,6 +412,10 @@ public void run() | |||
CommandReader.this.wakeupIterator(); | |||
callListeners( command ); | |||
break; | |||
case BYE_ACK: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not very elegant solution.
In the ForkedBooter
the listener is already registered commandReader.addByeAckListener( new CommandListener()
, see the line 356. See the next comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I do not understand is why this change cause IT failure on macOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to reproduce it with several runs of the build.
surefire-api/src/main/java/org/apache/maven/surefire/booter/CommandReader.java
Show resolved
Hide resolved
It looks like several of the tests are flaky, perhaps - when I try to reproduce locally on MacOS, I get a different set of tests failing. Unfortunately I do not have time to debug these tests this week. |
@Tibor17 can you trigger the Mac OS build to run again? I just tried to run the test suite locally again on my Mac and now none of the integration tests failed. I think that there is definitely a flaky test here (and the failure is not due to my change) |
@Tibor17 please note that all of the tests are now passing. |
@jon-bell I have triggered the build again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice trick.
Thank you for providing this enhancement
I hope we can commit this patch soon, I left my comment
surefire-api/src/main/java/org/apache/maven/surefire/booter/CommandReader.java
Show resolved
Hide resolved
The build failed again https://github.com/apache/maven-surefire/runs/352389395 |
@jon-bell |
b4c509d
to
1364c62
Compare
@jon-bell Thx for contributing. I used your changes, updated the IT 705 and closed the Jira as fixed. |
Hi,
This PR resolves the performance bug noted in SUREFIRE-1516, which appears when using the
reuseForks=false
configuration. The root-cause of the observed performance problem comes from forked JVM teardown time.The issue is that the forked JVM should not block reading IO to read more from the host JVM after it sends BYE_ACK. Threads blocking on
read
may not be interruptable until they poll for interrupts (every 350msec for stdin), which can introduce significant latency for short-lived processes, such as surefire forked JVMs which are running just a single test at a time. This 350msec overhead can add up on projects that have thousands of test classes, each of which might take only several hundred msec to run.To measure the scope of the problem and confirm its resolution, I created a simple benchmark test suite, which consists of 100 JUnit test classes, each with a single test that calls
Thread.sleep(250)
. I instrumented the JVM to record the time that each JVM starts, the time that the test starts (as measured by JUnit), the time that the test ends (as measured by JUnit), and the time that the JVM terminates. For comparison, I also did the same experiment with ant and gradle.The table below shows the results, which represent the average time for each test (over the 100 samples):
Overall, most build systems took similar amounts of time to spin up the JVM, and all took the expected 250ms to run each test. You can see that the current
master
version of surefire (b97df5a) takes an unusually high amount of time to tear down the forked JVM (in fact, 350 msec more, which is exactly the time for the JVM to interrupt a thread reading fromstdin
explained in this fantastic Stack Overflow post). This is an easy fix though: after receiving theBYE_ACK
message, the forked JVM can stop reading commands from the main surefire process, since it's shutting down. After implementing this fix, the overhead goes away (as shown in 2fbe44f).