Resolve a race condition in FlowDurabilityTest causing test flakes #568
LGTM
The changes seem clear to me. Let's fix the other flaky tests in this class too (I assume the fix would be similar).
I also see the PR was not built because of a test failure. What are your thoughts, is it another flake? If so, it would probably be a good idea to fix it as well, to avoid multiple releases.
@Pldi23 it's a different test class:
Probably an unstable connection issue. I could take a look into the logs and see if I can hunt that down.
Also see #570 (comment).
src/test/java/org/jenkinsci/plugins/workflow/cps/FlowDurabilityTest.java
@jglick yes,
Looks OK—again, I am not deeply familiar with this. @dwnusbaum?
Everything looks generally fine to me, but I would make the new method private, like Jesse mentioned.
private static void verifyDirtyResumed(JenkinsRule rule, WorkflowRun run, String logStart) throws Exception {
    assertHasTimingAction(run.getExecution());
    rule.waitForCompletion(run);
    Assert.assertEquals(Result.SUCCESS, run.getResult());
BTW there is JenkinsRule.assertBuildStatus for this purpose, though it hardly matters.
A race condition in FlowDurabilityTest.testResumeBlockedAddedAfterRunStart caused the test to flake. The last "sleep" node was not loaded on restart.
Even though all individual nodes were persisted properly, the change to the head node ID was being made by another thread and did not complete before the original thread went on to restart Jenkins.
CpsThreadGroup.notifyNewHead starts a thread every time it needs to write a new head to the build.xml file (line 499).
Timeline of the failure:
- notifyNewHead is called with the node with id 5, which is the sleep step node; the thread updating the head starts.
- notifyNewHead reaches the writing stage: too late, the snapshot has been written with the old head!
The solution is to wait for CpsFlowExecution to go into the SUSPENDED state, as it finishes persisting build.xml, before going on with the restart.
The same flake seems to be happening in two more tests; the same solution likely applies.
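The race and the fix described above can be modelled in isolation. The following is only a toy sketch, not the plugin's code: FakeBuildXml, the starting head id 4, and the latch that stands in for "execution has finished persisting" are all hypothetical names and values chosen for illustration. Only head id 5 (the sleep step node) comes from the description. It shows a snapshot taken before the background writer has run, and the same snapshot taken after waiting for the writer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class HeadPersistRaceSketch {

    /** Hypothetical stand-in for the head id stored in build.xml. */
    static final class FakeBuildXml {
        // Assumption for illustration: the previously persisted head is node 4.
        private final AtomicInteger persistedHeadId = new AtomicInteger(4);
        private final CountDownLatch headWritten = new CountDownLatch(1);

        /** Mimics the described behaviour: the new head is written by a separate thread. */
        void notifyNewHead(int headId, CountDownLatch mayWrite) {
            Thread writer = new Thread(() -> {
                try {
                    // The test harness decides when the writer is allowed to reach the writing stage.
                    mayWrite.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                persistedHeadId.set(headId);
                headWritten.countDown();
            });
            writer.setDaemon(true);
            writer.start();
        }

        /** Plays the role of "wait until the execution has finished persisting". */
        boolean awaitHeadWritten(long timeout, TimeUnit unit) {
            try {
                return headWritten.await(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        /** What a restart would read back from disk. */
        int snapshot() {
            return persistedHeadId.get();
        }
    }

    /** Failing order: the snapshot is taken before the writer reaches the writing stage. */
    static int restartWithoutWaiting() {
        FakeBuildXml xml = new FakeBuildXml();
        CountDownLatch mayWrite = new CountDownLatch(1);
        xml.notifyNewHead(5, mayWrite);
        int snapshot = xml.snapshot(); // writer is still blocked, so the old head is seen
        mayWrite.countDown();          // the write happens, but too late
        return snapshot;
    }

    /** Fixed order: wait for the write to complete, then take the snapshot. */
    static int restartAfterWaiting() {
        FakeBuildXml xml = new FakeBuildXml();
        CountDownLatch mayWrite = new CountDownLatch(1);
        xml.notifyNewHead(5, mayWrite);
        mayWrite.countDown();
        if (!xml.awaitHeadWritten(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("head was not persisted in time");
        }
        return xml.snapshot();
    }

    public static void main(String[] args) {
        System.out.println("without waiting: head " + restartWithoutWaiting()); // head 4
        System.out.println("after waiting: head " + restartAfterWaiting());     // head 5
    }
}
```

In the sketch the latch makes the bad interleaving deterministic; in the real test the ordering depends on thread scheduling, which is why the failure only showed up as an occasional flake.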