[Merged by Bors] - Fix flaky TestHare_ReconstructForward #6299
Conversation
Force-pushed from f8ccb11 to b90c11b.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff            @@
##           develop   #6299   +/-   ##
=======================================
  Coverage     81.7%    81.7%
=======================================
  Files          312      312
  Lines        34613    34613
=======================================
+ Hits         28297    28300     +3
+ Misses        4479     4477     -2
+ Partials      1837     1836     -1

☔ View full report in Codecov by Sentry.
hare4/hare_test.go (Outdated)
if !assert.Eventually(t, func() bool {
	cluster.nodes[other[0]].hare.mu.Lock()
	defer cluster.nodes[other[0]].hare.mu.Unlock()
	_, registered := cluster.nodes[other[0]].hare.sessions[m.Layer]
	return registered
}, 5*time.Second, 50*time.Millisecond) {
	panic(fmt.Sprintf("node %d did not register in time", other[0]))
}
How about
require.Eventuallyf(t, func() bool {
	cluster.nodes[other[0]].hare.mu.Lock()
	defer cluster.nodes[other[0]].hare.mu.Unlock()
	_, registered := cluster.nodes[other[0]].hare.sessions[m.Layer]
	return registered
}, 5*time.Second, 50*time.Millisecond, "node %d did not register in time", other[0])
The reason for the `assert` & `panic` is that if the assertion fails, `require` calls `t.FailNow`, which marks the test as failed and stops the current goroutine. This call to `Publish`, however, does not happen on the main test goroutine, which would then be stuck waiting for it to return.

Normally this can be implemented cleanly with `gomock.WithContext` and passing that context to the code that calls the mocks. I tried updating the code to make that work, but it turned out to be too big a refactoring to fix a single flaky test. Instead I cleaned up the code a bit, and in the case a `require` fails we just rely on the calling side to time out correctly.
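To illustrate the point, here is a minimal, self-contained sketch (with a hypothetical `checkRegistered` helper; this is not the hare4 test itself) of why `require` is avoided in a callback that runs off the test goroutine, and why `assert` plus `panic` fails loudly instead of hanging:

```go
package example

import (
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// checkRegistered stands in for the real condition polled in the test.
func checkRegistered() bool { return true }

// TestOffGoroutineAssertion mimics the structure of the flaky test: the
// condition is checked in a goroutine that plays the role of the mocked
// Publish callback, while the test goroutine waits for its result.
func TestOffGoroutineAssertion(t *testing.T) {
	results := make(chan error, 1)
	go func() {
		// Using require here would call t.FailNow on failure, which exits
		// only this goroutine; the send below would never happen and the
		// receive at the bottom would block until the test times out.
		// assert records the failure, and the panic crashes the test
		// immediately instead of leaving it hanging.
		if !assert.True(t, checkRegistered(), "node did not register in time") {
			panic("node did not register in time")
		}
		results <- nil
	}()
	require.NoError(t, <-results)
}
```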
bors merge

bors merge

Build failed.

bors merge

Build failed (retrying...).

Pull request successfully merged into develop. Build succeeded.
Motivation
Fix flaky test.
Closes #6293.
Description
The test is flaky because occasionally the first node sends a hare message before the other nodes have started processing the layer. I updated the code to check that the node that "receives" the message has already updated its state to start processing the layer of interest.
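For illustration only, here is a self-contained sketch of the idea behind that check, using simplified stand-in types (the real test operates on the hare4 package's own node and layer types and uses assert.Eventually, as shown in the review thread above):

```go
package example

import (
	"sync"
	"time"
)

// hareState is a simplified stand-in for a node's hare state.
type hareState struct {
	mu       sync.Mutex
	sessions map[uint32]struct{} // layers the node has started processing
}

// waitForRegistration polls the receiving node's session map under its mutex
// until the layer shows up or the timeout expires, mirroring the check added
// to the test before the first node's message is forwarded.
func waitForRegistration(h *hareState, layer uint32, timeout, tick time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		h.mu.Lock()
		_, ok := h.sessions[layer]
		h.mu.Unlock()
		if ok {
			return true
		}
		time.Sleep(tick)
	}
	return false
}
```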
Test Plan
Before the change this would consistently fail on my machine. After the change the test appears to be reliable.
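(For local stress testing, something like `go test -race -run TestHare_ReconstructForward -count=100 ./hare4` is a typical way to reproduce this kind of flake; the exact command used is not recorded in the PR.)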