[Merged by Bors] - Fix flaky TestHare_ReconstructForward #6299

fasmat · 2024-08-28T17:06:59Z

Motivation

Fix flaky test.

Closes #6293.

Description

The test is flaky because the first node occasionally sends a hare message before the other nodes have started processing the layer. I updated the test to wait until the node that "receives" the message has updated its state to start processing the layer of interest.
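
The wait described above can be sketched with the standard library alone. This is a minimal illustration and not the code in hare4/hare_test.go: `node`, `startLayer`, `registered`, `waitFor` and `deliverWhenReady` are hypothetical names, and the polling loop only approximates what a helper such as testify's `Eventually` does.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// node is a hypothetical stand-in for one hare instance in the test cluster.
type node struct {
	mu       sync.Mutex
	sessions map[uint32]struct{} // keyed by layer number
}

func newNode() *node { return &node{sessions: make(map[uint32]struct{})} }

// startLayer records that the node has begun processing the given layer.
func (n *node) startLayer(layer uint32) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.sessions[layer] = struct{}{}
}

// registered reports whether the node already tracks a session for layer.
func (n *node) registered(layer uint32) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	_, ok := n.sessions[layer]
	return ok
}

// waitFor polls cond every tick until it returns true or timeout elapses.
func waitFor(cond func() bool, timeout, tick time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

// deliverWhenReady simulates a receiver that starts the layer after
// startDelay and a sender that waits for that before delivering a message.
// It returns false if the receiver was not ready within timeout.
func deliverWhenReady(startDelay, timeout time.Duration) bool {
	receiver := newNode()
	const layer = 7
	go func() {
		time.Sleep(startDelay)
		receiver.startLayer(layer)
	}()
	return waitFor(func() bool { return receiver.registered(layer) }, timeout, 5*time.Millisecond)
}

func main() {
	// Receiver becomes ready well within the timeout.
	fmt.Println(deliverWhenReady(20*time.Millisecond, time.Second))
	// Receiver is too slow, so the sender gives up.
	fmt.Println(deliverWhenReady(time.Second, 50*time.Millisecond))
}
```

The lock is taken inside `registered` for the same reason the real test locks `hare.mu`: the session map is written by another goroutine while the sender polls it.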

Test Plan

Before the change, this command always failed on my machine:

go test -race -count=100 -run ^TestHare_ReconstructForward$ github.com/spacemeshos/go-spacemesh/hare4

After the change the test seems to be more reliable.

TODO

Explain motivation or link existing issue(s)
Test changes and document test plan
Update documentation as needed
Update changelog as needed

codecov · 2024-08-28T17:45:34Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.7%. Comparing base (5b44c29) to head (a131cb3).
Report is 2 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff           @@
##           develop   #6299   +/-   ##
=======================================
  Coverage     81.7%   81.7%           
=======================================
  Files          312     312           
  Lines        34613   34613           
=======================================
+ Hits         28297   28300    +3     
+ Misses        4479    4477    -2     
+ Partials      1837    1836    -1


hare3/hare_test.go

poszu · 2024-08-29T07:00:07Z

hare4/hare_test.go

+					if !assert.Eventually(t, func() bool {
+						cluster.nodes[other[0]].hare.mu.Lock()
+						defer cluster.nodes[other[0]].hare.mu.Unlock()
+						_, registered := cluster.nodes[other[0]].hare.sessions[m.Layer]
+						return registered
+					}, 5*time.Second, 50*time.Millisecond) {
+						panic(fmt.Sprintf("node %d did not register in time", other[0]))
+					}


How about

Suggested change

-					if !assert.Eventually(t, func() bool {
-						cluster.nodes[other[0]].hare.mu.Lock()
-						defer cluster.nodes[other[0]].hare.mu.Unlock()
-						_, registered := cluster.nodes[other[0]].hare.sessions[m.Layer]
-						return registered
-					}, 5*time.Second, 50*time.Millisecond) {
-						panic(fmt.Sprintf("node %d did not register in time", other[0]))
-					}
+					require.Eventuallyf(t, func() bool {
+						cluster.nodes[other[0]].hare.mu.Lock()
+						defer cluster.nodes[other[0]].hare.mu.Unlock()
+						_, registered := cluster.nodes[other[0]].hare.sessions[m.Layer]
+						return registered
+					}, 5*time.Second, 50*time.Millisecond, "node %d did not register in time", other[0])

fasmat · 2024-08-29

The reason for the assert & panic is that when an assertion fails, require calls t.FailNow, which marks the test as failed and stops the current goroutine. This call to Publish, however, does not run on the main test goroutine, which is then stuck waiting for it to return.

Normally this can be implemented cleanly with gomock.WithContext and passing that context to the code that calls the mocks. I tried updating the code to make that work, but it turned out to be too big of a refactoring to fix a single flaky test.
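
The idea of tying a failure to a context can be sketched with the standard library alone. This does not use gomock and is not the refactoring that was attempted: `failer`, `fatal`, `awaitResult` and `runScenario` are hypothetical names, and the sketch only assumes the documented behaviour of `gomock.WithContext`, namely that the returned context is cancelled on a fatal failure.

```go
package main

import (
	"context"
	"fmt"
)

// failer is a hypothetical reporter: a fatal failure cancels its context so
// that every goroutine watching the context can stop waiting.
type failer struct {
	cancel context.CancelFunc
}

func newFailer(parent context.Context) (*failer, context.Context) {
	ctx, cancel := context.WithCancel(parent)
	return &failer{cancel: cancel}, ctx
}

// fatal reports the failure and cancels the shared context.
func (f *failer) fatal(reason string) {
	fmt.Println("fatal:", reason)
	f.cancel()
}

// awaitResult blocks until a result arrives or ctx is cancelled.
func awaitResult(ctx context.Context, results <-chan int) (int, error) {
	select {
	case v := <-results:
		return v, nil
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// runScenario starts a worker that either delivers a result or fails, and
// waits for the outcome on the calling goroutine.
func runScenario(fail bool) (int, error) {
	f, ctx := newFailer(context.Background())
	defer f.cancel() // release the context on the success path too

	results := make(chan int, 1)
	go func() {
		if fail {
			f.fatal("node did not register in time")
			return // nothing is ever sent on results
		}
		results <- 42
	}()
	return awaitResult(ctx, results)
}

func main() {
	fmt.Println(runScenario(false))
	fmt.Println(runScenario(true))
}
```

The cost hinted at in the comment is visible here: every blocking call in the code under test has to accept and watch the context, which is why retrofitting it can turn into a large refactoring.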

Instead I cleaned up the code a bit and in the case a require fails we just rely on the calling side to time out correctly.
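
Why the calling side needs a timeout can be shown in isolation. The sketch below is an illustration under stated assumptions, not the test's actual code: `publish` and `waitPublished` are hypothetical names, and `runtime.Goexit` is called directly because that is what `t.FailNow` uses to stop the goroutine it runs on.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"time"
)

// publish is a hypothetical stand-in for the mocked Publish callback. When
// fail is true it exits its own goroutine via runtime.Goexit, so the signal
// the caller is waiting for is never sent.
func publish(fail bool, done chan<- struct{}) {
	if fail {
		runtime.Goexit()
	}
	done <- struct{}{}
}

// waitPublished is the calling side. It bounds the wait so that a callback
// that exited early becomes an error instead of a hang.
func waitPublished(fail bool, timeout time.Duration) error {
	done := make(chan struct{}, 1)
	go publish(fail, done)
	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		return errors.New("timed out waiting for publish")
	}
}

func main() {
	fmt.Println(waitPublished(false, time.Second))
	fmt.Println(waitPublished(true, 100*time.Millisecond))
}
```

Without the `time.After` case, the second call would block forever, which matches the stuck main goroutine described above.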

hare4/hare_test.go

fasmat · 2024-08-29T09:07:30Z

bors merge


fasmat · 2024-08-29T09:21:09Z

bors merge


spacemesh-bors · 2024-08-29T11:17:05Z

Build failed:

systest-status

fasmat · 2024-08-29T12:16:52Z

bors merge


spacemesh-bors · 2024-08-29T12:56:42Z

Build failed (retrying...):

systest-status


spacemesh-bors · 2024-08-29T13:44:20Z

Pull request successfully merged into develop.

Build succeeded:

fasmat self-assigned this Aug 28, 2024

fasmat requested review from dshulyak, poszu, ivan4th and acud as code owners August 28, 2024 17:07

fasmat force-pushed the fix-6293 branch 2 times, most recently from f8ccb11 to b90c11b Compare August 28, 2024 17:24

poszu approved these changes Aug 29, 2024


spacemesh-bors bot pushed a commit that referenced this pull request Aug 29, 2024

Fix flaky TestHare_ReconstructForward (#6299)

e83202a


fasmat added 3 commits August 29, 2024 11:20

Fix flaky TestHare_ReconstructForward

f928286

Cleanup

a591bd2

Review feedback

a131cb3

fasmat force-pushed the fix-6293 branch from 9a69c6f to a131cb3 Compare August 29, 2024 09:20

spacemesh-bors bot pushed a commit that referenced this pull request Aug 29, 2024

Fix flaky TestHare_ReconstructForward (#6299)

64f1e2f


spacemesh-bors bot pushed a commit that referenced this pull request Aug 29, 2024

Fix flaky TestHare_ReconstructForward (#6299)

a4f3e85


spacemesh-bors bot pushed a commit that referenced this pull request Aug 29, 2024

Fix flaky TestHare_ReconstructForward (#6299)

6f05844


spacemesh-bors bot changed the title ~~Fix flaky TestHare_ReconstructForward~~ [Merged by Bors] - Fix flaky TestHare_ReconstructForward Aug 29, 2024

spacemesh-bors bot closed this Aug 29, 2024

spacemesh-bors bot deleted the fix-6293 branch August 29, 2024 13:44
