Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpc: Fix flaky picker_wrapper tests #7560

Merged
merged 2 commits into from
Aug 26, 2024

Conversation

arjan-bal
Copy link
Contributor

Fixes: #7521

Background

The tests call picker_wrapper.pick() in 5 go routines. The go routines are blocked on either the context getting cancelled or the picker being updated.

grpc-go/picker_wrapper.go

Lines 120 to 134 in cfd14ba

select {
case <-ctx.Done():
var errStr string
if lastPickErr != nil {
errStr = "latest balancer error: " + lastPickErr.Error()
} else {
errStr = fmt.Sprintf("received context error while waiting for new LB policy update: %s", ctx.Err().Error())
}
switch ctx.Err() {
case context.DeadlineExceeded:
return nil, balancer.PickResult{}, status.Error(codes.DeadlineExceeded, errStr)
case context.Canceled:
return nil, balancer.PickResult{}, status.Error(codes.Canceled, errStr)
}
case <-ch:

The context is cancelled through a deferred statement when the test function exits. The test function updates the picker in the last line before exiting:

bp.updatePicker(&testingPicker{sc: testSC, maxCalled: goroutineCount})

When the picker go routines see the context getting cancelled before the picker update, they log an error causing the test to fail.

Repro

Add a time.Sleep(100 * time.Milliseconds) just before waiting on the picker update / context cancellation here:

grpc-go/picker_wrapper.go

Lines 116 to 120 in cfd14ba

if ch == pg.blockingCh {
// This could happen when either:
// - pw.picker is nil (the previous if condition), or
// - we have already called pick on the current picker.
select {

This will cause the test to fail with the following log picker_wrapper_test.go:153: bp.pick returned non-nil error: rpc error: code = Canceled desc = received context error while waiting for new LB policy update: context canceled

Fix

Wait for all the go routines to finish before returning from the test function and cancelling the context.

RELEASE NOTES: None

@arjan-bal arjan-bal added this to the 1.67 Release milestone Aug 25, 2024
Copy link

codecov bot commented Aug 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.70%. Comparing base (b45fc41) to head (5fa62ae).
Report is 15 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7560      +/-   ##
==========================================
- Coverage   82.07%   81.70%   -0.38%     
==========================================
  Files         360      361       +1     
  Lines       27533    27816     +283     
==========================================
+ Hits        22599    22726     +127     
- Misses       3759     3876     +117     
- Partials     1175     1214      +39     

see 51 files with indirect coverage changes

Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find. LGTM modulo the minor nit.

@@ -80,19 +81,24 @@ func (s) TestBlockingPick(t *testing.T) {
var finishedCount uint64
ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
defer cancel()
wg := sync.WaitGroup{}
wg.Add(goroutineCount)
for i := goroutineCount; i > 0; i-- {
go func() {
if tr, _, err := bp.pick(ctx, true, balancer.PickInfo{}); err != nil || tr != testT {
t.Errorf("bp.pick returned non-nil error: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: While you are here, could you please improve these error messages to say something like:

t.Errorf("bp.pick returned transport: %v, error: %v, want transport: %v, error: nil", tr, err, testT)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the error messages for the 4 tests.

@easwars easwars assigned arjan-bal and unassigned easwars Aug 26, 2024
@arjan-bal arjan-bal merged commit c535946 into grpc:master Aug 26, 2024
14 checks passed
@arjan-bal arjan-bal deleted the fix-flaky-picker-test branch August 26, 2024 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flake: Test/BlockingPickSCNotReady
2 participants