Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alts: Add retry loop when making RPC in ALTS's TestFullHandshake. #6183

Merged
merged 5 commits into from
Apr 11, 2023

Conversation

matthewstevenson88
Copy link
Contributor

@matthewstevenson88 matthewstevenson88 commented Apr 10, 2023

This fixes the flaky test introduced in #6177 and flagged in #6182. Before this PR, I could reproduce the flake ~1.7% of time on local runs. With this PR, I cannot reproduce the flake locally.

Fixes #6182

RELEASE NOTES: none

@matthewstevenson88 matthewstevenson88 changed the title alts: Add grpc.WithBlock() to ALTS TestFullHandshake to prevent flaki… alts: Add grpc.WithBlock() dial option in ALTS's TestFullHandshake. Apr 10, 2023
@matthewstevenson88 matthewstevenson88 marked this pull request as ready for review April 10, 2023 22:35
@matthewstevenson88 matthewstevenson88 changed the title alts: Add grpc.WithBlock() dial option in ALTS's TestFullHandshake. alts: Add retry loop when making RPC in ALTS's TestFullHandshake. Apr 11, 2023
@matthewstevenson88
Copy link
Contributor Author

@easwars This should be ready for review when you have a chance. :)

The failing test seems to be an unrelated XDS flake.

@@ -337,8 +339,15 @@ func TestFullHandshake(t *testing.T) {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
c := testgrpc.NewTestServiceClient(conn)
if _, err = c.UnaryCall(ctx, &testpb.SimpleRequest{}, grpc.WaitForReady(true)); err != nil {
t.Errorf("c.UnaryCall() failed: %v", err)
for ; ctx.Err() == nil; <-time.After(5 * time.Second) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be more like this:

  • You have an overall deadline of 10s
  • You retry every 10ms
  • You make the RPCs without the WaitForReady call option
  • I'm not convinced that DeadlineExceeded is something where you retry. It should ideally be something else. You probably will see the correct code now if you make the RPC without the WaitForReady call option.
	const defaultTestTimeout = 10 * time.Second
	const defaultTestShortTimeout = 10 * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout)
	defer cancel()
	c := testgrpc.NewTestServiceClient(conn)
	// TODO: A comment explaining why we need to retry the RPC until it succeeds.
	for ; ctx.Err() == nil; <-time.After(defaultTestShortTimeout) {
		_, err = c.UnaryCall(ctx, &testpb.SimpleRequest{})
		if err == nil {
			break
		}
		if code := status.Code(err); code == <expected code when handshake fails> {
			continue
		}
		t.Fatalf("c.UnaryCall() failed with unexpected error: %v", err)
	}
	if ctx.Err() != nil {
		t.Fatalf("Timeout when waiting for UnaryCall RPC to succeed")
	}

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, the WaitForReady option was masking an Unavailable error code, which makes more sense. And thanks for the tips! Implemented as you suggested. :)

@easwars easwars merged commit 06de8f8 into grpc:master Apr 11, 2023
1 check passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: TestFullHandshake
2 participants