-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xdsclient: invoke watch callback when new update matches cached one, but previous update was NACKed (1/N) #7692
Conversation
…but previous update was NACKed
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #7692 +/- ##
==========================================
- Coverage 81.93% 81.84% -0.09%
==========================================
Files 361 361
Lines 27822 27822
==========================================
- Hits 22797 22772 -25
- Misses 3838 3853 +15
- Partials 1187 1197 +10
|
2cf64a5
to
e41e411
Compare
e41e411
to
a41f75b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments but overall LGTM.
// 2. When a subsequent bad response is received, i.e. once is expected to be | ||
// NACKed, the test verifies that a NACK is sent matching the previously | ||
// ACKed version and current nonce from the response. | ||
// 3. When a subsequent goos response is received, the test verifies that an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol. When I first read this comment from you, I thought you were appreciating me for something :).
Fixed.
ResponseNonce: "", | ||
} | ||
if diff := cmp.Diff(gotReq, wantReq, protocmp.Transform()); diff != "" { | ||
t.Fatalf("Unexpected diff in received discovery request, diff (-got, +want):\n%s", diff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm you like new line now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did I comment any time in the past against newlines?
I like it here because the diff can be large. So, it makes sense to be on a separate line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just never seen the diff outputted on a new line. I wasn't saying it's wrong, just interesting. Maybe I'll do this for large diffs too.
} | ||
gotResp = r.(*v3discoverypb.DiscoveryResponse) | ||
|
||
var wantNackErr = errors.New("unexpected http connection manager resource type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wantNackErr := is better here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Not sure why I had it the other way around.
Just FYI. There is a related topic (not exact though) in non-decisions: https://google.github.io/styleguide/go/decisions#non-decisions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm.
// We expect the version to not change as this is a NACK. | ||
r, err = streamRequestCh.Receive(ctx) | ||
if err != nil { | ||
t.Fatal("Timeout when waiting for ACK") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NACK. But it's not really specific, it's just a timeout waiting for a request on stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACKs and NACKs are similar in a lot of ways to actual requests. They use the same message type and the same fields.
And the channel that we use here streamRequestCh
is used to for all of them.
for { | ||
r, err = streamResponseCh.Receive(ctx) | ||
if err != nil { | ||
t.Fatal("Timeout when waiting for the discovery response from client") | ||
} | ||
gotResp = r.(*v3discoverypb.DiscoveryResponse) | ||
|
||
// Verify that the ACK contains the appropriate version and nonce. | ||
r, err = streamRequestCh.Receive(ctx) | ||
if err != nil { | ||
t.Fatal("Timeout when waiting for ACK") | ||
} | ||
gotReq = r.(*v3discoverypb.DiscoveryRequest) | ||
wantReq.VersionInfo = gotResp.GetVersionInfo() | ||
wantReq.ResponseNonce = gotResp.GetNonce() | ||
wantReq.ErrorDetail = nil | ||
diff := cmp.Diff(gotReq, wantReq, protocmp.Transform()) | ||
if diff == "" { | ||
break | ||
} | ||
t.Logf("Unexpected diff in received discovery request, diff (-got, +want):\n%s", diff) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this polling codeblock also exit with an error of never got expected discovery request if times out? I don't think it's right to fail because no discovery response or waiting for ack, as that is part of the polling operation under test but not the eventually consistent thing that is expected to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the test timeout expires, the first line in the for
loop will see an error and the test will fail with Timeout when waiting for the discovery response from client
. Do you want more here? If so, can you please be specific on what you want me to do? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is related to the comment on the other PR. I didn't like the fact that failures occured within the operations of the polling loop, where the polling loop is testing another layer/behavior that is expected to be eventually consistent.
// Verify the update received by the watcher. | ||
for ; ctx.Err() == nil; <-time.After(100 * time.Millisecond) { | ||
if err := verifyListenerUpdate(ctx, lw.updateCh, wantUpdate); err != nil { | ||
t.Logf("Failed to verify listener update, err: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional nit: You fail earlier with just the err. Prepend the same text? Or remove this text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason that this is a t.Logf
and not a t.Errorf
or t.Fatalf
is because it will take some time for the watcher callback to be invoked. So, I want to log failures up until the expected listener update is received.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I thought about this a little more and changed it to not log on every failure, but only to log the error when the test actually fails to receive the expected listener update and times out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the actual text in the error message (so fail with "something" + err vs. err).
|
||
// Tests the case where the first response is invalid. The test verifies that | ||
// the NACK contains an empty version string. | ||
func (s) TestADS_ACK_NACK_InvalidFirstResponse(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this test called ACK_NACK when it only NACK's?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid point. The only reason I have it the way I have it is to be able to do a go test <package name> -run Test/ADS_ACK_NACK
and run all of these tests.
If you really want me to change it, do let me know and I will. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I prefer it, but it's already merged so no worries haha.
t.Fatal("Timeout when waiting for the initial discovery request") | ||
} | ||
gotReq := r.(*v3discoverypb.DiscoveryRequest) | ||
wantReq := &v3discoverypb.DiscoveryRequest{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the mentioned issue of "not acking the good update if last update was bad" exist for all resource types? Is it enough to just verify it for a listener resource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not acking the good update if last update was bad
Where do we have this behavior? That doesn't sound right to me.
But anyways, we re-architected the xDS client transport to be agnostic of the resource type. So, there should not be any code in the xDS client transport layer that does different things based on the resource type. Any resource type specific behavior is mostly limited to parsing/decoding the resource and that is implemented in the xdsresource
package which we sometimes refer to as the data-model layer. Therefore verifying ACK/NACK sematics behavior for one resource type is good enough.
} | ||
gotReq = r.(*v3discoverypb.DiscoveryRequest) | ||
if gotNonce, wantNonce := gotReq.GetResponseNonce(), gotResp.GetNonce(); gotNonce != wantNonce { | ||
t.Errorf("Unexpected nonce in discovery request, got: %v, want: %v", gotNonce, wantNonce) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is there any convention we follow on when to use t.Errorf or t.Fatalf? If not, why not always use t.Fatalf to be consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://google.github.io/styleguide/go/decisions#keep-going
Here we want the error message to also contain any diff in the error detail.
|
||
// Verify the update received by the watcher. | ||
var lastErr error | ||
for ; ctx.Err() == nil; <-time.After(100 * time.Millisecond) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we using <-time.After
instead of select with context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time.After
here is part of the for
loop (to ensure that we pause a little in between iterations). Basically this loop could be written as follows as well:
for {
err := verifyListenerUpdate(ctx, lw.updateCh, wantUpdate)
if err == nil {
break
}
lastErr = err
select {
case <-time.After(100 * time.Millisecond):
continue
case <-ctx.Done:
t.Fatalf("Timeout when waiting for listener update. Last seen error: %v", lastErr)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Consider the following sequence of events:
This PR fixes a bug which was causing the xDS client to not invoke the watch callback at the last step of the above sequence of events.
This PR also contains a couple of other changes:
This PR is the first in a sequence of PRs that refactor the xDS client codebase to support fallback.
#a71-xds-fallback
#xdsclient-refactor
Addresses #6902
RELEASE NOTES: