Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: switch EDS watch to new generic xdsClient API #6414

Merged
merged 1 commit into from
Jun 27, 2023

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Jun 26, 2023

Summary of changes:

  • Switch the clusterresolver LB policy to use the resource agnostic WatchResource API instead of WatchEndpoints
    • Since clusterresolver tests were recently fixed to not rely on internal details, the tests needed no change for this PR
  • Remove all references to WatchEndpoints in the xdsClient codebase
    • Remove the method from the XDSClient interface
    • Update EDS tests to call the WatchResource API instead

#resource-agnostic-xdsclient-api

RELEASE NOTES: none

@easwars easwars requested a review from zasweq June 26, 2023 23:25
@easwars easwars added the Type: Internal Cleanup Refactors, etc label Jun 26, 2023
@easwars easwars added this to the 1.57 Release milestone Jun 26, 2023
Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

if err != nil {
er.topLevelResolver.onError(err)
// OnUpdate is invoked to report an update for the resource being watched.
func (er *edsDiscoveryMechanism) OnUpdate(update *xdsresource.EndpointsResourceData) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional nit: Why is this type "edsDiscoveryMechanism" when the function to create it is called newEDSResolver, and it's abbreviated as er in it's function argument. Should we just call it edsResolver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is definitely some scope for cleanup here (and your points all related to exiting code). But I don't want to do that in this PR, because there is also a related dnsDiscoveryMechanism similar to the edsDiscoveryMechanism and they both implement the endpointsResolver interface. If I get started on this cleanup, it will make the change much larger and will deviate from what I'm trying to do as part of this PR. If you think this cleanup will help, we can file an issue and track it separately. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I guess it makes sense to call it xxxDiscoveryMechanism actually since DiscoveryMechanism from above is what determines what each priority needs to do to get cluster configuration.

}

func (ew *endpointsWatcher) OnResourceDoesNotExist() {
ew.updateCh.SendOrFail(endpointsUpdateErrTuple{err: xdsresource.NewErrorf(xdsresource.ErrorTypeResourceNotFound, "Endpoints not found in received response")})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: resource of type Endpoints

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error string corresponds to the one used in xds/internal/xdsclient/tests/resource_update_test.go. Earlier, since the code was calling the now deleted xdsclient.WatchEndpoints method implemented in xds/internal/xdsclient/clientimpl_watchers.go, and where we had a endpointsWatcher implementation that returned the following error upon resource-not-found.

func (c *endpointsWatcher) OnResourceDoesNotExist() {
	err := xdsresource.NewErrorf(xdsresource.ErrorTypeResourceNotFound, "resource name %q of type Endpoints not found in received response", c.resourceName)
	c.cb(xdsresource.EndpointsUpdate{}, err)
}

Now, if I change the error string here, I would have to touch the tests as well, which I feel is unnecessary. Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the thing you refactored from going through: https://github.com/grpc/grpc-go/pull/6414/files#diff-23960407248b82145ea16f839405e2a47798a0de947a1f3c72aacbc36769fa13L132. And since the xDS Client would call this function in the case where a resource doesn't exist, I just wanted a string saying "a Resource of type Endpoints" rather than just "Endpoints", because it's something that isn't present that is a type, corresponding to what you changed from master. I don't feel too strongly about this though. I was just nitpicking about "Endpoints not found in received response".

Comment on lines +73 to +77
ew.updateCh.SendOrFail(endpointsUpdateErrTuple{err: err})
}

func (ew *endpointsWatcher) OnResourceDoesNotExist() {
ew.updateCh.SendOrFail(endpointsUpdateErrTuple{err: xdsresource.NewErrorf(xdsresource.ErrorTypeResourceNotFound, "Endpoints not found in received response")})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the code you refactored into this, any send of this tuple type is a blocking send, whether it got an Update or it got an Error. Is there a reason you switched this to a non blocking, and if none, can you make this Send blocking for verification purposes? Seems less racy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The earlier code was not using the go-control-plane, and the new tests use it. The go-control-plane server has this weird behavior where it keeps resending a NACKed response. This means that in cases where we deal with an error, we will keep hitting the same error again and again in our e2e tests. If I don't have this non blocking send, the test would leak this goroutine.

I remember filing an issue with the go-control-plane about this, but there was no response.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, sounds good.

@zasweq zasweq assigned easwars and unassigned zasweq Jun 27, 2023
@easwars easwars merged commit 7eb5727 into grpc:master Jun 27, 2023
1 check passed
@easwars easwars deleted the eds_watch_cleanup branch June 27, 2023 20:38
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants