xdsclient: start using the newly added transport and channel functionalities #7773
xds/internal/xdsclient/authority.go
Outdated
closed bool
resources map[xdsresource.Type]map[string]*resourceState

// An ordered list of xdsChannels along with their server configuration.
nit: Maybe mention the ordering criteria? Or is it just the order of addition? In that case, do we even need to mention the order, since a list implicitly maintains the order in which items are added?
Also, maybe: "An ordered list of xdsChannels along with their server configuration to which the authority has subscribed"?
This is the ordered list of xdsChannels corresponding to the ordered list of server configurations present in the authority config in the bootstrap.
The ordering here specifies the priority, i.e. every entry is preferred over all the entries that come after it.
Expanded the comment a little. Maybe it is a little clearer now. Thanks.
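To make the ordering discussion concrete, here is a minimal sketch of how a priority-ordered list could be consulted, where index 0 is the most preferred entry. This is only an illustration of "earlier entries win"; the `firstUsable` helper, the `usable` predicate and the server names are hypothetical and are not part of this PR, which does not implement fallback.

```go
package main

import "fmt"

// firstUsable returns the index of the highest-priority server that the
// usable predicate accepts, or -1 if none is accepted. Index 0 has the
// highest priority, matching the order of the configuration list.
func firstUsable(servers []string, usable func(string) bool) int {
	for i, s := range servers {
		if usable(s) {
			return i
		}
	}
	return -1
}

func main() {
	// Hypothetical server names, listed in priority order.
	servers := []string{"primary.example.com", "secondary.example.com", "tertiary.example.com"}
	down := map[string]bool{"primary.example.com": true}
	idx := firstUsable(servers, func(s string) bool { return !down[s] })
	fmt.Println(idx, servers[idx])
}
```

The point of the sketch is only that the position in the slice carries meaning, which is why the field comment is worth spelling out.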
xds/internal/xdsclient/authority.go
Outdated
// The current active xdsChannel. Here, active does not mean that the
// channel has a working connection to the server. It simply points to the
// channel that we are trying to work with, based on fallback logic.
activeChannel *xdsChannelWithConfig
nit: activeXdsChannel?
Done.
// server config. The error will be forwarded to all the resource watchers.
//
// Errors of type xdsresource.ErrTypeStreamFailedAfterRecv are ignored.
func (a *authority) adsStreamFailure(serverConfig *bootstrap.ServerConfig, err error) {
clarification: Maybe we need to mention who calls adsStreamFailure and adsResourceUpdate. Are they called by the xdsclient?
Done.
for _, rType := range a.resources {
for _, state := range rType {
for watcher := range state.watchers {
watcher := watcher
why watcher := watcher?
Because the callback passed to the serializer accesses this loop variable. See https://fuchsoria.medium.com/shadowing-or-how-to-fix-loop-variable-i-captured-by-func-literal-f365c0ee984e
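A minimal sketch of the shadowing pattern, for reference. Before Go 1.22 a `for` loop variable was shared across iterations, so a closure that ran later would see the last value unless a per-iteration copy was made. From Go 1.22 onward each iteration gets its own variable, so the copy is harmless but no longer required. The `collectCallbacks` helper and the watcher names are hypothetical and only stand in for callbacks that are queued now and run later.

```go
package main

import "fmt"

// collectCallbacks builds one deferred callback per watcher name, mimicking
// callbacks that are queued during the loop and executed afterwards.
func collectCallbacks(watchers []string) []func() string {
	var callbacks []func() string
	for _, watcher := range watchers {
		watcher := watcher // Per-iteration copy captured by the closure below.
		callbacks = append(callbacks, func() string { return watcher })
	}
	return callbacks
}

func main() {
	for _, cb := range collectCallbacks([]string{"w1", "w2", "w3"}) {
		fmt.Println(cb())
	}
}
```

With the copy in place, each callback reports its own watcher regardless of the Go version used to build it.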
a.updateResourceStateAndScheduleCallbacks(rType, updates, md, onDone)
return err

// TODO(easwars-fallback): Trigger fallback here if conditions for fallback
Should we still report the error if fallback is being triggered?
There is no fallback support in this PR. It will be added in the next PR.
But what error do you want to report during fallback and to whom?
for watcher := range state.watchers {
watcher := watcher
watcherCnt.Add(1)
funcsToSchedule = append(funcsToSchedule, func(context.Context) { watcher.OnResourceDoesNotExist(done) })
Where are these funcsToSchedule scheduled on the watcherCallbackSerializer?
At the top of this function via a defer.
}
cleanup = a.unwatchResource(rType, resourceName, watcher)
})
<-done
Why do we need to spawn a separate thread for the watcher if we need to wait for completion anyway?
Because if we don't schedule it on the serializer, we would have to guard the fields with a mutex.
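A rough sketch of that trade-off, assuming a toy serializer rather than the real `grpcsync.CallbackSerializer`: state is only ever touched from callbacks that run one at a time on a single goroutine, so it needs no mutex, and the caller blocks on a `done` channel to get a synchronous result. The `serializer`, `registry` and `watch` names are hypothetical.

```go
package main

import "fmt"

// serializer is a toy stand-in for a callback serializer: a single goroutine
// runs queued callbacks one at a time, in order.
type serializer struct {
	callbacks chan func()
}

func newSerializer() *serializer {
	s := &serializer{callbacks: make(chan func(), 16)}
	go func() {
		for cb := range s.callbacks {
			cb()
		}
	}()
	return s
}

// schedule queues cb to run on the serializer goroutine.
func (s *serializer) schedule(cb func()) { s.callbacks <- cb }

// registry holds state that is only accessed from serializer callbacks, so
// it is not guarded by a mutex.
type registry struct {
	s        *serializer
	watchers map[string]int
}

// watch registers a watcher and blocks until the registration has run.
func (r *registry) watch(name string) int {
	var count int
	done := make(chan struct{})
	r.s.schedule(func() {
		defer close(done)
		r.watchers[name]++
		count = r.watchers[name]
	})
	<-done // Closing done happens before this receive completes, so count is safe to read.
	return count
}

func main() {
	r := &registry{s: newSerializer(), watchers: map[string]int{}}
	fmt.Println(r.watch("listener-a"), r.watch("listener-a"))
}
```

The caller still waits, but all mutation stays on one goroutine, which is the property the reply above relies on.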
}
a.transport.SendRequest(rType.TypeURL(), resourcesToRequest)
a.xdsChannelConfigs[0].xc = xc
Why always use the first configuration? What are the other configurations there for?
We only use the first configuration prior to supporting fallback. The other configurations will be used when fallback support is added as part of the next PR.
Midway through my first pass; super light so far, but I figured I'd send it partway, since I'll finish the pass tomorrow, so you have something to work on in parallel.
xds/internal/xdsclient/authority.go
Outdated
// the management server. If an active channel is available, it returns that.
// Otherwise, it creates a new channel using the first server configuration in
// the list of configurations, and returns that.
func (a *authority) xdsChannelToUseLocked() *xdsChannelWithConfig {
For the Locked methods here and below, mention which mu needs to be held while calling them.
There is no mutex in the authority type any more. So, just added a note that this needs to be run in the context of a serializer callback.
Okay. Should we drop the "Locked" suffix then (similar to how handle functions don't have locked suffix)?
Done.
xds/internal/xdsclient/authority.go
Outdated
watcherCallbackSerializer: args.serializer,
getChannelForADS: args.getChannelForADS,
serializer: grpcsync.NewCallbackSerializer(ctx),
Why is the xDS Client serializer created inline but the watcherCallback is a knob? I see the watcher callback serializer getting passed from newClient as the client serializer? Is that to make all watches across all authorities serial?
Yes, the xdsClientSerializer is an implementation detail of the authority struct, while the serializer used in the xDS client to serialize watch callbacks is something that is shared across all watchers. For example, a single watcher implementation could register watches on multiple authorities. In such a case, we still need to guarantee that callbacks across the watches are not invoked concurrently. So, we need a single serializer for all watchers.
xds/internal/xdsclient/client_new.go
Outdated
}

for name, cfg := range config.Authorities() {
// If server configus are specified in the authorities map, use that. Else,
typo: configus
Done.
xds/internal/xdsclient/client_new.go
Outdated
}

for name, cfg := range config.Authorities() {
// If server configus are specified in the authorities map, use that. Else,
Just for my understanding, config.XDSServers() will be there only in case of fallback servers?
Yes.
xds/internal/xdsclient/client_new.go
Outdated
logPrefix: clientPrefix(c),
})
}
c.topLevelAuthority = newAuthority(authorityArgs{
Just for my understanding, what is the difference between the top-level authority and the ones in the authorities map?
Top-level server configuration is used when an authority configuration does not contain server configuration.
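The selection rule described in that reply can be sketched as follows. This is a simplified illustration, not the bootstrap package's API: the `authorityConfig` type, its `servers` field and the `serversFor` helper are hypothetical, and servers are represented as plain strings.

```go
package main

import "fmt"

// authorityConfig is a hypothetical, simplified view of one entry in the
// bootstrap authorities map.
type authorityConfig struct {
	servers []string // Server URIs in priority order; may be empty.
}

// serversFor returns the authority's own servers if it has any, and the
// top-level servers otherwise.
func serversFor(cfg authorityConfig, topLevel []string) []string {
	if len(cfg.servers) > 0 {
		return cfg.servers
	}
	return topLevel
}

func main() {
	topLevel := []string{"top-level.example.com"}
	withServers := authorityConfig{servers: []string{"authority.example.com"}}
	withoutServers := authorityConfig{}
	fmt.Println(serversFor(withServers, topLevel))
	fmt.Println(serversFor(withoutServers, topLevel))
}
```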
authorities map[string]*authority // Map from authority names in bootstrap to authority struct.
config *bootstrap.Config // Complete bootstrap configuration.
watchExpiryTimeout time.Duration // Expiry timeout for ADS watch.
backoff func(int) time.Duration // Backoff for ADS and LRS stream failures.
Would it be uncommon to have different backoffs for ADS and LRS?
Yeah, I don't know of a scenario where that would be useful.
xds/internal/xdsclient/clientimpl.go
Outdated
backoff func(int) time.Duration // Backoff for ADS and LRS stream failures.
transportBuilder transport.Builder // Builder to create transports to the xDS server.
resourceTypes *resourceTypeRegistry // Registry of resource types, for parsing incoming ADS responses.
serializer *grpcsync.CallbackSerializer // Serializer for invoking watcher callbacks.
nit: `Serializer for invoking resource watcher callbacks`?
Done.
return
}

cs.parent.channelsMu.Lock()
defer cs.parent.channelsMu.Unlock()?
Done.
return
}

cs.parent.channelsMu.Lock()
defer cs.parent.channelsMu.Unlock()
Done.
@@ -69,22 +145,57 @@ func (c *clientImpl) BootstrapConfig() *bootstrap.Config {
return c.config
}

// close closes the gRPC connection to the management server.
// close closes the xDS client and releases all resources.
func (c *clientImpl) close() {
I think I asked this above. Will users of the xDS client call this, or is it called internally on some condition?
This is called when the reference count on the xDS client comes down to 0. See client_refcounted.go for details.
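A generic sketch of reference-counted closing, for readers unfamiliar with the idea. This is not the code in client_refcounted.go; the `refCountedResource` type and its `acquire` and `release` methods are hypothetical. Each acquirer gets its own release function, and the underlying close runs only when the last reference is dropped.

```go
package main

import (
	"fmt"
	"sync"
)

// refCountedResource is a hypothetical wrapper that closes the underlying
// resource only when the last reference is released.
type refCountedResource struct {
	mu      sync.Mutex
	refs    int
	closeFn func()
}

// acquire takes a reference and returns a function that releases it. The
// returned function takes effect at most once.
func (r *refCountedResource) acquire() func() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refs++
	var once sync.Once
	return func() { once.Do(r.release) }
}

// release drops one reference and closes the resource at zero.
func (r *refCountedResource) release() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refs--
	if r.refs == 0 {
		r.closeFn()
	}
}

func main() {
	r := &refCountedResource{closeFn: func() { fmt.Println("underlying resource closed") }}
	releaseA := r.acquire()
	releaseB := r.acquire()
	releaseA()
	releaseA() // No-op: each release function takes effect at most once.
	releaseB() // Last reference dropped, so closeFn runs here.
}
```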
c.logger.Infof("Reviving an xdsChannel from the idle cache for server config %q", serverConfig)
}
state, _ := state.(*channelState)
c.xdsActiveChannels[serverConfig.String()] = state
c.xdsActiveChannels[serverConfig.String()] = state: shouldn't this be synchronized too, i.e. inside a lock/unlock critical section? Actually, maybe the whole if block starting at line 266 needs to be in the critical section?
I grab the lock at the top of this method. Are you saying that is not enough?
return nil, func() {}, fmt.Errorf("failed to create xdsChannel for server config %s: %v", serverConfig, err)
}
state.channel = channel
c.xdsActiveChannels[serverConfig.String()] = state
Same here. Why don't lines 304 and 305 need to be within the critical section?
I grab the lock at the top of this method. Are you saying that is not enough?
// a channel to the first server configuration is created when the first watch
// is registered, and more channels are created as needed by the fallback logic.
func newAuthority(args authorityBuildOptions) *authority {
ctx, cancel := context.WithCancel(context.Background())
Not applicable to this PR, but just a reminder about the context scoping discussion for the future, when you refactor the xDS client, @purnesh42H.
Ack.
@@ -1,295 +0,0 @@
/*
If this is a pure refactor, can we not keep some of these?
They have already been moved out to e2e style tests in xds/internal/xdsclient/tests/authority_test.go and xds/internal/xdsclient/tests/ads_stream_watch_test.go.
Oh those cover the same functionality? The diffs in those e2e tests seemed pretty minor so I thought we just deleted these.
The old tests TestTimerAndWatchStateOnSendCallback, TestTimerAndWatchStateOnErrorCallback and TestWatchResourceTimerCanRestartOnIgnoredADSRecvError are now covered by the new test TestADS_WatchState_StreamBreaks. We also have another new test, TestADS_WatchState_TimerFires, which explicitly tests the timer firing scenario.
LGTM
#a71-xds-fallback
#xdsclient-refactor
Addresses #6902
This PR is the last of the refactor PRs. Contains the following:
In a follow-up PR, the TODOs mentioned for fallback will be filled in with e2e style tests.
RELEASE NOTES: none