
xdsclient: support fallback within each xDS server authority [A71] #6902

Closed
5 tasks
easwars opened this issue Dec 28, 2023 · 0 comments · Fixed by #7701
Labels: Area: xDS, P2, Type: Feature

easwars commented Dec 28, 2023

To support xDS client fallback, xdsclient.authority needs the following changes:

  • xdsclient.authority needs to support an ordered list of transports
    • Instead of a single transport, it will maintain a slice of transports ordered by server priority
    • When transport N is active, all lower-priority transports (N+1 and higher) will be shut down
    • When transport N goes down, transport N+1 will be activated
  • Communicate the connectivity state from the xdsclient/transport.Transport to the xdsclient.authority
    • We recently added a generic pub-sub mechanism here. It is used by the client channel to report connectivity state changes via an internal-only API here.
    • We could switch xdsclient/transport.Transport to use this API to report connectivity state to xdsclient.authority. Alternatively, we could rely on the ADS stream itself: the stream closing before the first response is received from the server signals a connectivity failure, and the subsequent successful receipt of a message from the server signals connectivity success.
    • Either way, xdsclient/transport.Transport needs to make this information available to xdsclient.authority.
  • xdsclient.authority already maintains the state of all registered watches, so it has the data needed to determine whether at least one watcher exists for a resource that is not cached.
  • xdsclient.authority will run a goroutine that uses the connectivity state of the transports (second bullet) and the state of the registered watches (third bullet) to fall back to a lower-priority server when a higher-priority server goes down, and to revert to the higher-priority server (shutting down the lower-priority ones) when it comes back up.
  • When switching from one transport to another, the LRS stream must be started on the new transport if any user of the xDS client had started one earlier.
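A minimal sketch of the ordered-transport bookkeeping described in the first and last bullets above, assuming a hypothetical `authority` type that holds one transport slot per server in priority order (index 0 is highest). The `transport`, `activate`, `onDown`, `onUp` and `startLRS` names are illustrative only and are not grpc-go APIs; real connections, ADS streams, and the watch-state condition for fallback are deliberately left out.

```go
package main

// transport is a hypothetical stand-in for xdsclient/transport.Transport.
// It only records whether an LRS stream is running on it.
type transport struct {
	server     string
	lrsRunning bool
}

// authority holds one transport slot per configured server, ordered by
// priority (index 0 is highest). A nil slot means no transport exists.
type authority struct {
	servers    []string
	transports []*transport
	active     int // index of the transport currently in use
	lrsUsers   int // number of callers that have started LRS
}

func newAuthority(servers []string) *authority {
	a := &authority{servers: servers, transports: make([]*transport, len(servers))}
	a.activate(0)
	return a
}

// activate makes transport n the active one: it creates the transport if
// needed, shuts down every lower-priority transport (index > n), and
// restarts LRS on it if any user had started LRS earlier.
func (a *authority) activate(n int) {
	if a.transports[n] == nil {
		a.transports[n] = &transport{server: a.servers[n]}
	}
	for i := n + 1; i < len(a.transports); i++ {
		a.transports[i] = nil // stands in for closing the transport
	}
	a.active = n
	if a.lrsUsers > 0 {
		a.transports[n].lrsRunning = true
	}
}

// startLRS records a new LRS user and starts LRS on the active transport.
func (a *authority) startLRS() {
	a.lrsUsers++
	a.transports[a.active].lrsRunning = true
}

// onDown handles transport n losing connectivity. If it was the active
// one, fall back to the next server in priority order, if there is one.
// Transport n is kept so it can keep retrying; its LRS stream is gone.
func (a *authority) onDown(n int) {
	if t := a.transports[n]; t != nil {
		t.lrsRunning = false
	}
	if n == a.active && n+1 < len(a.transports) {
		a.activate(n + 1)
	}
}

// onUp handles a higher-priority transport regaining connectivity by
// reverting to it, which shuts down all lower-priority transports.
func (a *authority) onUp(n int) {
	if n < a.active {
		a.activate(n)
	}
}
```

Note that only transports with a higher index than the active one are closed: a failed higher-priority transport stays around so that it can report recovery.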
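The second option for communicating connectivity (inferring it from the ADS stream instead of subscribing to channel connectivity state) could look roughly like the sketch below. `connState`, `adsStreamTracker` and its methods are hypothetical names, and the `updates` channel is an assumed way of handing signals to the authority; none of this is existing grpc-go code.

```go
package main

// connState is a hypothetical connectivity signal a transport could
// hand to its authority.
type connState int

const (
	connReady connState = iota
	connFailed
)

// adsStreamTracker derives connectivity from ADS stream events using the
// heuristic from the issue: a stream that ends before any response
// arrives signals failure; receiving a response signals success.
type adsStreamTracker struct {
	gotResponse bool             // any response on the current stream?
	updates     chan<- connState // consumed by the authority
}

// onStreamStart resets per-stream state when a new ADS stream is created.
func (t *adsStreamTracker) onStreamStart() { t.gotResponse = false }

// onResponse is called for every message received from the server. Only
// the first response on a stream produces a connReady update.
func (t *adsStreamTracker) onResponse() {
	if !t.gotResponse {
		t.gotResponse = true
		t.updates <- connReady
	}
}

// onStreamEnd is called when the ADS stream terminates. A stream that
// delivered at least one response is not treated as a connectivity
// failure; one that delivered none is.
func (t *adsStreamTracker) onStreamEnd() {
	if !t.gotResponse {
		t.updates <- connFailed
	}
}
```

Reporting only the first response per stream keeps the authority from being flooded with redundant "ready" signals on every resource update.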

Other changes that could improve the state of things here:

  • Currently, xdsclient.authority does not pass a context to its transport. Instead, the transport derives its own context from the background context for the ADS stream, and is closed by explicitly invoking its Close method.
    • To support multiple transports within the same authority, xdsclient.authority could pass a context to every transport it creates. Cancelling that context would close all transports when the authority is shut down.
    • However, the transport still needs an explicit Close method, so that individual transports can be shut down when a higher-priority server comes back up.
@easwars easwars added the Type: Feature New features or improvements in behavior label Dec 28, 2023
@dfawley dfawley added the P2 label Jan 10, 2024
@easwars easwars self-assigned this Aug 5, 2024
@arjan-bal arjan-bal added the Area: xDS Includes everything xDS related, including LB policies used with xDS. label Sep 4, 2024