Backport of proxycfg: ensure that an irrecoverable error in proxycfg closes the xds session and triggers a replacement proxycfg watcher into release/1.15.x #16529
Backport
This PR is auto-generated from #16497 to be assessed for backporting due to the inclusion of the label backport/1.15.
The text below is copied from the body of the original PR.
Description
Receiving an "acl not found" error from an RPC in the agent cache and the streaming/event components causes any request loops to cease, under the assumption that they will never work again once the token is destroyed. This prevents log spam (#14144, #9738).
Unfortunately, due to various timing issues, the caller can get an "acl not found" error just before the token exists, rather than just after it is destroyed. The machinery added in the linked PRs then kicks in and prevents the request loop from looping around again once the token actually exists.
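To illustrate the failure mode, here is a minimal Go sketch of such a request loop. The names (`fetch`, `watchLoop`, `errACLNotFound`) are hypothetical stand-ins, not Consul's actual agent-cache code; the point is that treating "acl not found" as irrecoverable kills the loop even when the error is only a creation-time race:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errACLNotFound = errors.New("acl not found")

// fetch stands in for a blocking RPC made with the service's token.
// It simulates a creation race: the token only becomes visible to the
// RPC layer on the third attempt.
func fetch(attempt int) error {
	if attempt < 3 {
		return errACLNotFound
	}
	return nil
}

func watchLoop() {
	for attempt := 0; ; attempt++ {
		err := fetch(attempt)
		if err == nil {
			fmt.Printf("attempt %d: got data, keep watching\n", attempt)
			return
		}
		if errors.Is(err, errACLNotFound) {
			// Treated as irrecoverable to avoid log spam: the loop exits
			// immediately, even though attempt 3 would have succeeded.
			fmt.Printf("attempt %d: acl not found, terminating watch loop\n", attempt)
			return
		}
		time.Sleep(100 * time.Millisecond) // transient error: retry
	}
}

func main() {
	watchLoop()
}
```

Running this prints "attempt 0: acl not found, terminating watch loop": the loop is dead for good, even though waiting a little longer would have succeeded.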
For `consul-dataplane` usages, where xDS is served by the Consul servers rather than the client agents, this is ultimately not a problem: in that scenario the `agent/proxycfg` machinery is launched on demand by a new xDS stream needing data for a specific service in the catalog. If the watching goroutines are terminated, that ripples down and terminates the xDS stream, which consul-dataplane will eventually re-establish, restarting everything.

For Consul client usages, the `agent/proxycfg` machinery is launched ahead of time, at service registration (called "local" in some of the proxycfg machinery), so the data is already ready to go when the xDS stream comes in. If the watching goroutines terminate, they should terminate the xDS stream, but there is no mechanism to re-spawn the watching goroutines. If the xDS stream then reconnects, it will see no `ConfigSnapshot` and will not get one again until the client agent is restarted or the service is re-registered with something changed in it.

This PR fixes a few things in this machinery so that an irrecoverable error in proxycfg closes the xDS session and triggers a replacement proxycfg watcher.
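To make the intended behavior concrete, here is a minimal Go sketch of that shape of fix. All types and names (`watcher`, `manager`, `spawnWatcher`, `serveXDS`) are hypothetical illustrations, not Consul's actual `agent/proxycfg` API: when a service's watching goroutine dies with an irrecoverable error, the attached xDS session is closed, and a replacement watcher is spawned so a reconnecting stream finds data again.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errACLNotFound = errors.New("acl not found")

// watcher represents one service's config-watching goroutine; done is
// closed when that goroutine exits for any reason.
type watcher struct {
	done chan struct{}
}

type manager struct {
	mu       sync.Mutex
	watchers map[string]*watcher
}

// watch stands in for the body of the proxycfg watching goroutine. The
// first generation fails irrecoverably (e.g. the token race above); the
// replacement succeeds.
func watch(service string, generation int) error {
	if generation == 0 {
		return errACLNotFound
	}
	return nil
}

// spawnWatcher launches the watching goroutine for a locally registered
// service and, if it dies with an irrecoverable error, spawns a
// replacement so a reconnecting xDS stream finds data again.
func (m *manager) spawnWatcher(service string, generation int) *watcher {
	w := &watcher{done: make(chan struct{})}
	m.mu.Lock()
	m.watchers[service] = w
	m.mu.Unlock()

	go func() {
		defer close(w.done) // ripples out to any attached xDS session
		if err := watch(service, generation); errors.Is(err, errACLNotFound) {
			fmt.Println("watcher for", service, "died irrecoverably; re-spawning")
			m.spawnWatcher(service, generation+1)
		}
	}()
	return w
}

// serveXDS stands in for an xDS stream handler: it closes the session as
// soon as the watcher it depends on terminates, instead of hanging with
// no ConfigSnapshot.
func serveXDS(w *watcher) {
	<-w.done
	fmt.Println("xDS session closed; a reconnecting client finds the replacement watcher")
}

func main() {
	m := &manager{watchers: make(map[string]*watcher)}
	w := m.spawnWatcher("web-sidecar-proxy", 0)
	serveXDS(w)
}
```

The re-spawn matters only for the client-agent ("local") case described above, where registrations are ahead-of-time and no external client would otherwise restart the watch; in the consul-dataplane case, closing the session is enough, since CDP re-establishes the stream and that relaunches the machinery on demand.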
Overview of commits