Summary
This PR includes the changes that fix the SDS update failure.
Problem
We started using Delta ADS for LDS, CDS, RDS and SDS resources. We have three Envoy proxy instances connected to the management server as clients. During testing, we found that some of the Envoy proxies were not receiving SDS updates.
Root cause analysis
We investigated the issue by adding logs to go-control-plane. The control plane maintains a list of subscribed resources for each client. Before sending updates, it compares the hash of each resource in the snapshot with the hash of the same resource in the client's subscription list. If a resource is not in the client's subscription list, updates for it are not sent to that client. We found that an entry was being removed from the subscription list, and as a result the updates were not pushed to the Envoy proxy.
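The comparison described above can be sketched roughly as follows. This is a simplified model and not the actual go-control-plane code: the function name `resourcesToSend` and the plain `map[string]string` of name to hash are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// resourcesToSend is a hypothetical helper. It returns the names of the
// subscribed resources whose hash in the snapshot differs from the hash
// recorded for the client. Resources that are missing from the
// subscription list are never considered.
func resourcesToSend(snapshot, subscribed map[string]string) []string {
	var changed []string
	for name, known := range subscribed {
		if current, ok := snapshot[name]; ok && current != known {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	snapshot := map[string]string{"resource-1": "h1", "resource-2": "h2-new"}

	// resource-2 has been dropped from the subscription list, so its
	// update is skipped even though its hash changed in the snapshot.
	subscribed := map[string]string{"resource-1": "h1"}

	fmt.Println(resourcesToSend(snapshot, subscribed)) // prints []
}
```

In this model, a resource that disappears from the subscription list is silently skipped, which matches the symptom we observed.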
Why were resources getting removed from the subscription list?
The processDelta handler, which handles client requests and responses, updates "watch.state.resourceVersions" in two places:
[1] As part of the subscription request from Envoy: "s.subscribe(req.GetResourceNamesSubscribe(), watch.state.GetResourceVersions())"
[2] After sending the response to the Envoy proxy: "watch.state.SetResourceVersions(resp.GetNextVersionMap())"
The problem is in the update of the resource version map after sending the response. After sending the response, we directly replace the resource version map with the map specified in the response. This is a problem for non-wildcard resource types, because a response that was created before a later subscription request does not contain the resources added by that request.
The following use case shows in detail why replacing the resource version map inside the response handling is problematic.
Envoy requests resource-1.
In the request handler, the stream state resource version map is updated to [resource-1: empty hash] and response-1 is created with the new resource version map [resource-1: hash].
Before response-1 is sent, Envoy sends a new request for resource-2. In the request handler, the stream state resource version map is updated to [resource-1: empty hash, resource-2: empty hash] and response-2 is created with the new resource version map [resource-1: hash, resource-2: hash].
Response-1 is sent to the Envoy proxy and the stream state resource version map is replaced with the map from response-1, so it becomes [resource-1: hash]. The entry for resource-2 is dropped at this point.
Envoy sends another request, for resource-3. In the request handler, the stream state resource version map is updated to [resource-1: hash, resource-3: empty hash] and response-3 is created with the new resource version map [resource-1: hash, resource-3: hash].
Response-2 is sent to the Envoy proxy and the stream state resource version map is replaced with the map from response-2, so it becomes [resource-1: hash, resource-2: hash]. The entry for resource-3 is dropped at this point.
Response-3 is sent to the Envoy proxy and the stream state resource version map is replaced with the map from response-3, so it becomes [resource-1: hash, resource-3: hash].
The final resource version map is [resource-1: hash, resource-3: hash]. This is incorrect because we lost the subscription to resource-2. Hence, any config updates to resource-2 will not be sent to the Envoy proxy.
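The sequence above can be replayed with a small standalone model. This is only a sketch: `versionMap`, `subscribe`, `buildResponse`, `replace`, `merge` and `replay` are hypothetical names and do not exist in go-control-plane. `replace` models the behavior described above. `merge` is shown only as one possible alternative for comparison, it is not necessarily the change made in this PR, and it does not handle unsubscribes or removed resources.

```go
package main

import "fmt"

// versionMap models the stream state: resource name -> hash.
type versionMap map[string]string

// subscribe adds newly requested names with an empty hash.
func subscribe(state versionMap, names ...string) {
	for _, n := range names {
		if _, ok := state[n]; !ok {
			state[n] = ""
		}
	}
}

// buildResponse captures the next version map for every resource tracked
// at the moment the response is created.
func buildResponse(state versionMap) versionMap {
	next := versionMap{}
	for n := range state {
		next[n] = "hash"
	}
	return next
}

// replace models the problematic behavior: the state is overwritten with
// the map carried by the response.
func replace(_ versionMap, next versionMap) versionMap {
	out := versionMap{}
	for n, h := range next {
		out[n] = h
	}
	return out
}

// merge is a hypothetical alternative: entries from the response are
// written into the existing state, and other entries are kept.
func merge(state, next versionMap) versionMap {
	out := versionMap{}
	for n, h := range state {
		out[n] = h
	}
	for n, h := range next {
		out[n] = h
	}
	return out
}

// replay runs the interleaving from the use case above.
func replay(apply func(state, next versionMap) versionMap) versionMap {
	state := versionMap{}
	subscribe(state, "resource-1")
	resp1 := buildResponse(state)
	subscribe(state, "resource-2") // arrives before response-1 is sent
	resp2 := buildResponse(state)
	state = apply(state, resp1) // response-1 sent
	subscribe(state, "resource-3")
	resp3 := buildResponse(state)
	state = apply(state, resp2) // response-2 sent
	state = apply(state, resp3) // response-3 sent
	return state
}

func main() {
	fmt.Println(replay(replace)) // map[resource-1:hash resource-3:hash]
	fmt.Println(replay(merge))   // map[resource-1:hash resource-2:hash resource-3:hash]
}
```

With `replace`, resource-2 is missing from the final state, as in the use case. With `merge`, all three subscriptions survive in this model.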