Enable ordered responses for ADS delta watches #752
Conversation
This is only an initial review of the PR. I need to spend more time on the impact of this new shared channel.
My initial analysis is that there is a potential deadlock case, as a watch can have queued a response prior to being cancelled. In some edge cases, when the client has sent a lot of requests in a short sequence, I believe this channel could end up overflowing the muxed one and then itself. The previous model guarantees that filling the muxed channel does not block the other watch goroutines (as they each dequeue within their own goroutine), but here that is no longer the case.
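A minimal, self-contained sketch of the blocking behavior described here (illustrative only, not go-control-plane code): once a single bounded shared channel fills up, any further send blocks the producer until the lone consumer drains it, whereas per-watch channels drained by their own goroutines keep producers independent.

```go
package main

import "fmt"

// Simplified illustration of the concern above. With a single bounded shared
// channel, every watch's send competes for the same buffer, so once it is
// full any producer blocks until the one consumer drains it. In the previous
// model each watch had its own channel drained by its own goroutine, so a
// full muxed channel never blocked the other watches.
func main() {
	const numTypes = 3
	shared := make(chan string, numTypes)

	// Fill the buffer to capacity, as could happen when several cache
	// updates race with a burst of client requests.
	for i := 0; i < numTypes; i++ {
		shared <- fmt.Sprintf("response-%d", i)
	}

	// A further send would now block until the consumer reads. If the
	// consumer is itself waiting on this producer (e.g. to cancel its
	// watch), that is the deadlock scenario described above.
	select {
	case shared <- "one-too-many":
		fmt.Println("enqueued")
	default:
		fmt.Println("shared channel is full; an unconditional send would block here")
	}
}
```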
pkg/server/delta/v3/server.go (outdated diff)
    watch.responses = make(chan cache.DeltaResponse, 1)
    if ordered {
        // Use the shared channel to keep the order of responses.
        watch.UseSharedResponseChan(watches.deltaMuxedResponses)
    }
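For context, a hypothetical sketch of the idea behind this call; the deltaWatch type and its responses field below are assumptions for illustration, not the actual implementation in pkg/server/delta/v3.

```go
package example

import cache "github.com/envoyproxy/go-control-plane/pkg/cache/v3"

// deltaWatch is a stand-in for a watch that can either own a private
// response channel or point at the shared muxed one.
type deltaWatch struct {
	responses chan cache.DeltaResponse
}

// UseSharedResponseChan points the watch at the shared muxed channel so that
// responses for every type URL are serialized through a single ordered queue.
func (w *deltaWatch) UseSharedResponseChan(shared chan cache.DeltaResponse) {
	w.responses = shared
}
```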
Overall I like the state of this PR, but I think there is an issue with how the channel is buffered.
This channel will potentially take multiple responses for each type (e.g. if a cache update triggers while a new request is retrieved and tries to enqueue in the same goroutine). As the channel is only buffered for the number of types, this will deadlock.
The issue is that theoretically there could be more than 2 responses per type, as the request/response channels are in a select (and therefore are not ordered). In this case even buffering at two times the number of types would not guarantee the absence of deadlock.
I think this can be solved if it is guaranteed that enqueued responses are processed prior to new requests, but I'm not sure that'd be simple.
Another possibility is to, like for the sotw protocol, purge the response queue prior to processing a request (roughly as in the sketch below). In this case I believe buffering with 2x would be enough to guarantee there is no deadlock. I'm honestly not a huge fan, as I think it makes the state machine more complex, but it might be okay.
@alecholmez do you think this would cover this issue?
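A rough sketch of the "purge the response queue before processing a request" idea mentioned above; the string element type and the send/handle callbacks are placeholders, not the delta server's actual signatures.

```go
package example

// serveOrdered drains any already-queued responses before acting on a new
// request, so a request can never be processed ahead of responses that were
// enqueued earlier.
func serveOrdered(responses <-chan string, requests <-chan string, send func(string), handle func(string)) {
	for {
		select {
		case resp, ok := <-responses:
			if !ok {
				return
			}
			send(resp)
		case req, ok := <-requests:
			if !ok {
				return
			}
			// Flush everything already enqueued before handling the request.
		drain:
			for {
				select {
				case resp, ok := <-responses:
					if !ok {
						return
					}
					send(resp)
				default:
					break drain
				}
			}
			handle(req)
		}
	}
}
```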
@valerian-roche If we handle the responses and requests in two different goroutines, will that solve the deadlock problem?
@valerian-roche I tried to follow your suggestion to guarantee that the responses are processed prior to new requests. Could you please take another look?
Some tests failed because of this new change. If this approach is acceptable, I'll go on and fix them.
LGTM, thanks a lot for your patience on this!
Were you able to test this in a running environment? I might be able to test it on some of our test systems next week if you cannot easily run it.
Can you also expand the description a bit to describe the implementation and the reasoning behind it, in case people need to take a look later on?
@valerian-roche Thanks. I can test it since I have already encountered this issue in my local development environment, but it may take a few days since I'll be attending KubeCon CN next week and I'm preparing my talk. The description has been updated with the reasoning and the implementation details.
@valerian-roche Tested in my local development environment.
Thanks @zhaohuabing for fixing this. Can we get this merged? Envoy Gateway (based on delta xDS) needs this.
* Bring in go-control-plane fixes: this import brings in envoyproxy/go-control-plane#752, which should ensure resources pushed via delta ADS follow a specific order so the traffic chain doesn't transiently break.
* Rerun go generate.
Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Fix #705
Why do we need this?
go-control-plane doesn't guarantee the order of responses for delta ADS. If there are two requests, the response to the second request may be sent before the response to the first. This causes problems for us since our application generates xDS resources for clusters and listeners in sequence. Sometimes the listener resources are sent to Envoy before the cluster resources. As a result, Envoy complains that it can't find the referenced clusters in the listener configuration and fails to apply it.
How does this PR fix this issue?
This PR follows an approach similar to #544 to enable ordered responses for ADS delta watches:
- Responses for ADS delta watches are sent through the shared deltaMuxedResponses channel to guarantee that their order won't be changed.
- The deltaMuxedResponses channel is created with a buffer size of 2x the number of types (see the sketch below).
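A minimal sketch of that sizing rule, using a hypothetical newDeltaMuxedResponses helper; the real wiring lives in the delta server's watch setup.

```go
package example

import cache "github.com/envoyproxy/go-control-plane/pkg/cache/v3"

// newDeltaMuxedResponses sketches the buffering rule described above: the
// shared channel carrying every watch's responses gets a capacity of
// 2 * the number of resource type URLs, so for each type both an
// already-queued response and one produced while a request is being handled
// fit without blocking the producer.
// numTypes is the number of xDS type URLs the server watches
// (clusters, endpoints, listeners, routes, secrets, ...).
func newDeltaMuxedResponses(numTypes int) chan cache.DeltaResponse {
	return make(chan cache.DeltaResponse, 2*numTypes)
}
```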