-
Notifications
You must be signed in to change notification settings - Fork 517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't delete non-existent resources in non-wildcard mode in delta xds #488
Don't delete non-existent resources in non-wildcard mode in delta xds #488
Conversation
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
…into incremental-linear
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
…into incremental-linear
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
Co-authored-by: Alec Holmes <alec.holmes@greymatter.io> Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
…into incremental-linear
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
…into incremental-linear
…ent and doesn't exist on the server in delta xds Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
…into incremental-linear
I spend some time trying to debug failing delta ADS integration tests. I think they are failing because of broken ADS ordering (in particular, cluster updates are send before listener updates, which is causing the following error |
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
Changing ADS ordering didn't help. No mater how we order resources we will always have some races. Here are 2 cases that I checked:
The only solution I can think of is to skip deleting non-existent routes (they are requested in non-wildcard mode). They still will be removed after envoy receives listener updates and will figure out that nobody is referencing old routes anymore. |
This PR should be ready for review, cc @snowp @alecholmez |
@s-matyukevich I'm not sure if you've seen this PR: #461 but maybe the ordering of the resources issue can be fixed by porting this over to delta |
@snowp your input might be valuable here, I remember us having a conversation on this bit of the logic but I can't remember the decisions as to why we decided to not keep the blank string entry |
Yeah, I saw this PR and implemented ordering. It didn't. help me to fix integration tests because I was also trying to implement this part of the spec (sent DeltaDiscoverResponse with empty Resource field for non-existent resources) In this case envoy indeed deleted the resource immediately, which was causing issue mo matter how we order resources. After I updated the code and stopped sending anything for non-existent resources requested in non-wildcard mode, everything started working as expected. (We just wait for envoy to unsubscribed from resources that are no longer references by any other resource) This also works ok with any resource ordering, so I deleted the ordering code to keep this PR focused. I am not sure whether and how we are supposed to implement the above mentioned part of the spec. |
Haven't read the full thread but this doesn't seem right:
Whether or not a resource is subscribed to should not be up to the server at all, the client knows what resources it wants. |
I don't recall what the semantics of empty string used to mean, but we should definitely be keeping track of resources that are currently being subscribed to but they don't have a current value. |
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
I updated the code accordingly to this comment. Now we keep the subscription active for the non-existent resources while still populating |
@snowp @alecholmez I tested the latest version of this PR in our staging environment and verified that it indeed fixes the issue. |
Cool I think this satisfies everything we need. @snowp this look good to you? I have nothing else to add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comment nitpicks. Everything else looks good though
Co-authored-by: Alec Holmes <alec.holmes@greymatter.io> Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
c1a4415
to
40f16db
Compare
Hmm it seems like envoy is seg faulting.. |
…into incremental-linear
I am pretty sure something is wrong with the latest envoy dev build. |
I've just notified the Envoy team about the segfault |
Thanks! This #493 also could be helpful (Basically, I did a binary search on recent envoy commits and figured out the one after which the build starts failing) I was also trying to use |
Awesome thank you @s-matyukevich. Would you also mind alerting the security team as well? Here's a doc for how to do so |
Sure, done. |
…into incremental-linear
@alecholmez This now passes CI, anything else needs to be done before we can merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last 2 things! I'll approve and merge once these are addressed
pkg/cache/v3/delta.go
Outdated
@@ -56,13 +56,19 @@ func createDeltaResponse(ctx context.Context, req *DeltaRequest, state stream.St | |||
filtered = append(filtered, r) | |||
} | |||
nextVersionMap[name] = nextVersion | |||
} else { | |||
// we track non-existent resources for non-wildcard streams until the client explicitly unsubscribes from them. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove this space?
pkg/cache/v3/delta.go
Outdated
@@ -56,13 +56,19 @@ func createDeltaResponse(ctx context.Context, req *DeltaRequest, state stream.St | |||
filtered = append(filtered, r) | |||
} | |||
nextVersionMap[name] = nextVersion | |||
} else { | |||
// we track non-existent resources for non-wildcard streams until the client explicitly unsubscribes from them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// we track non-existent resources for non-wildcard streams until the client explicitly unsubscribes from them. | |
// We track non-existent resources for non-wildcard streams until the client explicitly unsubscribes from them. | |
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
Sure, done. |
build/integration.sh
Outdated
@@ -39,6 +39,7 @@ ENVOY_PID=$! | |||
function cleanup() { | |||
kill ${ENVOY_PID} ${UPSTREAM_PID} | |||
wait ${ENVOY_PID} ${UPSTREAM_PID} 2> /dev/null || true | |||
cat ${ENVOY_LOG} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah can you remove this cat? We don't want to pollute this log here, i think this is just leftover from debugging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, I forgot about this, this is fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thank you!
Signed-off-by: Sergey Matyukevich <s.matyukevich@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for all the cooperation @s-matyukevich 😄
Before this PR we used to always add resource to the
RemovedResources
field of theDeltaDiscoveryResponse
if it doesn't exist on the server. This causes problems in the following scenario:DeltaDiscoveryRequest
with typeClusterLoadAssignment
andResourceNamesSubscribe
field set to[]string{"foo"}
DeltaDiscoveryResponse
withRemovedResources
name set to[]string{"foo"}
, which instructs envoy to unsubscribe from this resource.This PR fixes the issue by preventing resources from being deleted for non-wildcard requests. Instead, now we are going to wait for the client to unsubscribe from resources by itself. This works fine if the client always uses wildcard requests to pull initial resources and it also uses non-wildcard requests only to pull dependent resources (This is the case with envoy: it always uses wildcard requests to pull clusters and listeners and then non-wildcard requests to pull all other dependent resources)