ClusterLoadAssignment IPs become empty after updating Cluster configuration with the same EDS ServiceName #31535

Closed
maoyutao opened this issue Dec 27, 2023 · 7 comments
Labels: bug, stale

maoyutao commented Dec 27, 2023

I am using Envoy to fetch Cluster and EDS configurations from an ADS server. I've noticed an issue where, after updating the Cluster configuration (while keeping the same eds_cluster_config.service_name), the IPs in the corresponding ClusterLoadAssignment become empty, even though the content of the ClusterLoadAssignment has not changed.

Steps to reproduce:

1. Fetch Cluster and EDS configurations from the ADS server.
2. Update the Cluster configuration, keeping the same eds_cluster_config.service_name.
3. Observe that the IPs in the corresponding ClusterLoadAssignment become empty.

Expected behavior:

Since the eds_cluster_config.service_name and the content of the ClusterLoadAssignment have not changed, I would expect the IPs in the ClusterLoadAssignment to remain the same after updating the Cluster configuration.

Additional information:

Envoy version: v1.28
Go-control-plane: v0.11.1
ADS API Type: DELTA_GRPC

I would appreciate any help in understanding why this is happening and how to prevent it. Thank you.

maoyutao added the bug and triage labels Dec 27, 2023

maoyutao (Author) commented

[image] The documentation states that Envoy must receive a ClusterLoadAssignment response after a Cluster update, but the xDS server will not push resources whose versions are unchanged.

xDS log:

2023/12/27 23:07:33 management server listening on 18000
2023/12/27 23:07:33 delta stream 1 open for
2023/12/27 23:07:33 OnStreamDeltaRequest req {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0001b5120} sizeCache:0 unknownFields:[] Node: TypeUrl:type.googleapis.com/envoy.config.cluster.v3.Cluster ResourceNamesSubscribe:[] ResourceNamesUnsubscribe:[] ResourceLocatorsSubscribe:[] ResourceLocatorsUnsubscribe:[] InitialResourceVersions:map[example_proxy_cluster-0:1e313c645ad7b6efbcdaacd86a71975a55338de6076c18c082865b802118be65 example_proxy_cluster-1:0ec9054fe7916a0bb0801279ac5b533f87b166e6e579ea119eaed968ca77a1a8 example_proxy_cluster-2:6b756e2284af2d8d26483ac85291f1f05f04dcf957e2a5cb4c312aaa5e0ae862] ResponseNonce: ErrorDetail:}
2023/12/27 23:07:33 node: test-id, sending delta response for typeURL type.googleapis.com/envoy.config.cluster.v3.Cluster with resources: [example_proxy_cluster-2 example_proxy_cluster-0 example_proxy_cluster-1] removed resources: [] with wildcard: true
2023/12/27 23:07:33 OnStreamDeltaResponse req {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0001b5120} sizeCache:0 unknownFields:[] Node: TypeUrl:type.googleapis.com/envoy.config.cluster.v3.Cluster ResourceNamesSubscribe:[] ResourceNamesUnsubscribe:[] ResourceLocatorsSubscribe:[] ResourceLocatorsUnsubscribe:[] InitialResourceVersions:map[example_proxy_cluster-0:1e313c645ad7b6efbcdaacd86a71975a55338de6076c18c082865b802118be65 example_proxy_cluster-1:0ec9054fe7916a0bb0801279ac5b533f87b166e6e579ea119eaed968ca77a1a8 example_proxy_cluster-2:6b756e2284af2d8d26483ac85291f1f05f04dcf957e2a5cb4c312aaa5e0ae862] ResponseNonce: ErrorDetail:} resp system_version_info:"1703689653" resources:{name:"example_proxy_cluster-2" version:"5e9338d2171a87d39618efa969134a33f5c63849b298695d4cb2f6a84e57693a" resource:{[type.googleapis.com/envoy.config.cluster.v3.Cluster]:{name:"example_proxy_cluster-2" type:EDS eds_cluster_config:{eds_config:{ads:{}} service_name:"example_proxy_cluster-2"} connect_timeout:{seconds:5} circuit_breakers:{thresholds:{max_connections:{value:10001} max_pending_requests:{value:10001} max_requests:{value:10001}}} dns_lookup_family:V4_ONLY}}} resources:{name:"example_proxy_cluster-0" version:"e769e07a71ed71c55aaf9a2a182668ee4b7c67cebb0fd7444d11b60a20c230f4" resource:{[type.googleapis.com/envoy.config.cluster.v3.Cluster]:{name:"example_proxy_cluster-0" type:EDS eds_cluster_config:{eds_config:{ads:{}} service_name:"example_proxy_cluster-0"} connect_timeout:{seconds:5} circuit_breakers:{thresholds:{max_connections:{value:10001} max_pending_requests:{value:10001} max_requests:{value:10001}}} dns_lookup_family:V4_ONLY}}} resources:{name:"example_proxy_cluster-1" version:"c9cf2375a5fa827303d51c002b24fd1c23c00dfab695538f3d59b5ed17b77af1" resource:{[type.googleapis.com/envoy.config.cluster.v3.Cluster]:{name:"example_proxy_cluster-1" type:EDS eds_cluster_config:{eds_config:{ads:{}} service_name:"example_proxy_cluster-1"} 
connect_timeout:{seconds:5} circuit_breakers:{thresholds:{max_connections:{value:10001} max_pending_requests:{value:10001} max_requests:{value:10001}}} dns_lookup_family:V4_ONLY}}} type_url:"type.googleapis.com/envoy.config.cluster.v3.Cluster" nonce:"1"
2023/12/27 23:07:33 OnStreamDeltaRequest req {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0001b5120} sizeCache:0 unknownFields:[] Node: TypeUrl:type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment ResourceNamesSubscribe:[example_proxy_cluster-0 example_proxy_cluster-1 example_proxy_cluster-2] ResourceNamesUnsubscribe:[] ResourceLocatorsSubscribe:[] ResourceLocatorsUnsubscribe:[] InitialResourceVersions:map[example_proxy_cluster-0:a5df08f928fd6bb3ae0c263d46efab33d9502b3f4c67c80c9735758018079deb example_proxy_cluster-1:8b4fc18e98340b0d6fadf52195fe7f27ce107614200c048fb8df707d84189b95 example_proxy_cluster-2:ed1888d9fdc3f025ebe825d6bdeb8b7f9acc1f0980a4926f7007f487d8817e42] ResponseNonce: ErrorDetail:}
2023/12/27 23:07:33 open delta watch ID:1 for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment Resources:map[example_proxy_cluster-0:{} example_proxy_cluster-1:{} example_proxy_cluster-2:{}] from nodeID: "test-id", version "1703689653"
2023/12/27 23:07:33 OnStreamDeltaRequest req {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0001b5120} sizeCache:0 unknownFields:[] Node: TypeUrl:type.googleapis.com/envoy.config.route.v3.RouteConfiguration ResourceNamesSubscribe:[route_0] ResourceNamesUnsubscribe:[] ResourceLocatorsSubscribe:[] ResourceLocatorsUnsubscribe:[] InitialResourceVersions:map[] ResponseNonce: ErrorDetail:}
2023/12/27 23:07:33 open delta watch ID:2 for type.googleapis.com/envoy.config.route.v3.RouteConfiguration Resources:map[route_0:{}] from nodeID: "test-id", version "1703689653"
2023/12/27 23:07:48 OnStreamDeltaRequest req {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0001b5120} sizeCache:0 unknownFields:[] Node: TypeUrl:type.googleapis.com/envoy.config.cluster.v3.Cluster ResourceNamesSubscribe:[] ResourceNamesUnsubscribe:[] ResourceLocatorsSubscribe:[] ResourceLocatorsUnsubscribe:[] InitialResourceVersions:map[] ResponseNonce:1 ErrorDetail:}
2023/12/27 23:07:48 open delta watch ID:3 for type.googleapis.com/envoy.config.cluster.v3.Cluster Resources:map[] from nodeID: "test-id", version "1703689653"
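
In the log above, the ClusterLoadAssignment request carries InitialResourceVersions for all three clusters, and the server then opens a watch without sending a response. The sketch below is a simplified model of that observed behaviour, not go-control-plane's actual implementation: the function name `resources_to_send` and the short version strings are hypothetical (the real versions are content hashes, as in the log).

```python
from typing import Dict, Iterable, List


def resources_to_send(
    server_versions: Dict[str, str],
    client_versions: Dict[str, str],
    subscribed: Iterable[str],
) -> List[str]:
    """Return the subscribed resource names whose server-side version differs
    from the version the client reported, or that the client has never seen."""
    return sorted(
        name
        for name in subscribed
        if name in server_versions
        and client_versions.get(name) != server_versions[name]
    )


# Hypothetical version strings standing in for the content hashes in the log.
eds_on_server = {"example_proxy_cluster-0": "eds-v1"}
eds_known_to_envoy = {"example_proxy_cluster-0": "eds-v1"}

# The Cluster changed, so Envoy asks for its ClusterLoadAssignment again,
# but the endpoint version is unchanged, so this model sends nothing back.
print(resources_to_send(eds_on_server, eds_known_to_envoy, ["example_proxy_cluster-0"]))
```

Under this model, an updated Cluster whose endpoints have not changed never triggers a new ClusterLoadAssignment response, which matches the empty-IP symptom described in the issue.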

soulxu (Member) commented Dec 28, 2023

cc @adisuissa

soulxu removed the triage label Dec 28, 2023

jparklab (Contributor) commented Jan 18, 2024

This can happen because go-control-plane v0.11 does not guarantee the ordering of responses for delta ADS. As a result, an EDS update can be sent to Envoy before the CDS update, and the server will not resend the EDS update when Envoy receives the CDS update and then requests EDS again.
This has been fixed by envoyproxy/go-control-plane#752, and the fix is included in v0.12. @maoyutao can you try upgrading go-control-plane to v0.12 and see if it fixes the issue?
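
To illustrate the ordering concern, here is a minimal sketch of a sender that always emits Cluster responses before ClusterLoadAssignment responses. It is not the change made in envoyproxy/go-control-plane#752; `SEND_ORDER` and `order_responses` are hypothetical names, and only the type URLs are taken from the log earlier in this thread.

```python
from typing import List, Tuple

CLUSTER = "type.googleapis.com/envoy.config.cluster.v3.Cluster"
ENDPOINT = "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment"
ROUTE = "type.googleapis.com/envoy.config.route.v3.RouteConfiguration"

# Assumed priority: clusters first, then their endpoints, then everything else.
SEND_ORDER = {CLUSTER: 0, ENDPOINT: 1}


def order_responses(pending: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (type_url, resource_name) pairs so that CDS precedes EDS.

    sorted() is stable, so responses of the same type keep their relative order.
    """
    return sorted(pending, key=lambda item: SEND_ORDER.get(item[0], len(SEND_ORDER)))


pending = [
    (ENDPOINT, "example_proxy_cluster-0"),
    (ROUTE, "route_0"),
    (CLUSTER, "example_proxy_cluster-0"),
]
for type_url, name in order_responses(pending):
    print(type_url.rsplit(".", 1)[-1], name)
```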

jparklab (Contributor) commented

If you only update the CDS configuration and not the EDS configuration, upgrading to v0.12 might not fix the problem, because go-control-plane does not resend EDS configuration that has not changed. In that case it could be a similar issue to envoyproxy/gateway#2345, and you would need to use the EDS cache so that Envoy falls back to the configuration in the cache.

adisuissa (Contributor) commented

Generally speaking, when Envoy receives a Cluster update, it needs to receive a follow-up EDS update from the control plane.
That said, Envoy has a new feature, envoy.restart_features.use_eds_cache_for_ads, which is currently being tested in production and should be fairly stable. Setting it allows Envoy to cache a previous EDS assignment and use it after receiving only a CDS update (there is a timeout while Envoy waits for a new EDS assignment, but otherwise the behavior should be as you expect).
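
One way to turn the flag on is through a static runtime layer in the bootstrap. The sketch below assumes that approach and prints the fragment as JSON, to be merged into an existing bootstrap file. The helper name `eds_cache_runtime_layer` and the layer name `static_layer_0` are hypothetical; only the flag name comes from this thread. Given the restart_features prefix, the flag is presumably read at startup, so a restart would be needed for it to take effect.

```python
import json


def eds_cache_runtime_layer(enabled: bool = True) -> dict:
    """Build a bootstrap fragment that sets the EDS cache flag in a static runtime layer."""
    return {
        "layered_runtime": {
            "layers": [
                {
                    "name": "static_layer_0",  # hypothetical layer name
                    "static_layer": {
                        "envoy.restart_features.use_eds_cache_for_ads": enabled
                    },
                }
            ]
        }
    }


# Print the fragment so it can be merged into the Envoy bootstrap by hand.
print(json.dumps(eds_cache_runtime_layer(), indent=2))
```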

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions bot added the stale label Feb 17, 2024

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions bot closed this as not planned Feb 25, 2024