Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(xds): make sure ADS are ordered #11579

Merged
merged 10 commits into from
Sep 27, 2024

Conversation

jakubdyszkiewicz
Copy link
Contributor

@jakubdyszkiewicz jakubdyszkiewicz commented Sep 26, 2024

Checklist prior to review

Use ordered ADS. We noticed that in edge case we can hit out of order EDS which results in cluster being never updated.

This PR updates go-control-plane to the newest version and enables ordered ADS functionality. At this moment we cannot rely on upstream go-control-plane because

We now can ditch "resource warming forcer". Because of ordered responses, we can just bump the EDS version whenever we update CDS, go-control-plane will handle the rest.
Forcer might have been a culprit here as well. After we just used a new go-control-plane and kept the forcer, we hit one CI fail with warming clusters. After we got rid of it, 3 full runs passed.

TODO: Remember to also change the replace in Kong Mesh after merge.

  • Link to relevant issue as well as docs and UI issues --
  • This will not break child repos: it doesn't hardcode values (.e.g "kumahq" as a image registry) and it will work on Windows, system specific functions like syscall.Mkfifo have equivalent implementation on the other OS --
  • Tests (Unit test, E2E tests, manual test on universal and k8s) --
    • Don't forget ci/ labels to run additional/fewer tests
  • Do you need to update UPGRADE.md? --
  • Does it need to be backported according to the backporting policy? (this GH action will add "backport" label based on these file globs, if you want to prevent it from adding the "backport" label use no-backport-autolabel label) --

@jakubdyszkiewicz jakubdyszkiewicz added the ci/run-full-matrix PR: Runs all possible e2e test combination (expensive use carefully) label Sep 26, 2024
@jakubdyszkiewicz
Copy link
Contributor Author

jakubdyszkiewicz commented Sep 26, 2024

on universal suite

2024-09-26T13:16:04.464Z	INFO	xds.dp-lifecycle	waiting for deregister proxy	{"proxyType": "dataplane", "proxyKey": {"Mesh":"dp-auth","Name":"dp-01"}, "streamID": 108, "waitFor": "0s"}
2024-09-26T13:16:04.464Z	INFO	xds.dp-lifecycle	deregister proxy	{"proxyType": "dataplane", "proxyKey": {"Mesh":"dp-auth","Name":"dp-01"}, "streamID": 108}
2024-09-26T13:16:04.475Z	INFO	xds.reconcile	config has changed	{"proxyName": "egress", "mesh": "", "versions": ["78e4d42f-5c92-426e-8b37-2302991502a0", "5cdbd604-2150-4c9f-a582-0d7ec8683129"]}
2024-09-26T13:16:04.591Z	INFO	xds.status-tracker	proxy disconnected	{"streamID": 111, "proxyName": "", "mesh": "", "subscriptionID": "bdc84fa3-ffc6-4411-9f95-f62c47e37165"}
panic: send on closed channel

goroutine 10572 [running]:
github.com/envoyproxy/go-control-plane/pkg/server/sotw/v3.(*server).process.func1()
	github.com/envoyproxy/go-control-plane@v0.12.0/pkg/server/sotw/v3/xds.go:100 +0x1b
created by github.com/envoyproxy/go-control-plane/pkg/server/sotw/v3.(*server).process in goroutine 10568
	github.com/envoyproxy/go-control-plane@v0.12.0/pkg/server/sotw/v3/xds.go:99 +0xaff

stderr:

edit: fixed in our fork of go control plane

@jakubdyszkiewicz jakubdyszkiewicz marked this pull request as ready for review September 27, 2024 13:13
@jakubdyszkiewicz jakubdyszkiewicz requested a review from a team as a code owner September 27, 2024 13:13
@jakubdyszkiewicz jakubdyszkiewicz requested review from slonka and lobkovilya and removed request for a team September 27, 2024 13:13
pkg/xds/server/v3/components.go Outdated Show resolved Hide resolved
@jakubdyszkiewicz jakubdyszkiewicz merged commit 6f98454 into kumahq:master Sep 27, 2024
33 checks passed
@jakubdyszkiewicz jakubdyszkiewicz deleted the fix/ordered-xds branch September 27, 2024 14:37
@jakubdyszkiewicz
Copy link
Contributor Author

While this can fix a potential Cluster stuck in warming state, we are not sure yet if we want to backport it. Aside of ordered ADS it includes all the changes of upgraded go-control-plane. We want to first try it more and then potentially backport to all supported versions.

@jijiechen jijiechen changed the title fix(xds): ordered ADS fix(xds): make sure ADS are ordered Oct 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/run-full-matrix PR: Runs all possible e2e test combination (expensive use carefully)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants