Configurable shard-level concurrent reconciliation #1124
Conversation
Here's an example operator config change, CRD definition, and logs showing the flow of reconciliation using this new mode:

Operator file config change:
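The config snippet embedded here didn't survive the export. As a rough sketch, the kind of CHOP config change being demonstrated would look something like the following, where the setting names are assumptions for illustration, not copied from the PR:

```yaml
# Hypothetical CHOP config fragment (setting names are illustrative):
# a shard worker pool size plus a cap on what fraction of a cluster's
# shards may be reconciled concurrently.
reconcile:
  runtime:
    reconcileShardsThreadsNumber: 4          # goroutines in the shard worker pool
    reconcileShardsMaxConcurrencyPercent: 50 # at most 50% of shards at once
```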
CRD example (2x4x2)
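The CRD example itself isn't preserved above, but reading "2x4x2" as 2 clusters of 4 shards x 2 replicas, a minimal CHI of that shape would look roughly like this (the installation and cluster names are made up):

```yaml
# Illustrative ClickHouseInstallation with a 2x4x2 layout
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo"            # hypothetical name
spec:
  configuration:
    clusters:
      - name: "cluster-a"
        layout:
          shardsCount: 4
          replicasCount: 2
      - name: "cluster-b"
        layout:
          shardsCount: 4
          replicasCount: 2
```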
Reconcile wall time via a `get pods` view (simple):
Full event log via `kubectl describe chi`:
fyi @sunsingerus.
Thank you, @zcross, nice work! We are busy with the 0.21.0 release, which should be complete in a couple of weeks; this will go into the next one.
Thanks for that update! I'll avoid adding chatter to the PRs then, unless I discover any new open questions or run into problems while doing some local manual testing / e2e testing.
Another example: one cluster (for brevity), but this time 8 shards x 2 replicas. Still using 100% concurrency percent and 4 goroutines:
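How the concurrency percent and the goroutine pool interact can be sketched as below; the function and its arithmetic are an assumption about the mode's behavior for illustration, not the operator's actual code:

```go
package main

import (
	"fmt"
	"math"
)

// effectiveConcurrency returns how many shards may be reconciled at once,
// given the total shard count, a max-concurrency percentage, and the size
// of the worker (goroutine) pool. Names and formula are illustrative.
func effectiveConcurrency(shards, maxConcurrencyPercent, workers int) int {
	// how many shards the percentage alone would allow, rounded up
	byPercent := int(math.Ceil(float64(shards) * float64(maxConcurrencyPercent) / 100.0))
	if byPercent < 1 {
		byPercent = 1
	}
	// the worker pool is a hard cap on in-flight reconciles
	if byPercent > workers {
		return workers
	}
	return byPercent
}

func main() {
	// 8 shards, 100% concurrency, 4 worker goroutines: the percentage
	// permits all 8, but the pool keeps only 4 in flight at a time.
	fmt.Println(effectiveConcurrency(8, 100, 4)) // prints: 4
}
```

Under this reading, the 8x2 example above is pool-bound: raising the percent past 50% changes nothing until more goroutines are configured.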
Update: I'm working on a change to support overriding the k8s rate-limit QPS and burst parameters, which is basically necessary to get around the above. Currently pondering how to do it if not through CHOP YAML config (due to the ...

Edit: see the newest commit for simple passage of this via env vars, which avoided refactoring the initialization of
However, some celebration-worthy early results: I was just able to test this out in a real environment instead of minikube and reconciled a 32-replica cluster in about 20 minutes, instead of the several hours we typically observe! Needless to say, my team is pretty excited about the potential value this will have for us (especially in incident response).
Just resolved merge conflicts (trivial, import lines) after doing the same in #1119.
Signed-off-by: Zach Cross <zcross@chronosphere.io>
Just rebased off latest.
Apologies for not offering up a bunch of new e2e test coverage. I'd like to, but:
For example, I'd like to at least have one test case that does something like "bringing up an n>2-shard cluster results in a healthy cluster" with the new mode.

By the way: since the other 2 PRs landed, what are the chances this makes it into the next release?

Thanks!
Thanks @sunsingerus ! |
Background: #1109
Related PR(s):
This PR adds a configurable level of concurrency to the operator for the purpose of CHI reconciliation, with concurrency applied at the shard level.
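Shard-level concurrency of this kind can be sketched with a buffered-channel semaphore bounding in-flight reconciles; this is an illustrative pattern under assumed names, not the PR's actual implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// reconcileShards runs reconcile over all shards with at most
// maxConcurrent reconciles in flight at once. A sketch of the
// shard-level concurrency idea; names are hypothetical.
func reconcileShards(shards []string, maxConcurrent int, reconcile func(string)) {
	sem := make(chan struct{}, maxConcurrent) // semaphore: capacity = concurrency limit
	var wg sync.WaitGroup
	for _, s := range shards {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent reconciles are running
		go func(shard string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			reconcile(shard)
		}(s)
	}
	wg.Wait() // the CHI reconcile proceeds only after every shard finishes
}

func main() {
	var mu sync.Mutex
	done := 0
	reconcileShards([]string{"shard-0", "shard-1", "shard-2", "shard-3"}, 2, func(shard string) {
		mu.Lock()
		done++
		mu.Unlock()
	})
	fmt.Println(done) // prints: 4
}
```

The semaphore keeps ordering guarantees simple: within a cluster, at most N shards change at once, and the overall reconcile still completes only when all shards have.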
Key design decisions:
Important items to consider before making a Pull Request

Please check items PR complies to:

PR is aimed at the next-release branch, not into the master branch. More info