Cache metric names provided by KEDA Metrics Server #2156
Comments
+1 to the idea, but I think the implementation suggested would be more brittle over time. I would suggest the better fix is to merge the controller and metrics server into one daemon.
Merging the controller and metrics server into one daemon could cause problems for a potential HA setup of the metrics server (some environments require HA for Kubernetes-related services).
Controller-runtime supports non-leader-elected runnables just fine, so I'm not sure why that would be a problem.
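For context, a minimal sketch, assuming controller-runtime's manager API, of how a runnable can opt out of leader election and run on every replica; the metricsServer type and the lock name are illustrative, not KEDA's actual code:

```go
package main

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// metricsServer is an illustrative runnable standing in for the KEDA Metrics
// Server if it were folded into the operator binary.
type metricsServer struct{}

// Start blocks until the manager's context is cancelled; a real
// implementation would serve the external metrics API here.
func (s *metricsServer) Start(ctx context.Context) error {
	<-ctx.Done()
	return nil
}

// NeedLeaderElection returns false, so the manager starts this runnable on
// every replica, not just the elected leader.
func (s *metricsServer) NeedLeaderElection() bool { return false }

// Compile-time checks that both interfaces are satisfied.
var (
	_ manager.Runnable               = &metricsServer{}
	_ manager.LeaderElectionRunnable = &metricsServer{}
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:   true,             // controllers run only on the leader
		LeaderElectionID: "keda-demo-lock", // hypothetical lock name
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Add(&metricsServer{}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

With this split, the reconcilers gated behind leader election run on one replica, while the metrics endpoint is served from all of them.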
Let's say we merge the controller + metrics server into one daemon and the available metrics are populated by the controller, and we have an HA KEDA setup with 2 deployments. The Kubernetes API server asks the KEDA Metrics Server for available metrics: the deployment whose controller is the elected leader will properly list and return them, but what if the request is load balanced to the deployment whose controller hasn't been selected as leader? In that case we would have to manage the cache of metrics somehow.
Yes? But that can use the same underlying cache so we don't duplicate effort.
We will have 2 Deployments (HA); where do you see the cache that's accessible from all KEDA Deployments?
It would be in-memory. I think we're talking about different things, or something else is making these questions confusing on both sides :)
@coderanger could you go into some more detail on why the originally proposed solution would be brittle? I think it's worth talking about that, because figuring out a way for the metrics server to communicate reliably with the operator would let us keep the two processes separate, and that would likely reduce the amount of work needed to get this done.
@coderanger yeah, probably a small misunderstanding on both sides :) +1 to @arschles's ask; maybe you can talk a little more about the in-memory proposal and how it will allow HA (multiple K8s Deployments).
@coderanger I think having a unified cache accessible to all deployments would just add more architecture and overall complexity. @zroubalik how often do you foresee the operator dumping contents into a ConfigMap (with your proposed solution)?
@yquansah I have still been thinking about the best option; currently I am more convinced that the best approach would be to add a simple controller to the Metrics Server that watches ScaledObjects and populates the metric names cache appropriately (see the sketch below). Regarding your question, it should do so on each change to a ScaledObject.
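A minimal sketch of what such a controller could look like, assuming controller-runtime and the KEDA v2 API types; the package name, MetricNameCache, and metricNamesFor are hypothetical stand-ins, not KEDA's actual code:

```go
package metricscache

import (
	"context"
	"sync"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
)

// MetricNameCache is a hypothetical thread-safe map from a ScaledObject's
// namespaced name to the metric names it exposes.
type MetricNameCache struct {
	mu    sync.RWMutex
	byObj map[string][]string
}

func (c *MetricNameCache) Set(key string, names []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.byObj == nil {
		c.byObj = map[string][]string{}
	}
	c.byObj[key] = names
}

func (c *MetricNameCache) Delete(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.byObj, key)
}

// ScaledObjectReconciler keeps the cache in sync on every add/update/delete.
type ScaledObjectReconciler struct {
	client.Client
	Cache *MetricNameCache
}

func (r *ScaledObjectReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var so kedav1alpha1.ScaledObject
	if err := r.Get(ctx, req.NamespacedName, &so); err != nil {
		if apierrors.IsNotFound(err) {
			// The ScaledObject was deleted: drop its metric names.
			r.Cache.Delete(req.NamespacedName.String())
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}
	r.Cache.Set(req.NamespacedName.String(), metricNamesFor(&so))
	return ctrl.Result{}, nil
}

// metricNamesFor is a hypothetical placeholder: a real implementation would
// build the same scaler-specific names the provider computes today.
func metricNamesFor(so *kedav1alpha1.ScaledObject) []string {
	names := make([]string, 0, len(so.Spec.Triggers))
	for _, t := range so.Spec.Triggers {
		names = append(names, t.Type)
	}
	return names
}

func (r *ScaledObjectReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&kedav1alpha1.ScaledObject{}).
		Complete(r)
}
```

Because the watch delivers every add/update/delete, the cache tracks ScaledObject changes without the Metrics Server hitting the API server on the hot listing path.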
@zroubalik I believe that's a good solution as well, but we should think about failure modes if we do it. The big one on my mind right now is "drift" between the metrics server and the operator.
@arschles those are valid concerns, but I don't think the list of metrics is that critical. It is just informative for the k8s API server, so in the worst case it would ask for an unknown metric, which we should handle in the Metrics Server (see the sketch below).
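A minimal sketch of handling that worst case, assuming the custom-metrics-apiserver external metrics provider interface (older versions take no context argument; newer ones do); the provider type and the known membership test are hypothetical stand-ins for a cache lookup:

```go
package provider

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/metrics/pkg/apis/external_metrics"
	"sigs.k8s.io/custom-metrics-apiserver/pkg/provider"
)

// kedaProvider stands in for the external metrics provider; the known field
// is a hypothetical membership test backed by the metric names cache.
type kedaProvider struct {
	known func(metricName string) bool
}

func (p *kedaProvider) GetExternalMetric(namespace string, metricSelector labels.Selector, info provider.ExternalMetricInfo) (*external_metrics.ExternalMetricValueList, error) {
	if !p.known(info.Metric) {
		// Unknown or stale metric name: answer with a clean NotFound error
		// instead of failing hard, so a slightly out-of-date cache is harmless.
		return nil, apierrors.NewNotFound(
			schema.GroupResource{Group: "external.metrics.k8s.io", Resource: "externalmetrics"},
			info.Metric)
	}
	// ... resolve the matching ScaledObject and query its scaler ...
	return &external_metrics.ExternalMetricValueList{}, nil
}
```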
Proposal
The Kubernetes API Server asks for all available metrics that are provided by the KEDA Metrics Server. Internally, we do a round trip across all ScaledObjects to compile the list of used metrics:
keda/pkg/provider/provider.go, lines 139 to 140 at a223a0e
This code has been on my radar for quite some time. We should cache the list of metrics in the Metrics Server and only update it when a ScaledObject is added, updated, or removed. But it is a little bit tricky, because the Metrics Server doesn't watch ScaledObjects (the Operator does that in its reconcile loop), so it has no knowledge of these changes.
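A minimal sketch of the cached variant, reusing the hypothetical MetricNameCache from the reconciler sketch in the comments above and assuming the provider holds a reference to it; none of this is KEDA's actual code:

```go
package metricscache

import (
	"sigs.k8s.io/custom-metrics-apiserver/pkg/provider"
)

// All returns a snapshot of every cached metric name (MetricNameCache as in
// the reconciler sketch above).
func (c *MetricNameCache) All() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	names := make([]string, 0, len(c.byObj))
	for _, objNames := range c.byObj {
		names = append(names, objNames...)
	}
	return names
}

// cachedProvider is a hypothetical provider holding the watch-maintained cache.
type cachedProvider struct {
	cache *MetricNameCache
}

// ListAllExternalMetrics answers the API server from the in-memory snapshot,
// with no round trip across all ScaledObjects.
func (p *cachedProvider) ListAllExternalMetrics() []provider.ExternalMetricInfo {
	infos := make([]provider.ExternalMetricInfo, 0)
	for _, name := range p.cache.All() {
		infos = append(infos, provider.ExternalMetricInfo{Metric: name})
	}
	return infos
}
```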
I have been thinking about this for some time and the best solution is probably one of these:
I am inclined toward solution 1.
Use-Case
It should improve performance, because it will reduce the number of calls to the Kubernetes API Server.