Cache metric names provided by KEDA Metrics Server #2156
Comments
+1 to the idea, but I think the implementation suggested would be more brittle over time. I would suggest the better fix is to merge the controller and metrics server into one daemon.
Merging the controller and metrics server into one daemon could cause problems for a potential HA setup of the metrics server (some environments require HA for Kubernetes-related services).
Controller-runtime supports non-leader-elected runnables just fine, so I'm not sure why that would be a problem.
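For context, a minimal sketch, assuming controller-runtime's manager API, of how a runnable can opt out of leader election and run on every replica; the metricsServer type and the lock name are illustrative, not KEDA's actual code:

```go
package main

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// metricsServer is an illustrative runnable standing in for the KEDA Metrics
// Server if it were folded into the operator binary.
type metricsServer struct{}

// Start blocks until the manager's context is cancelled; a real
// implementation would serve the external metrics API here.
func (s *metricsServer) Start(ctx context.Context) error {
	<-ctx.Done()
	return nil
}

// NeedLeaderElection returns false, so the manager starts this runnable on
// every replica, not just the elected leader.
func (s *metricsServer) NeedLeaderElection() bool { return false }

// Compile-time checks that both interfaces are satisfied.
var (
	_ manager.Runnable               = &metricsServer{}
	_ manager.LeaderElectionRunnable = &metricsServer{}
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:   true,             // controllers run only on the leader
		LeaderElectionID: "keda-demo-lock", // hypothetical lock name
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Add(&metricsServer{}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

With this split, the reconcilers gated behind leader election run on one replica, while the metrics endpoint is served from all of them.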
Let's say we merge the controller + metrics server into one daemon and the available metrics are populated by the controller, and we have an HA KEDA setup with 2 deployments. The Kubernetes API server asks the KEDA Metrics Server for available metrics: the deployment whose controller is the elected leader will properly list and return them, but what if the request is load balanced to the deployment whose controller hasn't been selected as leader? In that case we would have to manage the cache of metrics somehow.
Yes? But that can use the same underlying cache so we don't duplicate effort.
We will have 2 Deployments (HA); where do you see the cache that's accessible from all KEDA Deployments?
It would be in-memory. I think we're talking about different things, or something else is making these questions confusing on both sides :)
@coderanger could you go into some more detail on why the originally proposed solution would be brittle? I think it's worth talking about that, because figuring out a way for the metrics server to communicate reliably with the operator would let us keep the two processes separate, and that would likely reduce the amount of work needed to get this done.
@coderanger yeah, probably a small misunderstanding on both sides :) +1 to @arschles's ask; maybe you can talk a little more about the in-memory proposal and how it will allow HA (multiple K8s Deployments).
@coderanger I think having a unified cache accessible to all deployments would just add more architecture and overall complexity. @zroubalik how often do you foresee the operator dumping contents into a ConfigMap (with your proposed solution)?
@yquansah I have still been thinking about the best option; currently I am more convinced that the best approach would be to add a simple controller to the Metrics Server that watches ScaledObjects and populates the metric names cache appropriately (see the sketch below). Regarding your question, it should do so on each change to a ScaledObject.
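A minimal sketch of what such a controller could look like, assuming controller-runtime and the KEDA v2 API types; the package name, MetricNameCache, and metricNamesFor are hypothetical stand-ins, not KEDA's actual code:

```go
package metricscache

import (
	"context"
	"sync"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
)

// MetricNameCache is a hypothetical thread-safe map from a ScaledObject's
// namespaced name to the metric names it exposes.
type MetricNameCache struct {
	mu    sync.RWMutex
	byObj map[string][]string
}

func (c *MetricNameCache) Set(key string, names []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.byObj == nil {
		c.byObj = map[string][]string{}
	}
	c.byObj[key] = names
}

func (c *MetricNameCache) Delete(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.byObj, key)
}

// ScaledObjectReconciler keeps the cache in sync on every add/update/delete.
type ScaledObjectReconciler struct {
	client.Client
	Cache *MetricNameCache
}

func (r *ScaledObjectReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var so kedav1alpha1.ScaledObject
	if err := r.Get(ctx, req.NamespacedName, &so); err != nil {
		if apierrors.IsNotFound(err) {
			// The ScaledObject was deleted: drop its metric names.
			r.Cache.Delete(req.NamespacedName.String())
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}
	r.Cache.Set(req.NamespacedName.String(), metricNamesFor(&so))
	return ctrl.Result{}, nil
}

// metricNamesFor is a hypothetical placeholder: a real implementation would
// build the same scaler-specific names the provider computes today.
func metricNamesFor(so *kedav1alpha1.ScaledObject) []string {
	names := make([]string, 0, len(so.Spec.Triggers))
	for _, t := range so.Spec.Triggers {
		names = append(names, t.Type)
	}
	return names
}

func (r *ScaledObjectReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&kedav1alpha1.ScaledObject{}).
		Complete(r)
}
```

Because the watch delivers every add/update/delete, the cache tracks ScaledObject changes without the Metrics Server hitting the API server on the hot listing path.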
@zroubalik I believe that's a good solution as well, but we should think about failure modes if we do it. The big one on my mind right now is "drift" between the metrics server and the operator.
@arschles those are valid concerns, but I don't think the list of metrics is that critical. It is just informative for the k8s API server, so in the worst case it would ask for an unknown metric, which we should handle in the Metrics Server (see the sketch below).
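A minimal sketch of handling that worst case, assuming the custom-metrics-apiserver external metrics provider interface (older versions take no context argument; newer ones do); the provider type and the known membership test are hypothetical stand-ins for a cache lookup:

```go
package provider

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/metrics/pkg/apis/external_metrics"
	"sigs.k8s.io/custom-metrics-apiserver/pkg/provider"
)

// kedaProvider stands in for the external metrics provider; the known field
// is a hypothetical membership test backed by the metric names cache.
type kedaProvider struct {
	known func(metricName string) bool
}

func (p *kedaProvider) GetExternalMetric(namespace string, metricSelector labels.Selector, info provider.ExternalMetricInfo) (*external_metrics.ExternalMetricValueList, error) {
	if !p.known(info.Metric) {
		// Unknown or stale metric name: answer with a clean NotFound error
		// instead of failing hard, so a slightly out-of-date cache is harmless.
		return nil, apierrors.NewNotFound(
			schema.GroupResource{Group: "external.metrics.k8s.io", Resource: "externalmetrics"},
			info.Metric)
	}
	// ... resolve the matching ScaledObject and query its scaler ...
	return &external_metrics.ExternalMetricValueList{}, nil
}
```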
Proposal
The Kubernetes API Server asks for all available metrics that are provided by the KEDA Metrics Server. Internally, we do a round trip across all ScaledObjects to compile the list of used metrics:
keda/pkg/provider/provider.go, lines 139 to 140 at a223a0e
This code has been on my radar for quite some time. We should cache the list of metrics in the Metrics Server and only update it when a ScaledObject is added, updated, or removed. But it is a little bit tricky, because the Metrics Server doesn't watch ScaledObjects (the Operator does that in its reconcile loop), so it has no knowledge of these changes.
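A minimal sketch of the cached variant, reusing the hypothetical MetricNameCache from the reconciler sketch in the comments above and assuming the provider holds a reference to it; none of this is KEDA's actual code:

```go
package metricscache

import (
	"sigs.k8s.io/custom-metrics-apiserver/pkg/provider"
)

// All returns a snapshot of every cached metric name (MetricNameCache as in
// the reconciler sketch above).
func (c *MetricNameCache) All() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	names := make([]string, 0, len(c.byObj))
	for _, objNames := range c.byObj {
		names = append(names, objNames...)
	}
	return names
}

// cachedProvider is a hypothetical provider holding the watch-maintained cache.
type cachedProvider struct {
	cache *MetricNameCache
}

// ListAllExternalMetrics answers the API server from the in-memory snapshot,
// with no round trip across all ScaledObjects.
func (p *cachedProvider) ListAllExternalMetrics() []provider.ExternalMetricInfo {
	infos := make([]provider.ExternalMetricInfo, 0)
	for _, name := range p.cache.All() {
		infos = append(infos, provider.ExternalMetricInfo{Metric: name})
	}
	return infos
}
```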
I have been thinking about this for some time and the best solution is probably one of these:
I am inclined toward solution 1.
Use-Case
It should improve performance, because it will reduce the number of calls to the Kubernetes API Server.