Enable KEDA as a single source of truth on scaler metrics #1281
Comments
I think having a centralized way of pulling metrics (as opposed to having the metrics adapter and operator pull them separately) would be great, but that would require some serious rework I believe. As an interim step, maybe it would make sense to look at what it would take to add Prometheus metrics collection and exporting to the KEDA Operator first?
I have basically no knowledge about KEDA's internals, so I'm definitely not a good candidate to judge implementation complexity. From what I understand, adding a metrics endpoint to the operator would be able to provide ScaledJobs metrics and do so continuously? That would be a valuable thing on its own, I think. We could use it. Does the operator also have continuous knowledge of the ScaledObject metrics? Or does it stop polling those once the deployment becomes active (I kind of assumed it would)? If it polls continuously, then the metrics in the operator, if exposed to the outside, could already feel pretty SSOT to a user. Making KEDA itself avoid redundant queries and guaranteeing internal consistency could definitely be a longer-term benefit. I didn't even consider that aspect when writing this FR.
It polls continuously to find out whether the scaler is still active or not, so that should be doable. Btw, KEDA is built on the operator-sdk framework, which is based on Kubebuilder. There should be an API to expose metrics, see the docs: https://sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#metrics
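For context, a minimal sketch of what exposing such a gauge from the operator could look like using controller-runtime's built-in Prometheus registry (which operator-sdk projects serve on `/metrics`); the metric and label names are hypothetical, not KEDA's actual implementation:

```go
package prommetrics

import (
	"github.com/prometheus/client_golang/prometheus"
	crmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// scalerMetricsValue is a hypothetical gauge holding the latest value each
// scaler reported, labeled so ScaledObjects and ScaledJobs can be told apart.
var scalerMetricsValue = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "keda_operator",
		Name:      "scaler_metrics_value",
		Help:      "Latest metric value reported by a scaler.",
	},
	[]string{"type", "namespace", "scaledResource", "metric"},
)

func init() {
	// Anything registered here is served on the operator's /metrics endpoint.
	crmetrics.Registry.MustRegister(scalerMetricsValue)
}

// RecordScalerMetric would be called from the polling loop after each scaler query.
func RecordScalerMetric(kind, namespace, name, metricName string, value float64) {
	scalerMetricsValue.WithLabelValues(kind, namespace, name, metricName).Set(value)
}
```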
Sounds good. If you think this could be a good first issue for someone and no one else already plans to tackle this short term, I could try taking a stab at an implementation.
Added it to our roadmap to consider and our standup to discuss. I like the idea, but we need to think this through to make sure we aren't trying to solve two problems badly. But I'm not opposed to it.
The reason why I'm saying that is that I also maintain https://github.com/tomkerkhove/promitor and it starts with just adding Prometheus support, until somebody asks for another metric system to support, and so on. And then we're not even talking about the performance impact if we start having Prometheus et al. poke us very often. So I'm just trying to avoid KEDA becoming a metric scraper rather than focusing on app scaling. (To be clear - I'm definitely not opposed, given I have a solution for scraping Azure Monitor metrics)
If we just expose the metrics there, I don't see much of a problem, to be honest.
Bottom line - we need a design doc on this, I think :)
We've discussed this on our standup and decided to provide metrics for all systems we scrape for metrics so they can be re-used in other systems.
Bumping this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Any update on this? I still think it would be a very useful feature.
Not yet, but making sure we keep tracking and don't close.
Bump!
The way I see it, we should provide an extension point so that end-users can consume and/or store the metrics that we use. This would also help us get data to eventually start doing predictive activation with (see #197). My current proposal would be to start emitting all the metrics that we expose through the metrics server towards an OpenTelemetry Collector. Optionally, we can add Prometheus as well, but I'd standardize on OpenTelemetry going forward instead.
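For illustration, a self-contained sketch (not KEDA code) of pushing such values to an OpenTelemetry Collector over OTLP/gRPC with the Go SDK; the endpoint, metric name, and attributes below are assumptions:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// Export over OTLP/gRPC to a collector; endpoint and interval are assumptions.
	exporter, err := otlpmetricgrpc.New(ctx,
		otlpmetricgrpc.WithEndpoint("otel-collector:4317"),
		otlpmetricgrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}
	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter, sdkmetric.WithInterval(15*time.Second))),
	)
	defer provider.Shutdown(ctx)

	meter := provider.Meter("keda")

	// Hypothetical observable gauge: report the last value each scaler produced.
	_, err = meter.Float64ObservableGauge(
		"keda.scaler.metrics.value",
		metric.WithDescription("Latest metric value reported by a scaler"),
		metric.WithFloat64Callback(func(_ context.Context, o metric.Float64Observer) error {
			// In the operator this would read the cached scaler results;
			// the value and attributes here are placeholders.
			o.Observe(42,
				metric.WithAttributes(
					attribute.String("type", "ScaledJob"),
					attribute.String("scaledResource", "my-job"),
				))
			return nil
		}),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Keep the process alive long enough for a couple of export cycles.
	time.Sleep(time.Minute)
}
```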
Any thoughts on this @kedacore/keda-maintainers?
I'm not 100% sure about this: if we expose the value, anyone in the cluster could go and get it from there 🤔
Certainly, but it's up to the team owning the OpenTelemetry Collector to secure it how it needs to be secured. We just push the metrics to them and that's where our responsibilities end.
Prometheus is just one metrics platform; pushing to an OpenTelemetry Collector allows end-users to use the metrics wherever they want. We can definitely add them in Prometheus as well, but I believe the OpenTelemetry Collector is more open and should thus be our standard.
TBH I would love to see both Prometheus and OpenTelemetry implemented. Ideally one common function that will handle the metrics exposure based on some config.
Both are fine for me, but personally I think OpenTelemetry is the required piece, given it also allows end-users to use Prometheus. So if the OpenTelemetry Collector is supported, then I'm happy!
Yes, OpenTelemetry is a 'must have' piece. I agree that the security of the Collector is the responsibility of the Collector implementation. Also, it will all run on a secure cluster subnet, so that's something.
Nice to see movement on this issue. Personally I would find native Prometheus support quite useful. Having an OpenTelemetry Collector (especially for metrics) is still quite rare. Native Prometheus will stay the de facto standard for quite a while and it's also what was supported in KEDA up to now. I agree that OTEL is great going forward though.
Let's do both then and re-evaluate when we go to KEDA v3.x (if we ever have to, no plans for now)
Anyone eager to contribute this?
@tomkerkhove may I contribute this feature?
Most definitely yes, thanks a ton! Feel free to post a design proposal here or a draft PR. Maybe a "design proposal" on the metric names/approach would be ideal.
@ilyalexin
@JorTurFer
No problem at all! Thanks for the update 😄
Enable KEDA to always output the scaler metrics it determines through a Prometheus metrics endpoint for `ScaledObjects` and `ScaledJobs`. This allows it to become the single source of truth on scaling metrics.

Use-Case
A very common requirement in a job queuing/event processing use case is having metrics available for monitoring and alerting. The metrics allow alerting on work backing up and observing the progress of work. Even though KEDA has to know this information for its own work, it currently does not make it available in a way that allows it to be consistently used for these purposes.
KEDA 2 provides `keda_metrics_adapter_scaler_metrics_value`, but that is only available for `ScaledObjects` with active deployments, as it is produced by the metrics adapter. This means the metric is not available when no work is pending and, more importantly, never when `ScaledJobs` are used.

Currently the only way around this is to run another always-on service/exporter which replicates the work the KEDA scalers perform to determine this information and uses it to generate the metric. This nullifies a lot of the benefit of using KEDA scalers to begin with: instead of making use of the different scalers, you could directly attach to this metric, ensuring what your monitoring sees is consistent with what KEDA uses for scaling.
KEDA is in the ideal position to produce the metric as the single source of truth. It already needs to have the information and is always running. A user adding a `ScaledJob`/`ScaledObject` would naturally make the metric for it appear. This enables self-service of the metric for developers based solely on KEDA CRDs. This way, dashboards and alerting are also based on the same information that was actually used for scaling.

Specification
- Expose the scaler metrics for `ScaledJobs` as well
- Use a common metric for `ScaledObjects` and `ScaledJobs` (tags can be used to differentiate)
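To make the intent concrete, the exposed metric could look roughly like the following Prometheus exposition output, where a `type` label differentiates the two kinds; the metric and label names are purely illustrative, not an agreed design:

```
# HELP keda_scaler_metrics_value Latest metric value reported by a scaler
# TYPE keda_scaler_metrics_value gauge
keda_scaler_metrics_value{type="ScaledObject",namespace="orders",scaledResource="order-processor",metric="queueLength"} 128
keda_scaler_metrics_value{type="ScaledJob",namespace="reports",scaledResource="report-builder",metric="pendingMessages"} 7
```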