Enable KEDA as a single source of truth on scaler metrics #1281
Comments
I think having a centralized way of pulling metrics (as opposed to having the metrics adapter and operator pull them separately) would be great, but that would require some serious rework I believe. As an interim step, maybe it would make sense to look at what it would take to add Prometheus metrics collection and exporting to the KEDA Operator first?
I have basically no knowledge about KEDA's internals, so I'm definitely not a good candidate to judge implementation complexity. From what I understand, adding a metrics endpoint to the operator would be able to provide ScaledJobs metrics and do so continuously? That would be a valuable thing on its own, I think. We could use it. Does the operator also have continuous knowledge of the ScaledObject metrics? Or does it stop polling those once the deployment becomes active (I kind of assumed it would)? If it polls continuously, then the metrics in the operator, if exposed to the outside, could already feel pretty SSOT to a user. Making KEDA itself avoid redundant queries and guaranteeing internal consistency could definitely be a longer-term benefit. I didn't even consider that aspect when writing this FR.
It polls continuously to find out whether the scaler is still active or not, so that should be doable. Btw, KEDA is built on the operator-sdk framework, which is based on Kubebuilder. There should be an API to expose metrics, see the docs: https://sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#metrics
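For context, a minimal sketch of what exposing such a gauge from the operator could look like using controller-runtime's built-in Prometheus registry (which operator-sdk projects serve on `/metrics`); the metric and label names are hypothetical, not KEDA's actual implementation:

```go
package prommetrics

import (
	"github.com/prometheus/client_golang/prometheus"
	crmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// scalerMetricsValue is a hypothetical gauge holding the latest value each
// scaler reported, labeled so ScaledObjects and ScaledJobs can be told apart.
var scalerMetricsValue = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "keda_operator",
		Name:      "scaler_metrics_value",
		Help:      "Latest metric value reported by a scaler.",
	},
	[]string{"type", "namespace", "scaledResource", "metric"},
)

func init() {
	// Anything registered here is served on the operator's /metrics endpoint.
	crmetrics.Registry.MustRegister(scalerMetricsValue)
}

// RecordScalerMetric would be called from the polling loop after each scaler query.
func RecordScalerMetric(kind, namespace, name, metricName string, value float64) {
	scalerMetricsValue.WithLabelValues(kind, namespace, name, metricName).Set(value)
}
```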
Sounds good. If you think this could be a good first issue for someone and no one else already plans to tackle this short term, I could try taking a stab at an implementation.
Added it to our roadmap to consider and our standup to discuss. I like the idea, but we need to think this through to make sure we aren't trying to solve two problems badly. But I'm not opposed to it.
The reason why I'm saying that is that I also maintain https://github.com/tomkerkhove/promitor and it starts with just adding Prometheus support, until somebody asks for another metric system to support, and so on. And then we're not even talking about the performance impact if we start having Prometheus et al. poke us very often. So I'm just trying to avoid KEDA becoming a metric scraper rather than focusing on app scaling. (To be clear - I'm definitely not opposed, given I have a solution for scraping Azure Monitor metrics)
If we just expose the metrics there, I don't see much of a problem, to be honest.
Bottom line - we need a design doc on this, I think :)
We've discussed this on our standup and decided to provide metrics for all systems we scrape for metrics so they can be re-used in other systems.
Bumping this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Any update on this? I still think it would be a very useful feature.
Not yet, but making sure we keep tracking and don't close.
Bump!
The way I see it, we should provide an extension point so that end-users can consume and/or store the metrics that we use. This would also help us get data to eventually start doing predictive activation with (see #197). My current proposal would be to start emitting all the metrics that we expose through the metrics server towards an OpenTelemetry Collector. Optionally, we can add Prometheus as well, but I'd standardize on OpenTelemetry going forward instead.
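For illustration, a self-contained sketch (not KEDA code) of pushing such values to an OpenTelemetry Collector over OTLP/gRPC with the Go SDK; the endpoint, metric name, and attributes below are assumptions:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// Export over OTLP/gRPC to a collector; endpoint and interval are assumptions.
	exporter, err := otlpmetricgrpc.New(ctx,
		otlpmetricgrpc.WithEndpoint("otel-collector:4317"),
		otlpmetricgrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}
	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter, sdkmetric.WithInterval(15*time.Second))),
	)
	defer provider.Shutdown(ctx)

	meter := provider.Meter("keda")

	// Hypothetical observable gauge: report the last value each scaler produced.
	_, err = meter.Float64ObservableGauge(
		"keda.scaler.metrics.value",
		metric.WithDescription("Latest metric value reported by a scaler"),
		metric.WithFloat64Callback(func(_ context.Context, o metric.Float64Observer) error {
			// In the operator this would read the cached scaler results;
			// the value and attributes here are placeholders.
			o.Observe(42,
				metric.WithAttributes(
					attribute.String("type", "ScaledJob"),
					attribute.String("scaledResource", "my-job"),
				))
			return nil
		}),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Keep the process alive long enough for a couple of export cycles.
	time.Sleep(time.Minute)
}
```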
Any thoughts on this @kedacore/keda-maintainers?
I'm not 100% sure about this: if we expose the value, anyone in the cluster could go and get it from there 🤔
Certainly, but it's up to the team owning the OpenTelemetry Collector to secure it how it needs to be secured. We just push the metrics to them and that's where our responsibilities end.
Prometheus is just one metrics platform; pushing to an OpenTelemetry Collector allows end-users to use the metrics wherever they want. We can definitely add them in Prometheus as well, but I believe the OpenTelemetry Collector is more open and should thus be our standard.
TBH I would love to see both Prometheus and OpenTelemetry implemented. Ideally one common function that will handle the metrics exposure based on some config.
Both are fine for me, but personally I think OpenTelemetry is the required piece, given it also allows end-users to use Prometheus. So if the OpenTelemetry Collector is supported, then I'm happy!
Yes, OpenTelemetry is a 'must have' piece. I agree that the security of the Collector is the responsibility of the Collector implementation. Also, it will all run on a secure cluster subnet, so that's something.
Nice to see movement on this issue. Personally I would find native Prometheus support quite useful. Having an OpenTelemetry Collector (especially for metrics) is still quite rare. Native Prometheus will stay the de facto standard for quite a while and it's also what was supported in KEDA up to now. I agree that OTEL is great going forward though.
Let's do both then and re-evaluate when we go to KEDA v3.x (if we ever have to, no plans for now)
Anyone eager to contribute this?
@tomkerkhove may I contribute this feature?
Most definitely yes, thanks a ton! Feel free to post a design proposal here or a draft PR. Maybe a "design proposal" on the metric names/approach would be ideal.
@ilyalexin
@JorTurFer
No problem at all! Thanks for the update 😄
Enable KEDA to always output the scaler metrics it determines through a Prometheus metrics endpoint for `ScaledObjects` and `ScaledJobs`. This allows it to become the single source of truth on scaling metrics.

Use-Case
A very common requirement in a job queuing/event processing use case is having metrics available for monitoring and alerting. The metrics allow alerting on work backing up and observing the progress of work. Even though KEDA has to know this information for its own work, it currently does not make it available in a way that allows it to be consistently used for these purposes.
KEDA 2 provides `keda_metrics_adapter_scaler_metrics_value`, but that is only available for `ScaledObjects` with active deployments, as it is produced by the metrics adapter. This means the metric is not available when no work is pending and, more importantly, never when `ScaledJobs` are used.

Currently the only way around this is to run another always-on service/exporter which replicates the work the KEDA scalers perform to determine this information and uses it to generate the metric. This nullifies a lot of the benefit of using KEDA scalers to begin with: instead of making use of the different scalers, you could directly attach to this metric, ensuring what your monitoring sees is consistent with what KEDA uses for scaling.
KEDA is in the ideal position to produce the metric as the single source of truth. It already needs to have the information and is always running. A user adding a `ScaledJob`/`ScaledObject` would naturally make the metric for it appear. This enables self-service of the metric for developers based solely on KEDA CRDs. This way, dashboards and alerting are also based on the same information that was actually used for scaling.

Specification
- Expose the scaler metrics for `ScaledJobs` as well
- Use a common metric for `ScaledObjects` and `ScaledJobs` (tags can be used to differentiate)
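To make the intent concrete, the exposed metric could look roughly like the following Prometheus exposition output, where a `type` label differentiates the two kinds; the metric and label names are purely illustrative, not an agreed design:

```
# HELP keda_scaler_metrics_value Latest metric value reported by a scaler
# TYPE keda_scaler_metrics_value gauge
keda_scaler_metrics_value{type="ScaledObject",namespace="orders",scaledResource="order-processor",metric="queueLength"} 128
keda_scaler_metrics_value{type="ScaledJob",namespace="reports",scaledResource="report-builder",metric="pendingMessages"} 7
```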