Empty response warning with latest kube-client (v0.26) #146
This is the approach we are taking everywhere else for custom and external metrics, so KEDA is definitely doing things properly. That said, I am unsure this is the correct approach from a Kubernetes standpoint. AFAIK we are the only ones doing something like that in the Kubernetes ecosystem. Aggregated APIs usually have a known list of Resources they want to expose that rarely changes, and it never changes dynamically (e.g. PodMetrics/NodeMetrics). But in our case we are exposing metrics as if they were Kubernetes types, and we are doing that dynamically, meaning that at runtime our API may grow without having to restart the aggregated apiserver. We are essentially doing CRDs the wrong way. In the past I questioned that approach when investigating kubernetes-sigs/prometheus-adapter#292, which is a bug about the infamous spammy log line:
My findings were that at ~34 Resources exposed by the API, this error started to appear. I never got the time to investigate further what the implications were, but it definitely didn't sound right. -- To answer the question of whether we could just expose the ExternalMetricValue type instead of the list of all external metrics: no, not in the current state of things. We currently rely on the fact that the metric name is passed in the URL to determine which metric the HPA is trying to get:
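For context, this is roughly the provider-side contract involved (the types below mirror sigs.k8s.io/custom-metrics-apiserver/pkg/provider, but exact signatures vary between releases, so treat it as an illustrative sketch rather than the library's verbatim API):

```go
package sketch

import (
	"context"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/metrics/pkg/apis/external_metrics"
)

// ExternalMetricInfo mirrors the struct of the same name in the provider
// package: it carries the metric name extracted from the request URL.
type ExternalMetricInfo struct {
	Metric string
}

// ExternalMetricsProvider mirrors (roughly) the provider interface; newer
// releases pass a context.Context, older ones do not.
type ExternalMetricsProvider interface {
	// Handles .../namespaces/<ns>/<metric-name>?labelSelector=...; the metric
	// name is already resolved from the URL, so a single-metric lookup is
	// always possible here.
	GetExternalMetric(ctx context.Context, namespace string, metricSelector labels.Selector,
		info ExternalMetricInfo) (*external_metrics.ExternalMetricValueList, error)

	// Backs listing/discovery, and is what currently has to enumerate every
	// dynamically registered metric.
	ListAllExternalMetrics() []ExternalMetricInfo
}
```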
I can hardly see a way around that with the way aggregated APIs work today, but there is definitely something wrong with the current approach. I will investigate what can be done to improve the situation, but maybe we could get a first quick win by having the logs silenced for our APIs or moved to a higher verbosity level.
Yes, but I'm talking only about using a fixed value for listing operations. I have done this draft based on another contributor's PR.
The routing based on the metric name hasn't changed, just the response of the listing operation.
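As a minimal sketch of that shape (assuming a recent custom-metrics-apiserver provider interface; the provider type and the fixed entry name are made up for illustration, this is not the actual draft):

```go
package sketch

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/metrics/pkg/apis/external_metrics"
	"sigs.k8s.io/custom-metrics-apiserver/pkg/provider"
)

// fixedListProvider keeps per-metric lookups exactly as before; only the
// listing response becomes static.
type fixedListProvider struct{}

// GetExternalMetric is conceptually unchanged: info.Metric is the name taken
// from the request URL, so the provider still resolves exactly one metric.
func (p *fixedListProvider) GetExternalMetric(ctx context.Context, namespace string,
	selector labels.Selector, info provider.ExternalMetricInfo) (*external_metrics.ExternalMetricValueList, error) {
	// Real value resolution (e.g. asking the KEDA operator) would happen here.
	return nil, fmt.Errorf("metric %q is not wired up in this sketch", info.Metric)
}

// ListAllExternalMetrics no longer enumerates every registered metric; it
// returns a single fixed, generic entry (the name is illustrative).
func (p *fixedListProvider) ListAllExternalMetrics() []provider.ExternalMetricInfo {
	return []provider.ExternalMetricInfo{{Metric: "externalmetrics"}}
}
```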
I guess it makes sense that it works, because the HPA controller most likely doesn't care about discovery at all. I would expect it to just send a request to the endpoint of the metric the HPA targets. I wonder if that change would be possible to make here. I have no clue whether some users actually rely on the discovery endpoint. I revived a related Kubernetes issue and pinged some people; let's hear what they say, maybe we can find a solution that doesn't break any behavior.
Sure, I think this topic is too complex to solve in only a few minutes, as it is part of discovery. Let's hear other opinions so we can solve this properly.
I don't think so, as the endpoint itself doesn't support listing; I mean, the only allowed verb is `get`. BTW, thanks for your help 🙇
There is a specific
Do you mean executing something like this? I can't see anything:

```console
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/keda/random-metric-name"
Error from server (NotFound): the server could not find the requested resource
```

```console
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/keda/pods"
{"kind":"PodMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{},"items":[{"metadata":{"name":"keda-admission-5ff995d9b5-t7ss8","namespace":"keda","creationTimestamp":"2023-01-25T13:20:10Z","labels":{"app":"keda-admission-webhooks","name":"keda-admission-webhooks","pod-template-hash":"5ff995d9b5"}},"timestamp":"2023-01-25T13:19:28Z","window":"1m0.394s","containers":[{"name":"keda-admission-webhooks","usage":{"cpu":"173025n","memory":"16964Ki"}}]},{"metadata":{"name":"keda-metrics-apiserver-8c6db6798-fp7kc","namespace":"keda","creationTimestamp":"2023-01-25T13:20:10Z","labels":{"app":"keda-metrics-apiserver","pod-template-hash":"8c6db6798"}},"timestamp":"2023-01-25T13:19:29Z","window":"1m3.599s","containers":[{"name":"keda-metrics-apiserver","usage":{"cpu":"3448706n","memory":"43872Ki"}}]},{"metadata":{"name":"keda-operator-545fbd6565-kwxfp","namespace":"keda","creationTimestamp":"2023-01-25T13:20:10Z","labels":{"app":"keda-operator","name":"keda-operator","pod-template-hash":"545fbd6565"}},"timestamp":"2023-01-25T13:19:34Z","window":"1m12.775s","containers":[{"name":"keda-operator","usage":{"cpu":"2060106n","memory":"54264Ki"}}]}]}
```

This is what I meant with:

> The metrics we are registering aren't available in all the namespaces
/triage accepted
I think we need to find a solution soon, because the amount of noise will increase over time. Helm is already affected in its latest version (because they have bumped their deps).
I'm seeing this with plain kubectl too.
We're also seeing it with any version of
Independently of the action taken upstream, what do you think about exposing a fixed set of metric types instead of dynamically calculating the available metrics, @dgrisonnet @logicalhan? This could also solve the timeout problem when there are multiple metrics.
Sorry for being a pain, but we will release KEDA v2.10 and we would like to patch this somehow. Currently we have 2 options:
Could you suggest something to us, @dgrisonnet @logicalhan?
Can we keep this ticket open until the fix is deployed and in working condition?
Recent changes in https://github.com/kubernetes/client-go have changed the behavior of the discovery client.
These changes were released as part of v0.26 and introduced a scenario where tooling like kubectl or Helm shows a warning during resource discovery:
As a first step, I reviewed whether we are doing something wrong in KEDA, but after debugging Helm and kubectl I found that the cause isn't related to KEDA itself (prometheus-adapter produces the same error, but about custom.metrics).
After the changes, the discovery client lists all the API groups and also all the resources inside each group, so a fresh installation of KEDA/prometheus-adapter without any metric exposed automatically shows this warning in every tool that has updated its dependencies, which is annoying (and looks hideous).
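For reference, this is roughly what the updated tooling does under the hood via client-go's discovery client (a minimal sketch; the kubeconfig path is illustrative and error handling is simplified):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// The path is an example; kubectl/Helm resolve the kubeconfig themselves.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/home/me/.kube/config")
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Discovery walks every API group/version and asks each one for its
	// resource list; a group-version such as external.metrics.k8s.io/v1beta1
	// that answers with an empty list is what triggers the warning described
	// in this issue.
	groups, resourceLists, err := dc.ServerGroupsAndResources()
	if err != nil {
		// Partial discovery failures are typically reported here while the
		// successfully discovered groups are still returned.
		fmt.Println("discovery reported:", err)
	}
	for _, rl := range resourceLists {
		fmt.Printf("%s: %d resources\n", rl.GroupVersion, len(rl.APIResources))
	}
	_ = groups
}
```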
In KEDA, we return the currently available metrics in response to listing requests, and I have seen that other tools like prometheus-adapter do the same for external and custom metrics.
Is this approach correct? I mean, should we return the metrics themselves instead of the metric types (like metrics.k8s.io does)? Exposing all the available metrics as part of listing requests can produce timeouts if the number of available metrics is huge, and it produces this annoying (and unrelated) warning when no metrics for scaling are available.
I can see that this approach is also used in the example code, but if this is how we are supposed to expose metrics, the discovery client should take that into account and ignore those errors.
Could you provide any help/guidance on this topic? We have received several reports about this warning in the KEDA project, and users think it's caused by a KEDA problem, but we followed the available example to develop our metrics server.
Is it correct, from a custom-metrics-apiserver usage point of view, if we return a fixed collection of generic types (in our case, `scaledobjects`) like the metrics.k8s.io API does (with 'pods' and 'nodes')? This change would solve the timeouts in huge clusters and also the issue when no metric is exposed at all. Considering that these metrics aren't browsable the way metrics.k8s.io resources are, maybe this could be acceptable from a custom-metrics-apiserver implementation PoV.
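As a rough illustration of that idea, discovery for external.metrics.k8s.io could advertise a single fixed resource, much like metrics.k8s.io statically advertises pods and nodes (the resource name and kind below are assumptions, not an agreed design):

```go
package sketch

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// FixedExternalMetricsResources is what discovery for
// external.metrics.k8s.io/v1beta1 could return regardless of which metrics
// are currently registered with KEDA.
var FixedExternalMetricsResources = metav1.APIResourceList{
	GroupVersion: "external.metrics.k8s.io/v1beta1",
	APIResources: []metav1.APIResource{{
		Name:       "scaledobjects", // the fixed generic type proposed above
		Namespaced: true,
		Kind:       "ExternalMetricValueList", // assumed kind, for illustration only
		Verbs:      metav1.Verbs{"get"},
	}},
}
```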