Empty response warning with latest kube-client (v0.26) #146
This is the approach we are taking everywhere else for custom and external metrics, so KEDA is definitely doing things properly. That said, I am unsure this is the correct approach from a Kubernetes standpoint. AFAIK we are the only ones doing something like that in the Kubernetes ecosystem. Aggregated APIs usually have a known list of Resources they want to expose that rarely changes, and it never changes dynamically (e.g. PodMetrics/NodeMetrics). But in our case we are exposing metrics as if they were Kubernetes types, and we are doing that dynamically, meaning that at runtime our API may grow without having to restart the aggregated apiserver. We are essentially doing CRDs the wrong way. In the past I questioned that approach when investigating kubernetes-sigs/prometheus-adapter#292, which is a bug about the infamous spammy log line:
My findings were that at ~34 Resources exposed by the API, this error started to appear. I never got the time to investigate further what the implications were, but it definitely didn't sound right. -- To answer the question of whether we could just expose the ExternalMetricValue type instead of the list of all external metrics: no, not in the current state of things. We currently rely on the fact that the metric name is passed in the URL to determine which metric the HPA is trying to get:
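For context, this is roughly the provider-side contract involved (the types below mirror sigs.k8s.io/custom-metrics-apiserver/pkg/provider, but exact signatures vary between releases, so treat it as an illustrative sketch rather than the library's verbatim API):

```go
package sketch

import (
	"context"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/metrics/pkg/apis/external_metrics"
)

// ExternalMetricInfo mirrors the struct of the same name in the provider
// package: it carries the metric name extracted from the request URL.
type ExternalMetricInfo struct {
	Metric string
}

// ExternalMetricsProvider mirrors (roughly) the provider interface; newer
// releases pass a context.Context, older ones do not.
type ExternalMetricsProvider interface {
	// Handles .../namespaces/<ns>/<metric-name>?labelSelector=...; the metric
	// name is already resolved from the URL, so a single-metric lookup is
	// always possible here.
	GetExternalMetric(ctx context.Context, namespace string, metricSelector labels.Selector,
		info ExternalMetricInfo) (*external_metrics.ExternalMetricValueList, error)

	// Backs listing/discovery, and is what currently has to enumerate every
	// dynamically registered metric.
	ListAllExternalMetrics() []ExternalMetricInfo
}
```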
I can hardly see a way around that with the way aggregated APIs work today, but there is definitely something wrong with the current approach. I will investigate what can be done to improve the situation, but maybe we could get a first quick win by having the logs silenced for our APIs or moved to a higher verbosity level.
Yes, but I'm talking only about using a fixed value for listing operations. I have done this draft based on another contributor's PR.
The routing based on the metric name hasn't changed, just the response of the listing operation.
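As a minimal sketch of that shape (assuming a recent custom-metrics-apiserver provider interface; the provider type and the fixed entry name are made up for illustration, this is not the actual draft):

```go
package sketch

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/metrics/pkg/apis/external_metrics"
	"sigs.k8s.io/custom-metrics-apiserver/pkg/provider"
)

// fixedListProvider keeps per-metric lookups exactly as before; only the
// listing response becomes static.
type fixedListProvider struct{}

// GetExternalMetric is conceptually unchanged: info.Metric is the name taken
// from the request URL, so the provider still resolves exactly one metric.
func (p *fixedListProvider) GetExternalMetric(ctx context.Context, namespace string,
	selector labels.Selector, info provider.ExternalMetricInfo) (*external_metrics.ExternalMetricValueList, error) {
	// Real value resolution (e.g. asking the KEDA operator) would happen here.
	return nil, fmt.Errorf("metric %q is not wired up in this sketch", info.Metric)
}

// ListAllExternalMetrics no longer enumerates every registered metric; it
// returns a single fixed, generic entry (the name is illustrative).
func (p *fixedListProvider) ListAllExternalMetrics() []provider.ExternalMetricInfo {
	return []provider.ExternalMetricInfo{{Metric: "externalmetrics"}}
}
```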
I guess it makes sense that it works, because the HPA controller most likely doesn't care about discovery at all. I would expect it to just send a request to the endpoint of the metric the HPA targets. I wonder if that change would be possible to make here. I have no clue whether some users actually rely on the discovery endpoint. I revived a related Kubernetes issue and pinged some people; let's hear what they say, maybe we can find a solution that doesn't break any behavior.
Sure, I think this topic is too complex to solve in only a few minutes, as it is part of discovery. Let's hear other opinions so we can solve this properly.
I don't think so, as the endpoint itself doesn't support listing; I mean, the only allowed verb is `get`. BTW, thanks for your help 🙇
There is a specific
Do you mean executing something like this? I can't see anything:

```console
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/keda/random-metric-name"
Error from server (NotFound): the server could not find the requested resource
```

```console
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/keda/pods"
{"kind":"PodMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{},"items":[{"metadata":{"name":"keda-admission-5ff995d9b5-t7ss8","namespace":"keda","creationTimestamp":"2023-01-25T13:20:10Z","labels":{"app":"keda-admission-webhooks","name":"keda-admission-webhooks","pod-template-hash":"5ff995d9b5"}},"timestamp":"2023-01-25T13:19:28Z","window":"1m0.394s","containers":[{"name":"keda-admission-webhooks","usage":{"cpu":"173025n","memory":"16964Ki"}}]},{"metadata":{"name":"keda-metrics-apiserver-8c6db6798-fp7kc","namespace":"keda","creationTimestamp":"2023-01-25T13:20:10Z","labels":{"app":"keda-metrics-apiserver","pod-template-hash":"8c6db6798"}},"timestamp":"2023-01-25T13:19:29Z","window":"1m3.599s","containers":[{"name":"keda-metrics-apiserver","usage":{"cpu":"3448706n","memory":"43872Ki"}}]},{"metadata":{"name":"keda-operator-545fbd6565-kwxfp","namespace":"keda","creationTimestamp":"2023-01-25T13:20:10Z","labels":{"app":"keda-operator","name":"keda-operator","pod-template-hash":"545fbd6565"}},"timestamp":"2023-01-25T13:19:34Z","window":"1m12.775s","containers":[{"name":"keda-operator","usage":{"cpu":"2060106n","memory":"54264Ki"}}]}]}
```

This is what I meant with:

> The metrics we are registering aren't available in all the namespaces
/triage accepted
I think we need to find a solution soon, because the amount of noise will increase over time. Helm is already affected in its latest version (because they have bumped their deps).
I'm seeing this with plain kubectl too.
We're also seeing it with any version of
Independently of the action taken upstream, what do you think about exposing a fixed set of metric types instead of dynamically calculating the available metrics, @dgrisonnet @logicalhan? This could also solve the timeout problem when there are multiple metrics.
Sorry for being a pain, but we will release KEDA v2.10 and we would like to patch this somehow. Currently we have 2 options:
Could you suggest something to us, @dgrisonnet @logicalhan?
Can we keep this ticket open until the fix is deployed and in working condition?
Recent changes in https://github.com/kubernetes/client-go have changed the behavior of the discovery client.
These changes were released as part of v0.26 and introduced a scenario where tooling like kubectl or Helm shows a warning during resource discovery:
As a first step, I reviewed whether we are doing something wrong in KEDA, but after debugging Helm and kubectl I found that the cause isn't related to KEDA itself (prometheus-adapter produces the same error, but about custom.metrics).
After the changes, the discovery client lists all the API groups and also all the resources inside each group, so a fresh installation of KEDA/prometheus-adapter without any metric exposed automatically shows this warning in every tool that has updated its dependencies, which is annoying (and looks hideous).
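For reference, this is roughly what the updated tooling does under the hood via client-go's discovery client (a minimal sketch; the kubeconfig path is illustrative and error handling is simplified):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// The path is an example; kubectl/Helm resolve the kubeconfig themselves.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/home/me/.kube/config")
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Discovery walks every API group/version and asks each one for its
	// resource list; a group-version such as external.metrics.k8s.io/v1beta1
	// that answers with an empty list is what triggers the warning described
	// in this issue.
	groups, resourceLists, err := dc.ServerGroupsAndResources()
	if err != nil {
		// Partial discovery failures are typically reported here while the
		// successfully discovered groups are still returned.
		fmt.Println("discovery reported:", err)
	}
	for _, rl := range resourceLists {
		fmt.Printf("%s: %d resources\n", rl.GroupVersion, len(rl.APIResources))
	}
	_ = groups
}
```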
In KEDA, we return the currently available metrics in response to listing requests, and I have seen that other tools like prometheus-adapter do the same for external and custom metrics.
Is this approach correct? I mean, should we return the metrics themselves instead of the metric types (like metrics.k8s.io does)? Exposing all the available metrics as part of listing requests can produce timeouts if the number of available metrics is huge, and it produces this annoying (and unrelated) warning when no metrics for scaling are available.
I can see that this approach is also used in the example code, but if this is how we are supposed to expose metrics, the discovery client should take that into account and ignore those errors.
Could you provide any help/guidance on this topic? We have received several reports about this warning in the KEDA project, and users think it's caused by a KEDA problem, but we followed the available example to develop our metrics server.
Is it correct, from a custom-metrics-apiserver usage point of view, if we return a fixed collection of generic types (in our case, `scaledobjects`) like the metrics.k8s.io API does (with 'pods' and 'nodes')? This change would solve the timeouts in huge clusters and also the issue when no metric is exposed at all. Considering that these metrics aren't browsable the way metrics.k8s.io resources are, maybe this could be acceptable from a custom-metrics-apiserver implementation PoV.
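As a rough illustration of that idea, discovery for external.metrics.k8s.io could advertise a single fixed resource, much like metrics.k8s.io statically advertises pods and nodes (the resource name and kind below are assumptions, not an agreed design):

```go
package sketch

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// FixedExternalMetricsResources is what discovery for
// external.metrics.k8s.io/v1beta1 could return regardless of which metrics
// are currently registered with KEDA.
var FixedExternalMetricsResources = metav1.APIResourceList{
	GroupVersion: "external.metrics.k8s.io/v1beta1",
	APIResources: []metav1.APIResource{{
		Name:       "scaledobjects", // the fixed generic type proposed above
		Namespaced: true,
		Kind:       "ExternalMetricValueList", // assumed kind, for illustration only
		Verbs:      metav1.Verbs{"get"},
	}},
}
```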