couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1 #4224

Closed
allan-kg opened this issue Feb 10, 2023 · 18 comments · Fixed by kubernetes/kubernetes#115978
Labels
bug Something isn't working

Comments

@allan-kg

allan-kg commented Feb 10, 2023

Report

Well, it happens all the time; I can't use KEDA locally for testing.

The error :

E0210 10:57:59.544134  457383 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1

Tried on:

  • Inside a VM
    • minikube
      • metrics-server as addon
      • metrics-server from helm
      • keda from helm
      • keda from apply
    • docker desktop
      • metrics-server from helm
      • keda from helm
      • keda from apply
    • microk8s
      • metrics-server as addon
      • metrics-server from helm
      • keda from helm
      • keda from apply
  • Bare metal
    • docker desktop
      • metrics-server from helm
      • keda from helm
      • keda from apply

Expected Behavior

Proper communication with the metrics server, with the external metrics resources being listed.

Actual Behavior

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"external.metrics.k8s.io/v1beta1","resources":[]}

I've run some stress tests and the deployments weren't being scaled.

Steps to Reproduce the Problem

Simple example :

# start any kubectl context: minikube, docker-desktop, microk8s

# enable metrics-server
# or simply run this
helm upgrade --install metrics-server metrics-server/metrics-server -n kube-system --set args={--kubelet-insecure-tls}

# wait and make sure that HPA and "top nodes" are working

# deploy keda
# add its repo from website... then
kubectl create namespace keda
helm install keda kedacore/keda -n keda

After that, most of the time simply running kubectl get all -n keda is enough to raise the error. In a few attempts I had to try creating a ScaledObject, and then it happened immediately. I can't even check whether it is a problem with my ScaledObject, because the error keeps occurring even after deleting the objects.
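For reference, a minimal CPU-based ScaledObject along the lines of what I tried looks roughly like this (names and the utilization value are placeholders, not my exact manifest; the target Deployment needs CPU resource requests for the cpu trigger to work):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app
  namespace: default
spec:
  scaleTargetRef:
    name: my-app                 # an existing Deployment (placeholder name)
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: cpu
    metricType: Utilization      # scale on average CPU utilization
    metadata:
      value: "50"                # target percentage (placeholder)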

Logs from KEDA operator

microk8s kubectl get pods -n keda
E0210 17:12:43.288248 1396573 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
NAME                                      READY   STATUS    RESTARTS   AGE
keda-metrics-apiserver-85f98c6655-bq2zn   1/1     Running   0          21m
keda-operator-97b74b8c8-xjchw             1/1     Running   0          21m


microk8s kubectl logs -n keda keda-operator-97b74b8c8-xjchw -c keda-operator
E0210 17:12:45.963719 1396717 memcache.go:255] couldn't get resource list for external.metrics.k8s.io/v1beta1: Got empty response for: external.metrics.k8s.io/v1beta1
2023-02-10T19:51:26Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": ":8080"}
2023-02-10T19:51:26Z    INFO    setup   Starting manager
2023-02-10T19:51:26Z    INFO    setup   KEDA Version: 2.9.2
2023-02-10T19:51:26Z    INFO    setup   Git Commit: 9bc3f66578a08cdfe084468ea3ef998fa6bf3bb0
2023-02-10T19:51:26Z    INFO    setup   Go Version: go1.18.8
2023-02-10T19:51:26Z    INFO    setup   Go OS/Arch: linux/amd64
2023-02-10T19:51:26Z    INFO    setup   Running on Kubernetes 1.26      {"version": "v1.26.1"}
I0210 19:51:26.316433       1 leaderelection.go:248] attempting to acquire leader lease keda/operator.keda.sh...
2023-02-10T19:51:26Z    INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2023-02-10T19:51:26Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
I0210 19:51:26.354840       1 leaderelection.go:258] successfully acquired lease keda/operator.keda.sh
2023-02-10T19:51:26Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2023-02-10T19:51:26Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2023-02-10T19:51:26Z    INFO    Starting Controller     {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2023-02-10T19:51:26Z    INFO    grpc_server     Starting Metrics Service gRPC Server    {"address": ":9666"}
2023-02-10T19:51:26Z    INFO    Starting EventSource    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2023-02-10T19:51:26Z    INFO    Starting Controller     {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2023-02-10T19:51:26Z    INFO    Starting EventSource    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2023-02-10T19:51:26Z    INFO    Starting Controller     {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2023-02-10T19:51:26Z    INFO    Starting EventSource    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2023-02-10T19:51:26Z    INFO    Starting Controller     {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2023-02-10T19:51:26Z    INFO    Starting workers        {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2023-02-10T19:51:26Z    INFO    Starting workers        {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2023-02-10T19:51:26Z    INFO    Starting workers        {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2023-02-10T19:51:26Z    INFO    Starting workers        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}

KEDA Version

2.9.2

Kubernetes Version

1.26

Platform

None

Scaler Details

CPU, Memory

Anything else?

Yes, I have ALREADY seen the workaround comment on another issue telling me to "create a dummy resource".

I could be missing something, but as far as I can tell, the resources should be presented to me, and if it is mandatory to do something else, it should be made very explicit in the docs.

I don't have any idea of what I should do to create this dummy resource or what exactly that means.

allan-kg added the bug label Feb 10, 2023
@JorTurFer
Member

Hi,
This is a problem introduced into the tooling by kube-client 0.26.0. We have already opened an issue upstream looking for a solution, because KEDA (like other custom metrics servers) follows the approach suggested by the custom-metrics-apiserver repo.

I know it's annoying, but it isn't something we can solve from our side without agreeing on the solution with the upstream.
The workaround for this problem is to expose at least one metric from the KEDA metrics server (the CPU/Memory scalers don't expose any metric through KEDA's metrics server).

How can you do it? You can create a ScaledObject for one workload, adding a trigger which does nothing, for example a cron trigger; this will expose that metric and work around the error you see.
For example, you can apply this in the default namespace:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy-workload
spec:
  replicas: 0
  selector:
    matchLabels:
      app: dummy-workload
  template:
    metadata:
      labels:
        app: dummy-workload
    spec:
      containers:
      - name: dummy
        image: busybox
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dummy-workload
spec:
  scaleTargetRef:
    name: dummy-workload
  minReplicaCount: 0
  maxReplicaCount: 1
  triggers:
  - type: cron
    metadata:
      timezone: Etc/UTC 
      start: 30 * * * *
      end: 45 * * * *
      desiredReplicas: "0"

This will create a dummy workload which will always be scaled to 0, but it exposes one metric and works around the issue.
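After applying it, the same discovery call from the report above should return a non-empty resources list, which is a quick way to confirm the workaround took effect:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
# the "resources" array should no longer be empty once the ScaledObject exists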

@allan-kg
Author

Update: it works, the error does not seem to occur anymore.

Thank you so much @JorTurFer .

@rodrigc

rodrigc commented Apr 19, 2023

@JorTurFer I am seeing this problem.

I have:

Client Version: v1.27.1
Server Version: v1.24.10-eks-48e63af

Kubernetes is AWS EKS.

I have this helm chart installed:

NAME	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART      	APP VERSION
keda	keda     	1       	2023-04-14 12:35:28.316303 -0700 PDT	deployed	keda-2.10.2	2.10.1

If I do:

kubectl get apiservices

I see:

v1beta1.external.metrics.k8s.io             keda/keda-operator-metrics-apiserver   False (FailedDiscoveryCheck)   4d21h

Any idea how to fix this?
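In case it helps narrow things down, the reason for the FailedDiscoveryCheck can be read from the APIService object itself (a standard kubectl call, shown for illustration):

kubectl describe apiservice v1beta1.external.metrics.k8s.io
# the Conditions section explains why discovery against keda/keda-operator-metrics-apiserver fails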

@JorTurFer
Member

#4224 (comment)

@rodrigc

rodrigc commented Apr 24, 2023

@JorTurFer Thanks for the pointer to the workaround. Will a more comprehensive fix be introduced?
That workaround is really weird.

@JorTurFer
Member

JorTurFer commented Apr 24, 2023

The problem has been fixed in the kube-client package (where it was introduced); the problem is that the fix hasn't been released yet.
You can deploy at least one ScaledObject or downgrade kubectl; I don't know when the kubectl team will release a new version with the fix.
From our side, we are working with the SIG Autoscaling team to avoid this in the long term, but it's something for future metrics server releases.

@rodrigc

rodrigc commented Apr 24, 2023

Do you know which fix in client-go addresses the problem?

Do you know if this version of client-go in kubectl has the fix:

@JorTurFer
Member

This is the PR: kubernetes/kubernetes#115978
I don't have any extra info, sorry
Maybe you can find that info browsing related links from the original issue in the upstream: kubernetes-sigs/custom-metrics-apiserver#146

@rodrigc

rodrigc commented Apr 26, 2023

@JorTurFer If I look at kubernetes/kubernetes#115978, I see that it got merged in this commit: kubernetes/kubernetes@6bfa937, and the tags listed on that are:

I'm running kubectl v1.27.1 against a Kubernetes server v1.24.10 (AWS EKS),
and I still see the problem.

For this fix to work properly, do I need kubectl AND the Kubernetes server to be at versions >= 1.27?

@JorTurFer
Member

For this fix to work properly, do I need kubectl AND the Kubernetes server to be at versions >= 1.27?

I don't know, but I don't think so. Maybe you could ask in the k8s repo directly :)

@rodrigc

rodrigc commented Apr 26, 2023

OK, I asked here: kubernetes/kubernetes#115978 (comment)

@kycfeel

kycfeel commented Apr 27, 2023

Getting the same issue here too

Kubectl and Kubernetes (GKE) are both at v1.26.3.

@JorTurFer
Member

This commit, released as part of KEDA v2.11, patches the issue on KEDA's side even if the tooling isn't up to date.

@rodrigc

rodrigc commented Aug 13, 2023

@JorTurFer for the commit that you mentioned, can you clarify which tooling is not up-to-date?

@JorTurFer
Member

Hi
Do you have any ScaledObject deployed in the cluster?
The root cause of this issue is that kube-client (the underlying Go package used to access k8s) had a bug, so all k8s tools, such as kubectl, helm, etc., were affected. After the bug was fixed in the Kubernetes client, the fix needed to be propagated to the tooling.

That's why I said that your tooling could be outdated. As this issue is related to the k8s tooling and you posted here, I thought that you were affected too. If your issue is different from this one because you have at least one ScaledObject (with metrics other than CPU or Memory; with only those it could happen too), or if you are using KEDA v2.11 (we added a patch to prevent the bug in the tooling), or if you are using the latest versions, you are probably affected by another cause. In that case, could you open an issue to track it and look for the solution? 🙏
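A quick way to check both things (standard kubectl/helm commands, for illustration):

kubectl get scaledobjects --all-namespaces   # is there at least one ScaledObject?
helm list -n keda                            # which KEDA chart/app version is installed?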

@rodrigc

rodrigc commented Aug 13, 2023

I removed keda from my cluster because I was getting so many of the same type of error reported in the original bug report for this issue. So I do not have any ScaledObject in my cluster right now.

It might be useful to compile a table of the common tooling (kubectl, helm) that has the correct fix, similar
to the list I compiled for the k8s server here

The dependencies of this problem span so many systems on the server and client, that I barely understand what is required for a full fix.

@JorTurFer
Member

JorTurFer commented Aug 13, 2023

If your problem is related to this issue and KEDA, just deploying KEDA v2.11 is enough.
As I said, the problem was in the tooling, but together with SIG Autoscaling we added a mechanism to prevent this scenario even if the tooling has a regression.
That mechanism was merged into KEDA in this commit.

It might be useful to compile a table of the common tooling (kubectl, helm) that has the correct fix, similar
to the list I compiled for the k8s server #4224 (comment)

I guess I'm not getting your question, what do you mean?

I barely understand what is required for a full fix

I answered above: just upgrading KEDA to v2.11 is enough because we have adopted the new approach from the SIG. Thanks to it, it doesn't matter whether the tooling is up to date or not, because SIG Autoscaling has changed the upstream approach to prevent this and other problems in the clients.
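If KEDA was installed with the Helm chart as shown earlier in the thread, the upgrade itself is just (release and repo names taken from the earlier commands; use whichever chart version ships app version 2.11):

helm repo update
helm upgrade keda kedacore/keda -n keda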

You can read more about the problem and the solution here: kubernetes-sigs/custom-metrics-apiserver#150

Do you have any other questions?

@JorTurFer
Member

And just to clarify, everything I said in my previous comment only applies if your problem is the same. If you have a problem other than the root cause here, the best option is to open another issue to track it, dig into the problem, and fix it.

Just by installing KEDA v2.11 we can know whether the issue was the same: if it's solved now, the issue was the same 😄
