Getting the error: error when patching "obj.yaml": Timeout: request did not complete within requested timeout - context deadline exceeded #5700

Open
ghostx31 opened this issue Apr 17, 2024 · 17 comments
Labels
bug Something isn't working stale All issues that are marked as stale due to inactivity

Comments

@ghostx31

ghostx31 commented Apr 17, 2024

Report

We manage our deployments using ArgoCD. We recently upgraded KEDA from 2.6 to 2.11.2.

Since the upgrade, we have this issue with only two specific apps. The exact error message is:

Error from server (Timeout): error when applying patch:
...
[{\"metadata\":{\"metricName\":\"rabbitmq_queue_messages\",\"query\":\"sum(rabbitmq_queue_messages{queue=\\\"<redacted>\\\"}) + sum(rabbitmq_queue_messages{queue=\\\"<redacted>\\\"})\",\"serverAddress\":\"http://prometheus-main.prometheus.svc.cluster.local:9090/\",\"threshold\":\"10\"},\"type\":\"prometheus\"}]}}\n"}},"spec":{"minReplicaCount":1}}
to:
Resource: "keda.sh/v1alpha1, Resource=scaledobjects", GroupVersionKind: "keda.sh/v1alpha1, Kind=ScaledObject"
Name: "<redacted>", Namespace: "<redacted>"
for: "obj.yaml": error when patching "obj.yaml": Timeout: request did not complete within requested timeout - context deadline exceeded

This issue occurs both when syncing the ScaledObject from ArgoCD and when applying it with kubectl directly. This application has 44 ScaledObjects, of which ~40 are synced. We get this error when trying to sync this specific application.

Our environment:
Keda version: v2.11.2
GKE version: 1.25.16-gke.1460000

I found another issue that resembles this one (#5487), but felt we should open a new issue because the environment is different.

Expected Behavior

The scaled object should sync without issues.

Actual Behavior

The scaled object does not sync and fails.

Steps to Reproduce the Problem

  1. Upgrade KEDA from 2.6 to 2.11.2.
  2. Sync the ScaledObject for an application from ArgoCD, or apply the ScaledObject with kubectl.

KEDA Version

2.11.2

Kubernetes Version

< 1.26

Platform

Google Cloud

Scaler Details

External scaler - prometheus

@ghostx31 ghostx31 added the bug Something isn't working label Apr 17, 2024
@AleksanderBrzozowski

@ghostx31 Have you figured out what can be causing this issue?

We observe similar behavior, and to be honest it is not clear to me which component throws the timeout error.
Is it because Helm applies the change but the validation webhook doesn't respond quickly enough? Or is it something else?

@JorTurFer
Member

Hello,
Did you try removing the SO and adding it again? I'm not really sure about the reason behind this, as the timeout is given by the cluster and not by KEDA.
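The point that the timeout comes from the cluster can be illustrated with a small Python model. It assumes, in simplified form, that the API server gives each write request one overall deadline and answers with a generic timeout as soon as that deadline passes, regardless of which step (an admission webhook, storage, etc.) is still running. The step functions, the function name and the timings are hypothetical; only the message string is copied from the error in this report.

```python
import concurrent.futures
import time
from typing import Callable, List

# Message copied from the error reported in this issue.
TIMEOUT_MSG = ("Timeout: request did not complete within requested timeout"
               " - context deadline exceeded")


def run_request(steps: List[Callable[[], None]], timeout_s: float) -> str:
    """Hypothetical model of one PATCH request with an overall deadline.

    Each step stands in for work done while serving the request (for example
    an admission webhook call). If the steps do not finish in time, the caller
    gets the generic timeout message, no matter which step was slow.
    """

    def handler() -> str:
        for step in steps:
            step()
        return "patched"

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(handler)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The response is sent while the slow step may still be running.
        return TIMEOUT_MSG
    finally:
        pool.shutdown(wait=False)


def fast_step() -> None:
    """Stands in for a quick step, e.g. a fast validation."""


def slow_step() -> None:
    """Stands in for a step that hangs, e.g. an unresponsive webhook."""
    time.sleep(0.5)


print(run_request([fast_step], timeout_s=1.0))             # patched
print(run_request([fast_step, slow_step], timeout_s=0.1))  # the timeout message
```

The real kube-apiserver is considerably more involved; the sketch only shows why the message by itself does not say which component was slow.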

@AleksanderBrzozowski

@JorTurFer

Removing the SO, and adding it again solves the issue, but it is not convenient to delete and add when we want to make a change 🙁

> as the timeout is given by the cluster and not by KEDA

This is the part that I don't understand. I am assuming that the timeout is given by the Kubernetes API server, but what is the root cause? Is the keda-admission webhook causing it? I don't think so: a webhook failure would produce a different message, something like this:

Internal error occurred: failed calling webhook ...

Any clues? 🙂
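The distinction drawn in this comment can be written down as a small triage helper. This is a sketch under assumptions: the substrings are the ones quoted in this thread, the function name is hypothetical, and real messages can differ between Kubernetes versions, so it is a rough guide rather than a reliable classifier.

```python
def classify_apply_error(message: str) -> str:
    """Rough triage of an apply/sync error string (heuristic only)."""
    # A timed-out webhook call can also end in "context deadline exceeded",
    # so look for the webhook marker before the generic timeout markers.
    if "failed calling webhook" in message:
        return "admission webhook call failed"
    if "request did not complete within requested timeout" in message:
        return "API server request timeout"
    if "context deadline exceeded" in message:
        return "deadline exceeded (component unknown)"
    return "unknown"


reported = ('error when patching "obj.yaml": Timeout: request did not '
            'complete within requested timeout - context deadline exceeded')
print(classify_apply_error(reported))  # API server request timeout
```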

@JorTurFer
Member

Yeah, it's not a solution at all if you have to delete it every time. After re-creating it, are you still unable to modify it? I mean, you deleted it and that worked, so can you update it now or not?

@AleksanderBrzozowski

> Yeah, it's not a solution at all if you have to delete it every time. After re-creating it, are you still unable to modify it? I mean, you deleted it and that worked, so can you update it now or not?

Even after deleting and adding it again, I am not able to update it. The same error is returned 🙂

@ghostx31
Author

ghostx31 commented May 8, 2024

Hello @AleksanderBrzozowski, deleting the SO and then re-syncing it from Argo seems to solve it for us, but this is a bit of a hassle and not really a solution, since we need to delete and re-sync it every time we make a change.

@AleksanderBrzozowski

@ghostx31 Yeah, we have the same situation and are trying to find the root cause. Any clues what might be causing it? 🙂

@JorTurFer
Member

Could you share the ScaledObject that produces conflicts?

@AleksanderBrzozowski

Yeah, here it is:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-service
  namespace: my-namespace
spec:
  maxReplicaCount: 60
  minReplicaCount: 2
  pollingInterval: 10
  scaleTargetRef:
    name: my-service
  triggers:
    - metadata:
        metricName: RPS
        query: sum(rate(istio_requests_total{destination_workload_namespace="my-namespace",destination_workload="my-service",
          reporter="destination"}[1m]))
        serverAddress: http://prometheus-svc.prometheus-ns:9090
        threshold: "500"
      type: prometheus
    - metadata:
        metricName: Latency
        metricType: Value
        query: histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{kubernetes_namespace="my-namespace",
          app="my-service", reporter="destination"}[1m])) by (le))
        serverAddress: http://prometheus-svc.prometheus-ns:9090
        threshold: "50"
      type: prometheus

@JorTurFer
Member

Sorry for the delay, I've been quite busy these weeks.

Returning to your case, could the problem be related to the webhooks? When an object is created or updated, the control plane calls every admission webhook registered in the cluster for that resource type. KEDA has its own admission webhook for validating ScaledObjects; do you see any errors in it?

You can try temporarily disabling the admission webhook by removing the ValidatingWebhookConfiguration. Does it work once it is removed?
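The admission flow described here can be modelled roughly as follows: for each request, the API server selects the registered webhooks whose rules match the request's API group, resource and operation, and calls only those. This Python sketch is a simplification (it ignores apiVersions, namespace/object selectors and failure policies), and the webhook names are hypothetical, not the names KEDA's chart actually uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    """Simplified admission rule (apiVersions and scope are ignored here)."""
    api_groups: List[str]
    resources: List[str]
    operations: List[str]

    def matches(self, group: str, resource: str, operation: str) -> bool:
        def allowed(values: List[str], value: str) -> bool:
            return "*" in values or value in values
        return (allowed(self.api_groups, group)
                and allowed(self.resources, resource)
                and allowed(self.operations, operation))


@dataclass
class Webhook:
    name: str
    rules: List[Rule]


def webhooks_to_call(webhooks: List[Webhook], group: str,
                     resource: str, operation: str) -> List[str]:
    """Names of the webhooks that would be called for this request."""
    return [w.name for w in webhooks
            if any(r.matches(group, resource, operation) for r in w.rules)]


# Hypothetical webhook names, for illustration only.
keda = Webhook("example-keda-validator",
               [Rule(["keda.sh"], ["scaledobjects"], ["CREATE", "UPDATE"])])
pods = Webhook("example-pod-policy", [Rule([""], ["pods"], ["CREATE"])])

print(webhooks_to_call([keda, pods], "keda.sh", "scaledobjects", "UPDATE"))
# With KEDA's webhook configuration removed, nothing is called for the patch:
print(webhooks_to_call([pods], "keda.sh", "scaledobjects", "UPDATE"))
```

If the patch succeeds once the KEDA entry is gone, that points at the webhook; if it still times out, the webhook is likely not the cause.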

@AleksanderBrzozowski

@JorTurFer

> Sorry for the delay, I've been quite busy these weeks.

No worries 🙂

> You can try temporarily disabling the admission webhook by removing the ValidatingWebhookConfiguration. Does it work once it is removed?

Yeah, we are aware of the webhook; we should try disabling it to see if it helps. What does the webhook do under the hood?

@JorTurFer
Member

Basically, a few calls to the control plane to get extra information, such as other HPAs and the workload manifest, to validate the ScaledObject (preventing collisions between HPAs, incorrect CPU/memory configuration, etc.).
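The two checks mentioned here can be sketched as below. This is not KEDA's implementation: the types, the ownership field and the messages are hypothetical, and the real webhook obtains the HPA list and the workload from the API server (the extra control-plane calls mentioned above), whereas this sketch takes them as arguments. The `Deployment` target kind is also an assumption for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HPA:
    name: str
    namespace: str
    target_kind: str
    target_name: str
    owner_scaled_object: Optional[str] = None  # hypothetical ownership marker


@dataclass
class Container:
    name: str
    requests: Dict[str, str] = field(default_factory=dict)


@dataclass
class ScaledObject:
    name: str
    namespace: str
    target_kind: str
    target_name: str
    trigger_types: List[str]


def validate(so: ScaledObject, existing_hpas: List[HPA],
             containers: List[Container]) -> List[str]:
    """Return validation errors; an empty list means the object is accepted."""
    errors: List[str] = []

    # Collision check: another HPA already scales the same workload.
    for hpa in existing_hpas:
        same_target = (hpa.namespace == so.namespace
                       and hpa.target_kind == so.target_kind
                       and hpa.target_name == so.target_name)
        if same_target and hpa.owner_scaled_object != so.name:
            errors.append(f"{so.target_kind}/{so.target_name} is already "
                          f"scaled by HPA {hpa.name}")

    # CPU/memory triggers: every container needs the matching resource request.
    for resource in ("cpu", "memory"):
        if resource in so.trigger_types:
            for c in containers:
                if resource not in c.requests:
                    errors.append(f"container {c.name} has no {resource} "
                                  f"request, required by the {resource} trigger")
    return errors


so = ScaledObject("my-service", "my-namespace", "Deployment", "my-service",
                  ["prometheus", "prometheus"])
manual = HPA("manual-hpa", "my-namespace", "Deployment", "my-service")
print(validate(so, [], [Container("app")]))        # []
print(validate(so, [manual], [Container("app")]))  # one collision error
```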


stale bot commented Jul 26, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Jul 26, 2024

stale bot commented Aug 3, 2024

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Aug 3, 2024
@kallangerard

@JorTurFer can we please re-open this one?

@JorTurFer JorTurFer reopened this Sep 10, 2024
@stale stale bot removed the stale All issues that are marked as stale due to inactivity label Sep 10, 2024
@fernandogrd

I experienced this too, but intermittently: roughly one out of many deploys. It is annoying when it happens.


stale bot commented Nov 17, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Nov 17, 2024
5 participants