Getting the error: error when patching "obj.yaml": Timeout: request did not complete within requested timeout - context deadline exceeded #5700

ghostx31 · 2024-04-17T07:01:57Z

Report

We manage our deployments using ArgoCD. We recently upgraded KEDA from 2.6 to 2.11.2.

We have this issue on only two specific apps after the upgrade. The exact error message is:

Error from server (Timeout): error when applying patch:
...
[{\"metadata\":{\"metricName\":\"rabbitmq_queue_messages\",\"query\":\"sum(rabbitmq_queue_messages{queue=\\\"<redacted>\\\"}) + sum(rabbitmq_queue_messages{queue=\\\"<redacted>\\\"})\",\"serverAddress\":\"http://prometheus-main.prometheus.svc.cluster.local:9090/\",\"threshold\":\"10\"},\"type\":\"prometheus\"}]}}\n"}},"spec":{"minReplicaCount":1}}
to:
Resource: "keda.sh/v1alpha1, Resource=scaledobjects", GroupVersionKind: "keda.sh/v1alpha1, Kind=ScaledObject"
Name: "<redacted>", Namespace: "<redacted>"
for: "obj.yaml": error when patching "obj.yaml": Timeout: request did not complete within requested timeout - context deadline exceeded

This issue occurs both when syncing the ScaledObject from ArgoCD and when applying it with kubectl directly. This application has 44 ScaledObjects, of which ~40 are synced. We get this error when trying to sync this specific application.

Our environment:
KEDA version: v2.11.2
GKE version: 1.25.16-gke.1460000

I found another issue which resembles this but felt we should open a new issue due to the difference in environment: #5487

Expected Behavior

The scaled object should sync without issues.

Actual Behavior

The scaled object does not sync and fails.

Steps to Reproduce the Problem

Upgrade KEDA from 2.6 to 2.11.2.
Sync the ScaledObject for an application from ArgoCD, or try applying the ScaledObject with kubectl.

KEDA Version

2.11.2

Kubernetes Version

< 1.26

Platform

Google Cloud

Scaler Details

External scaler - prometheus


AleksanderBrzozowski · 2024-05-07T14:47:55Z

@ghostx31 Have you figured out what can be causing this issue?

We observe similar behavior, and to be honest it is not clear to me which component throws the timeout error.
Is it because Helm applies the change, but the validation webhook doesn't respond quickly enough? Or is it something different?

JorTurFer · 2024-05-07T20:47:07Z

Hello,
Did you try removing the SO and adding it again? I'm not really sure about the reason behind this, as the timeout is given by the cluster and not by KEDA

AleksanderBrzozowski · 2024-05-08T06:58:02Z

@JorTurFer

Removing the SO, and adding it again solves the issue, but it is not convenient to delete and add when we want to make a change 🙁

> as the timeout is given by the cluster and not by KEDA

This is the part that I don't understand. I am assuming that the timeout is given by the Kubernetes API Server, but what is the root cause? Is it the keda-admission webhook causing issues? I don't think so, the message would be different in case of webhook failure, something like this:

Internal error occurred: failed calling webhook ...

Any clues? 🙂

JorTurFer · 2024-05-08T07:04:10Z

Yeah, it's not a solution at all if you have to delete it all the time. After removing it, are you still unable to modify it? I mean, you've deleted it and it has worked, so now, can you update it or still not?

AleksanderBrzozowski · 2024-05-08T07:07:45Z

> Yeah, it's not a solution at all if you have to delete it all the time. After removing it, are you still unable to modify it? I mean, you've deleted it and it has worked, so now, can you update it or still not?

Even after deleting and adding it again, I am not able to update it. The same error is returned 🙂

ghostx31 · 2024-05-08T07:08:02Z

Hello @AleksanderBrzozowski. Deleting the SO and then re-syncing it from Argo seems to solve it for us, but this is a bit of a hassle and not really a solution, since we need to delete and re-sync it every time we need to make a change.

AleksanderBrzozowski · 2024-05-08T07:09:40Z

@ghostx31 Yeah, so we have the same situation, and we are trying to find the root cause. Any clues what might be causing it? 🙂

JorTurFer · 2024-05-08T07:12:33Z

Could you share the ScaledObject that produces conflicts?

AleksanderBrzozowski · 2024-05-08T07:49:01Z

Yeah, here it is:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-service
  namespace: my-namespace
spec:
  maxReplicaCount: 60
  minReplicaCount: 2
  pollingInterval: 10
  scaleTargetRef:
    name: my-service
  triggers:
    - metadata:
        metricName: RPS
        query: sum(rate(istio_requests_total{destination_workload_namespace="my-namespace",destination_workload="my-service",
          reporter="destination"}[1m]))
        serverAddress: http://prometheus-svc.prometheus-ns:9090
        threshold: "500"
      type: prometheus
    - metadata:
        metricName: Latency
        metricType: Value
        query: histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{kubernetes_namespace="my-namespace",
          app="my-service", reporter="destination"}[1m])) by (le))
        serverAddress: http://prometheus-svc.prometheus-ns:9090
        threshold: "50"
      type: prometheus

JorTurFer · 2024-05-26T14:23:31Z

Sorry for the delay, I've been quite busy these weeks.

Returning to your case, could you have an issue with the webhooks? Thinking about this, the control plane calls every admission webhook registered in the cluster (if it is registered for that resource). KEDA has its own admission webhook for validating the ScaledObject. Do you see any errors in it?

You can try disabling the admission webhook temporarily by just removing the ValidatingWebhookConfiguration. If you remove it, does it work?
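
A minimal sketch of that test, written as a shell function. The configuration name `keda-admission` and the manifest name `obj.yaml` are assumptions used as defaults here; the function lists the configurations first so the real name in your install can be checked before anything is deleted.

```shell
# Sketch only: retry a failing apply with KEDA's validating webhook removed.
# Both default values below are assumptions, not confirmed for every install.
retry_without_keda_webhook() {
  webhook_name="${1:-keda-admission}"   # assumed name, verify it in the listing
  manifest="${2:-obj.yaml}"             # the ScaledObject manifest that times out

  # Show which validating webhook configurations exist in the cluster.
  kubectl get validatingwebhookconfigurations

  # Temporarily remove KEDA's validating webhook configuration.
  # Stop here if the delete fails, so the apply is not retried unchanged.
  kubectl delete validatingwebhookconfiguration "$webhook_name" || return 1

  # Retry the apply that previously timed out.
  kubectl apply -f "$manifest"
}
```

It could be called as `retry_without_keda_webhook keda-admission obj.yaml`. Since the KEDA install in this thread is managed by ArgoCD or Helm, re-syncing KEDA afterwards should recreate the webhook configuration.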

AleksanderBrzozowski · 2024-05-27T07:04:44Z

@JorTurFer

> Sorry for the delay, I've been quite busy these weeks.

No worries 🙂

> You can try disabling the admission webhook temporarily by just removing the ValidatingWebhookConfiguration. If you remove it, does it work?

Yeah, we are aware of the webhook, and we should try disabling it to see if it helps. What does the webhook do under the hood?

JorTurFer · 2024-05-27T07:08:20Z

Basically, a few calls to the control plane to get some extra info, like other HPAs and the workload manifest, to validate the ScaledObject information (preventing collisions on HPAs, wrong CPU/memory config, etc.)
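
For anyone who wants to look at the same kind of objects by hand, here is a rough sketch. It assumes the scale target is a Deployment; the function name is hypothetical, and the namespace and workload names are placeholders taken from the ScaledObject shared above. This only approximates what the webhook reads and is not its actual implementation.

```shell
# Sketch only: fetch the objects the webhook is described as inspecting.
# Assumes the scaleTargetRef points at a Deployment.
show_webhook_inputs() {
  namespace="$1"   # e.g. my-namespace
  workload="$2"    # e.g. my-service

  # HPAs in the namespace, which the webhook checks for collisions.
  kubectl get hpa -n "$namespace"

  # The workload manifest, which the webhook reads to validate the ScaledObject.
  kubectl get deployment "$workload" -n "$namespace" -o yaml
}
```

Running `show_webhook_inputs my-namespace my-service` and noting whether either call is unusually slow may give a rough hint about where the time goes, though it does not prove the webhook is the cause.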

stale · 2024-07-26T23:44:52Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale · 2024-08-03T09:31:45Z

This issue has been automatically closed due to inactivity.

kallangerard · 2024-09-10T03:59:34Z

@JorTurFer can we please re-open this one

fernandogrd · 2024-09-15T16:08:43Z

I experienced this too, but it was intermittent, roughly 1 out of many deploys. It is annoying when it happens.

stale · 2024-11-17T06:39:44Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

ghostx31 added the bug Something isn't working label Apr 17, 2024

stale bot added the stale All issues that are marked as stale due to inactivity label Jul 26, 2024

stale bot closed this as completed Aug 3, 2024

JorTurFer reopened this Sep 10, 2024

stale bot removed the stale All issues that are marked as stale due to inactivity label Sep 10, 2024

stale bot added the stale All issues that are marked as stale due to inactivity label Nov 17, 2024
