
Constant crashes in KEDA operator after deploying service controlled by ScaledObject #4389

Closed
martinmr opened this issue Mar 17, 2023 · 34 comments · Fixed by #4722
Labels
bug Something isn't working

Comments

@martinmr

martinmr commented Mar 17, 2023

Report

The KEDA controller is constantly crashing after I deploy a new version of the service targeted by the scaled object.

It works for a while, but after the service is deployed, no metrics can be queried. The KEDA controller logs show a number of different errors, all of them related to the GetMetrics function.

Expected Behavior

No crashes

Actual Behavior

Constant crashes in the keda controller

Steps to Reproduce the Problem

  • Deploy the scaled object. I can query the metrics.
  • Redeploy the service that is controlled by the scaled object.
  • KEDA starts crashing. The logs show several different errors, but all of them revolve around the GetMetrics function, for example "assignment to entry in nil map", "index out of range", and "concurrent map writes". All of these are bugs in KEDA code, so I don't think it's an issue with the scaled object config.
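In Go, the error messages listed above fall into two groups, which helps explain why the operator restarts instead of just logging an error. The sketch below is a generic Go illustration, not KEDA code, and its function names are hypothetical. A nil-map write and an out-of-range index are ordinary runtime panics that a deferred `recover` can intercept (they still terminate the process if nothing recovers them), whereas "concurrent map writes" is reported as a fatal runtime error that `recover` cannot catch.

```go
package main

import "fmt"

// tryRun executes f and converts a recoverable runtime panic into an error.
func tryRun(f func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	f()
	return nil
}

// nilMapWrite reproduces "assignment to entry in nil map".
func nilMapWrite() {
	var m map[string]float64 // declared but never made, so it is nil
	m["s0-datadog"] = 1
}

// indexPastEnd reproduces "index out of range [n] with length n".
func indexPastEnd(s []float64) float64 {
	return s[len(s)]
}

func main() {
	fmt.Println(tryRun(nilMapWrite))
	fmt.Println(tryRun(func() { _ = indexPastEnd(make([]float64, 29)) }))
	// "concurrent map writes" cannot be demonstrated with tryRun: the runtime
	// raises it as a fatal error that bypasses deferred recover calls and
	// exits the process.
}
```

Because the fatal error always exits the process, Kubernetes restarts the container, which matches the crash loop described in this issue.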

Logs from KEDA operator

2023-03-17T21:28:05Z    ERROR   scalehandler    Failed to patch ScaledObjects Status    {"error": "resourc>
github.com/kedacore/keda/v2/pkg/fallback.updateStatus
        /workspace/pkg/fallback/fallback.go:126
github.com/kedacore/keda/v2/pkg/fallback.GetMetricsWithFallback
        /workspace/pkg/fallback/fallback.go:58
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics
        /workspace/pkg/scaling/scale_handler.go:446
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics
        /workspace/pkg/metricsservice/server.go:45
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler
        /workspace/pkg/metricsservice/api/metrics_grpc.pb.go:79
google.golang.org/grpc.(*Server).processUnaryRPC
        /workspace/vendor/google.golang.org/grpc/server.go:1340
google.golang.org/grpc.(*Server).handleStream
        /workspace/vendor/google.golang.org/grpc/server.go:1713
google.golang.org/grpc.(*Server).serveStreams.func1.2
        /workspace/vendor/google.golang.org/grpc/server.go:965
panic: runtime error: index out of range [29] with length 29

KEDA Version

2.9.2

Kubernetes Version

1.23

Platform

Amazon Web Services

Scaler Details

Datadog

Anything else?

The only unusual thing about this ScaledObject is that it has around 40 triggers. We use this service as a single interface for querying the metrics provided by KEDA. I set both min and max replicas to 2. I even disabled autoscaling at 2 replicas, but that didn't help. I don't think the ScaledObject config is the issue, because we can query the metrics for a little while.

Destroying KEDA and redeploying it seemed to work for a while, but it always breaks down around the time the service is deployed.

@martinmr martinmr added the bug Something isn't working label Mar 17, 2023
@martinmr
Author

@reynoldsme I opened an issue with KEDA

@martinmr
Author

Another such panic:

2023-03-17T22:16:42Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"historical-throttler","namespace":"platform-file-sage"}, "namespace": "platform-file-sage", "name": "historical-throttler", "reconcil>
fatal error: concurrent map writes

goroutine 1475 [running]:
runtime.throw({0x395dd78?, 0xc008eb39c8?})
        /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc008eb3970 sp=0xc008eb3940 pc=0x4384f1
runtime.mapassign_faststr(0x0?, 0x30f04e0?, {0xc0098e6740, 0x19})
        /usr/local/go/src/runtime/map_faststr.go:212 +0x39c fp=0xc008eb39d8 sp=0xc008eb3970 pc=0x4133dc
reflect.mapassign_faststr(0x32a97a0, 0xc0098f9790?, {0xc0098e6740?, 0x396be6d?}, 0x19?)
        /usr/local/go/src/runtime/map.go:1357 +0x28 fp=0xc008eb3a10 sp=0xc008eb39d8 pc=0x4627a8
reflect.Value.SetMapIndex({0x32a97a0?, 0xc0009876b0?, 0x6b92?}, {0x30f04e0, 0xc0098f97a0, 0x98}, {0x30f04e0, 0xc0098f9790, 0x198})
        /usr/local/go/src/reflect/value.go:2232 +0x225 fp=0xc008eb3a98 sp=0xc008eb3a10 pc=0x49c6c5
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).object(0xc0096e6dd0, {0x32a97a0?, 0xc0009876b0?, 0x6b9f?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:908 +0x1997 fp=0xc008eb3dc8 sp=0xc008eb3a98 pc=0x602037
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).value(0xc0096e6dd0, {0x32a97a0?, 0xc0009876b0?, 0x6?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:400 +0x45 fp=0xc008eb3e38 sp=0xc008eb3dc8 pc=0x5ff4e5
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).object(0xc0096e6dd0, {0x38136a0?, 0xc000987620?, 0x9fd8?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:866 +0x1267 fp=0xc008eb4168 sp=0xc008eb3e38 pc=0x601907
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).value(0xc0096e6dd0, {0x38136a0?, 0xc000987620?, 0x8?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:400 +0x45 fp=0xc008eb41d8 sp=0xc008eb4168 pc=0x5ff4e5
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).object(0xc0096e6dd0, {0x3873b80?, 0xc000987600?, 0xc008eb4548?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:866 +0x1267 fp=0xc008eb4508 sp=0xc008eb41d8 pc=0x601907
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).value(0xc0096e6dd0, {0x3873b80?, 0xc000987600?, 0xc008eb45b8?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:400 +0x45 fp=0xc008eb4578 sp=0xc008eb4508 pc=0x5ff4e5
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).unmarshal(0xc0096e6dd0, {0x3873b80?, 0xc000987600?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:187 +0x1de fp=0xc008eb45f0 sp=0xc008eb4578 pc=0x5fee5e
sigs.k8s.io/json/internal/golang/encoding/json.Unmarshal({0xc009a20000, 0x8b84, 0xa000}, {0x3873b80, 0xc000987600}, {0xc008eb46e0, 0x2, 0x7?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:112 +0x159 fp=0xc008eb4630 sp=0xc008eb45f0 pc=0x5febf9
sigs.k8s.io/json.UnmarshalCaseSensitivePreserveInts(...)
        /workspace/vendor/sigs.k8s.io/json/json.go:62
k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).unmarshal(0xc0099027d0?, {0x40a52f0?, 0xc000987600?}, {0xc009a20000?, 0xc0099027e0?, 0xc?}, {0xc009a20000?, 0xc000375340?, 0x409ca00?})
        /workspace/vendor/k8s.io/apimachinery/pkg/runtime/serializer/json/json.go:258 +0x3a5 fp=0xc008eb4700 sp=0xc008eb4630 pc=0xadd465
k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).Decode(0xc000e2e140, {0xc009a20000, 0x8b84, 0xa000}, 0x0, {0x40a52f0, 0xc000987600?})
        /workspace/vendor/k8s.io/apimachinery/pkg/runtime/serializer/json/json.go:206 +0xa6a fp=0xc008eb49f8 sp=0xc008eb4700 pc=0xadcaea
k8s.io/apimachinery/pkg/runtime.WithoutVersionDecoder.Decode({{0x409cae0?, 0xc000e2e140?}}, {0xc009a20000?, 0x199?, 0xc000987600?}, 0x3596b80?, {0x40a52f0?, 0xc000987600?})
        /workspace/vendor/k8s.io/apimachinery/pkg/runtime/helper.go:252 +0x55 fp=0xc008eb4a68 sp=0xc008eb49f8 pc=0x89f655
k8s.io/apimachinery/pkg/runtime.(*WithoutVersionDecoder).Decode(0x3873b80?, {0xc009a20000?, 0x0?, 0x0?}, 0x0?, {0x40a52f0?, 0xc000987600?})
        <autogenerated>:1 +0x69 fp=0xc008eb4ab8 sp=0xc008eb4a68 pc=0x8acd69
sigs.k8s.io/controller-runtime/pkg/client/apiutil.targetZeroingDecoder.Decode({{0x40a0940?, 0xc0098f9700?}}, {0xc009a20000, 0x8b84, 0xa000}, 0x0?, {0x40a52f0?, 0xc000987600?})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/client/apiutil/apimachinery.go:186 +0xb8 fp=0xc008eb4b28 sp=0xc008eb4ab8 pc=0x14ef958
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*targetZeroingDecoder).Decode(0xc0098b9200?, {0xc009a20000?, 0xc0088e9560?, 0x392f2e4?}, 0xc008eb4c10?, {0x40a52f0?, 0xc000987600?})
        <autogenerated>:1 +0x69 fp=0xc008eb4b78 sp=0xc008eb4b28 pc=0x14f29e9
k8s.io/client-go/rest.Result.Into({{0xc009a20000, 0x8b84, 0xa000}, {0x0, 0x0, 0x0}, {0xc0095af100, 0x10}, {0x0, 0x0}, ...}, ...)
        /workspace/vendor/k8s.io/client-go/rest/request.go:1307 +0xad fp=0xc008eb4c48 sp=0xc008eb4b78 pc=0x115146d
sigs.k8s.io/controller-runtime/pkg/client.(*typedClient).PatchStatus(0xc0006ae690, {0x40bfa10, 0xc0088e9560}, {0x40db8b0?, 0xc000987600?}, {0x40acaa0, 0xc009874bd0}, {0x0, 0x0, 0x0})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/client/typed_client.go:207 +0x4c5 fp=0xc008eb4dc8 sp=0xc008eb4c48 pc=0x1514f45
sigs.k8s.io/controller-runtime/pkg/client.(*statusWriter).Patch(0xc008becf28, {0x40bfa10, 0xc0088e9560}, {0x40db8b0?, 0xc000987600?}, {0x40acaa0, 0xc009874bd0}, {0x0, 0x0, 0x0})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:325 +0x38f fp=0xc008eb4f18 sp=0xc008eb4dc8 pc=0x150be6f
github.com/kedacore/keda/v2/pkg/fallback.updateStatus({0x40bfa10, 0xc0088e9560}, {0x40cd790, 0xc000e2e320}, {{0x40c6200?, 0xc000e2df50?}, 0x6077f20?}, 0xc000987600, 0xc008eb5268, {{0x39380f4, ...}, ...})
        /workspace/pkg/fallback/fallback.go:124 +0x5a5 fp=0xc008eb5188 sp=0xc008eb4f18 pc=0x16c02c5
github.com/kedacore/keda/v2/pkg/fallback.GetMetricsWithFallback({0x40bfa10, 0xc0088e9560}, {0x40cd790, 0xc000e2e320}, {{0x40c6200?, 0xc000e2df50?}, 0x23?}, {0xc0096f3dd0, 0x1, 0x1}, ...)
        /workspace/pkg/fallback/fallback.go:58 +0x61e fp=0xc008eb5308 sp=0xc008eb5188 pc=0x16bf7be
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics(0xc000e4ec60, {0x40bfa10, 0xc0088e9560}, {0xc0095eec30, 0x14}, {0xc0095eec48, 0x12}, {0xc008764ea0, 0x23})
        /workspace/pkg/scaling/scale_handler.go:446 +0x12c5 fp=0xc008eb5918 sp=0xc008eb5308 pc=0x2dd9205
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics(0xc000e58810, {0x40bfa10, 0xc0088e9560}, 0xc008929200)
        /workspace/pkg/metricsservice/server.go:45 +0xbc fp=0xc008eb59d0 sp=0xc008eb5918 pc=0x2dedabc
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler({0x348b5e0?, 0xc000e58810}, {0x40bfa10, 0xc0088e9560}, 0xc0008811f0, 0x0)
        /workspace/pkg/metricsservice/api/metrics_grpc.pb.go:79 +0x170 fp=0xc008eb5a28 sp=0xc008eb59d0 pc=0x1785750
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0000f01e0, {0x40cb6d0, 0xc009349380}, 0xc0095eaa20, 0xc000e588a0, 0x6033620, 0x0)
        /workspace/vendor/google.golang.org/grpc/server.go:1340 +0xd13 fp=0xc008eb5e48 sp=0xc008eb5a28 pc=0x17706f3
google.golang.org/grpc.(*Server).handleStream(0xc0000f01e0, {0x40cb6d0, 0xc009349380}, 0xc0095eaa20, 0x0)
        /workspace/vendor/google.golang.org/grpc/server.go:1713 +0xa1b fp=0xc008eb5f68 sp=0xc008eb5e48 pc=0x17756fb
google.golang.org/grpc.(*Server).serveStreams.func1.2()
        /workspace/vendor/google.golang.org/grpc/server.go:965 +0x98 fp=0xc008eb5fe0 sp=0xc008eb5f68 pc=0x176e198
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc008eb5fe8 sp=0xc008eb5fe0 pc=0x468ee1
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /workspace/vendor/google.golang.org/grpc/server.go:963 +0x28a
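One plausible reading of this trace (an interpretation, not a confirmed root cause): the fatal write happens while `PatchStatus` decodes the API server's response back into the `ScaledObject` it was given (`Result.Into`, then `Unmarshal`, then `SetMapIndex`), and the call starts in a gRPC `GetMetrics` handler goroutine. Go's `encoding/json`, from which the vendored `sigs.k8s.io/json` decoder is forked, reuses an existing non-nil map and adds entries to it. As a result, two handlers decoding into the same shared object can write to one map at the same time. The Go sketch below shows a race-free alternative: decode into a private deep copy, then swap it in under a mutex. The types `scaledObject`, `status`, `health`, and `cache` are hypothetical, simplified stand-ins, not KEDA's API types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Hypothetical, simplified stand-ins; field names are illustrative only.
type health struct {
	NumberOfFailures int32 `json:"numberOfFailures"`
}

type status struct {
	Health map[string]health `json:"health"`
}

type scaledObject struct {
	Name   string `json:"name"`
	Status status `json:"status"`
}

// deepCopy returns an independent copy whose map shares nothing with the original.
func (in *scaledObject) deepCopy() *scaledObject {
	out := &scaledObject{Name: in.Name}
	if in.Status.Health != nil {
		out.Status.Health = make(map[string]health, len(in.Status.Health))
		for k, v := range in.Status.Health {
			out.Status.Health[k] = v
		}
	}
	return out
}

// cache holds the shared object; every access to obj goes through mu.
type cache struct {
	mu  sync.Mutex
	obj *scaledObject
}

func (c *cache) snapshot() *scaledObject {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.obj.deepCopy()
}

// applyResponse decodes a (simulated) API server response into a private copy
// and only then publishes it, so no goroutine decodes into a map another one uses.
func (c *cache) applyResponse(body []byte) error {
	obj := c.snapshot()
	if err := json.Unmarshal(body, obj); err != nil {
		return err
	}
	c.mu.Lock()
	c.obj = obj
	c.mu.Unlock()
	return nil
}

func main() {
	c := &cache{obj: &scaledObject{Name: "example", Status: status{Health: map[string]health{}}}}
	var wg sync.WaitGroup
	for i := 0; i < 40; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			body := fmt.Sprintf(`{"status":{"health":{"s%d-metric":{"numberOfFailures":0}}}}`, i)
			if err := c.applyResponse([]byte(body)); err != nil {
				panic(err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("entries after concurrent updates:", len(c.snapshot().Status.Health))
}
```

Updates here are last-writer-wins: the sketch removes the data race but does not merge concurrent changes, which a real controller would still leave to the API server's conflict handling.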

@zroubalik
Member

Hi, could you please clarify what you mean by:

Deploy the service that is the controlled by the scaled object.

What service? The workload that is targeted by scaleTargetRef?

Could you paste an example of the ScaledObject here?

@martinmr
Author

Yes, that's the service I mean.

Here's an example ScaledObject. Our current use case is that the targeted service needs many metrics about our cluster so that it can drive traffic into a file processing pipeline only when traffic and usage are low. We are using KEDA as a common interface for querying those metrics with the external metrics client. The service worked fine while KEDA was not crashing, and we could also query the metrics via kubectl. After the crashes, both kubectl and the service time out when querying the metrics, which makes sense given that the operator is in a constant crash loop.

I'm not posting the actual ScaledObject because it's internal to work, but I've tried to match the options as closely as I can. I also tried adding the annotation to pause autoscaling at 2 replicas, but that had no effect.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: datadog-scaledobject
  namespace: my-scaled-object
spec:
  scaleTargetRef:
    name: my-kubernetes-service
  minReplicaCount:  2
  maxReplicaCount:  2
  triggers:
  # There are actually about 40 triggers but they are similar so I am only adding one.
  - type: datadog
    metricType: "Value"
    metadata:
      query: "avg:some_metric{env:dev,topic:my-topic}"
      # Query value is set really high to avoid any scaling.
      queryValue: "100000000"
      age: "120"
      metricUnavailableValue: "0"
    authenticationRef:
      name: keda-trigger-auth-datadog-secret

There's nothing special about the object other than the number of triggers and the high query values meant to prevent scaling. Since the min and max replicas are the same, those values shouldn't matter anyway. All the metrics were working before the KEDA controller started crashing.
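The service described above reads these metrics through the external metrics API, the same path kubectl uses. As a small sketch, the raw request path for one metric can be built with Go's standard library. The layout `/apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/<metric>` with a `scaledobject.keda.sh/name` label selector is assumed to follow KEDA's documented `kubectl get --raw` example; the metric name `s0-datadog-some_metric` and the helper `externalMetricPath` are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// externalMetricPath builds the raw external metrics API path for one metric
// served for a ScaledObject (path layout assumed from KEDA's kubectl example).
func externalMetricPath(namespace, metricName, scaledObject string) string {
	q := url.Values{}
	q.Set("labelSelector", "scaledobject.keda.sh/name="+scaledObject)
	return fmt.Sprintf("/apis/external.metrics.k8s.io/v1beta1/namespaces/%s/%s?%s",
		url.PathEscape(namespace), url.PathEscape(metricName), q.Encode())
}

func main() {
	// Names taken from the example ScaledObject; the metric name is a placeholder.
	fmt.Println(externalMetricPath("my-scaled-object", "s0-datadog-some_metric", "datadog-scaledobject"))
}
```

The printed path can be passed to `kubectl get --raw` or any REST client. Every such request is served by the KEDA operator's metrics service, so these queries stop working while the operator is crash-looping.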

@zroubalik
Member

zroubalik commented Mar 19, 2023

So you first deploy the ScaledObject and then the service?

Also, my-kubernetes-service is a Kubernetes Deployment, right?

Could you please do this:

  1. deploy the service first
  2. deploy ScaledObject
  3. then collect logs from the KEDA operator until the first crash

Thanks

@reynoldsme

reynoldsme commented Mar 23, 2023

I work with @martinmr. Here is an example crash (the names of some namespaces and ScaledObjects have been replaced) after deploying the workload and the ScaledObject, then recreating the keda-operator pod via kubectl delete:

$ k logs -f keda-operator-7b5f8c4c44-qmhcz -n keda
2023-03-23T14:31:40Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": ":8080"}
2023-03-23T14:31:40Z    INFO    setup   Starting manager
2023-03-23T14:31:40Z    INFO    setup   KEDA Version: 2.9.2
2023-03-23T14:31:40Z    INFO    setup   Git Commit: 9bc3f66578a08cdfe084468ea3ef998fa6bf3bb0
2023-03-23T14:31:40Z    INFO    setup   Go Version: go1.18.8
2023-03-23T14:31:40Z    INFO    setup   Go OS/Arch: linux/amd64
2023-03-23T14:31:40Z    INFO    setup   Running on Kubernetes 1.24+     {"version": "v1.24.10-eks-48e63af"}
I0323 14:31:40.348184       1 leaderelection.go:248] attempting to acquire leader lease keda/operator.keda.sh...
2023-03-23T14:31:40Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
2023-03-23T14:31:40Z    INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
I0323 14:31:56.116193       1 leaderelection.go:258] successfully acquired lease keda/operator.keda.sh
2023-03-23T14:31:56Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2023-03-23T14:31:56Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2023-03-23T14:31:56Z    INFO    Starting EventSource    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2023-03-23T14:31:56Z    INFO    Starting Controller     {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2023-03-23T14:31:56Z    INFO    Starting Controller     {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2023-03-23T14:31:56Z    INFO    Starting EventSource    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2023-03-23T14:31:56Z    INFO    Starting Controller     {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2023-03-23T14:31:56Z    INFO    grpc_server     Starting Metrics Service gRPC Server    {"address": ":9666"}
2023-03-23T14:31:56Z    INFO    Starting EventSource    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2023-03-23T14:31:56Z    INFO    Starting Controller     {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2023-03-23T14:31:56Z    INFO    Starting workers        {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2023-03-23T14:31:56Z    INFO    Starting workers        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2023-03-23T14:31:56Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"costpro-platform-events-consumption-dev","namespace":"applications"}, "namespace": "applications", "name": "costpro-platform-events-consumption-dev", "reconcileID": "8532ab31-7025-48bf-991e-71c359c681d5"}
2023-03-23T14:31:56Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"boat-remover","namespace":"file-compressor"}, "namespace": "file-compressor", "name": "boat-remover", "reconcileID": "4dc467b9-cb6c-453b-9bcf-746594c87afe"}
2023-03-23T14:31:56Z    INFO    Starting workers        {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2023-03-23T14:31:56Z    INFO    Starting workers        {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2023-03-23T14:31:56Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"boat-remover","namespace":"file-compressor"}, "namespace": "file-compressor", "name": "boat-remover", "reconcileID": "4dc467b9-cb6c-453b-9bcf-746594c87afe"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2dcf3df]

goroutine 350 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x32ec900, 0x602e770})
        /usr/local/go/src/runtime/panic.go:838 +0x207
github.com/kedacore/keda/v2/pkg/scaling/resolver.ResolveScaleTargetPodSpec({0x40bfa10, 0xc0011baf30}, {0x40cd790, 0xc000d4cd20}, {{0x40c6200?, 0xc000ec76b0?}, 0x30f0760?}, {0x3873b80?, 0xc000722e00?})
        /workspace/pkg/scaling/resolver/scale_resolvers.go:71 +0x13f
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).performGetScalersCache(0xc000ee19e0, {0x40bfa10, 0xc0011baf30}, {0xc000ebf1c0, 0x34}, {0x3873b80, 0xc000722e00}, 0xc000e24f20, {0x0, 0x0}, ...)
        /workspace/pkg/scaling/scale_handler.go:264 +0x6db
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScalersCache(0xc000e24f98?, {0x40bfa10, 0xc0011baf30}, {0x3873b80, 0xc000722e00})
        /workspace/pkg/scaling/scale_handler.go:190 +0xf6
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).getScaledObjectMetricSpecs(0xc000d2e000, {0x40bfa10, 0xc0011baf30}, {{0x40c6200?, 0xc0011baf60?}, 0xc000d226f0?}, 0xc000722e00)
        /workspace/controllers/keda/hpa.go:200 +0x8c
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).newHPAForScaledObject(0xc000d2e000, {0x40bfa10?, 0xc0011baf30?}, {{0x40c6200?, 0xc0011baf60?}, 0x38183c0?}, 0xc000722e00, 0xc003a63608)
        /workspace/controllers/keda/hpa.go:74 +0x66
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).updateHPAIfNeeded(0xc000d2e000, {0x40bfa10, 0xc0011baf30}, {{0x40c6200?, 0xc0011baf60?}, 0xc0011baf30?}, 0xc000722e00, 0xc000d00e00, 0xc0009f40a0?)
        /workspace/controllers/keda/hpa.go:152 +0x7b
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).ensureHPAForScaledObjectExists(0xc000d2e000, {0x40bfa10, 0xc0011baf30}, {{0x40c6200?, 0xc0011baf60?}, 0x40c6200?}, 0xc000722e00, 0x0?)
        /workspace/controllers/keda/scaledobject_controller.go:427 +0x238
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).reconcileScaledObject(0xc000d2e000?, {0x40bfa10, 0xc0011baf30}, {{0x40c6200?, 0xc0011baf60?}, 0xc000d226f0?}, 0xc000722e00)
        /workspace/controllers/keda/scaledobject_controller.go:230 +0x1c9
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).Reconcile(0xc000d2e000, {0x40bfa10, 0xc0011baf30}, {{{0xc000d22708?, 0x10?}, {0xc000d226f0?, 0x40d787?}}})
        /workspace/controllers/keda/scaledobject_controller.go:176 +0x526
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x40bf968?, {0x40bfa10?, 0xc0011baf30?}, {{{0xc000d22708?, 0x370f080?}, {0xc000d226f0?, 0x4041f4?}}})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000ee2640, {0x40bf968, 0xc000d7f8c0}, {0x34329a0?, 0xc000150f00?})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000ee2640, {0x40bf968, 0xc000d7f8c0})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:230 +0x325

Please note that the ScaledObject boat-remover has 42 triggers. I don't know how typical that is, but it stands out as unusual. Could it be exposing some kind of concurrency issue?

It is also notable that the microservice targeted by the ScaledObject queries these 42 metrics itself, but the KEDA metrics server remains stable.
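Whether 42 triggers matter depends on how the per-trigger results are gathered. As an illustration only, and not KEDA's implementation, here is a common Go pattern for querying many triggers concurrently without writing to a shared map. An "index out of range [n] with length n" panic typically means a slice sized from one length was indexed using another length that has since changed; sizing the result slice from the same snapshot that drives the loop rules that out. The `trigger` and `metricResult` types and the `fetchAll` function are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// trigger is a hypothetical stand-in for one trigger on a ScaledObject.
type trigger struct {
	Name  string
	Query func() (float64, error)
}

// metricResult pairs a trigger name with its value or error.
type metricResult struct {
	Name  string
	Value float64
	Err   error
}

// fetchAll queries every trigger concurrently. The results slice is sized from
// the same snapshot the loop ranges over, and each goroutine writes only its own
// index, so no lock is needed and no index can exceed the slice length.
func fetchAll(triggers []trigger) []metricResult {
	snapshot := append([]trigger(nil), triggers...) // decouple from later changes to the caller's slice
	results := make([]metricResult, len(snapshot))
	var wg sync.WaitGroup
	for i, t := range snapshot {
		wg.Add(1)
		go func(i int, t trigger) {
			defer wg.Done()
			v, err := t.Query()
			results[i] = metricResult{Name: t.Name, Value: v, Err: err}
		}(i, t)
	}
	wg.Wait() // all writes to results happen before this returns
	return results
}

func main() {
	var triggers []trigger
	for i := 0; i < 42; i++ {
		v := float64(i)
		triggers = append(triggers, trigger{
			Name:  fmt.Sprintf("s%d-datadog", i),
			Query: func() (float64, error) { return v, nil },
		})
	}
	res := fetchAll(triggers)
	fmt.Println(len(res), res[41].Value) // prints "42 41"
}
```

Writing to distinct slice elements from different goroutines is race-free in Go, whereas writing to the same map from several goroutines is not.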

@zroubalik
Member

zroubalik commented Mar 23, 2023

@reynoldsme that might be the issue. By the way, are you sure that all scalers are defined correctly? My bet is that one of the 42 (yeah, that's an unusual count 😄) is failing. It would be great if you could confirm this theory, for example by starting with a ScaledObject that has only one trigger and adding triggers one at a time until you get the crash, then putting just the trigger that caused the crash alone in a ScaledObject to double-check that it is failing.
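The incremental procedure suggested above (add triggers one at a time until the operator crashes, then test the suspect trigger alone) can be sketched in Go. This is a hypothetical helper, not a KEDA tool: the `crashes` callback stands in for "apply a ScaledObject with the first n triggers and watch the operator logs", which in practice is a manual step.

```go
package main

import "fmt"

// firstFailingCount tries n = 1, 2, ..., total triggers and returns the first n
// for which crashes reports a crash. The trigger added last (index n-1) is the
// suspect to re-test on its own. crashes is a hypothetical callback.
func firstFailingCount(total int, crashes func(n int) bool) (int, bool) {
	for n := 1; n <= total; n++ {
		if crashes(n) {
			return n, true
		}
	}
	return 0, false
}

func main() {
	// Simulated environment: the crash first appears once 17 triggers are present.
	if n, ok := firstFailingCount(42, func(n int) bool { return n >= 17 }); ok {
		fmt.Printf("crash first seen with %d triggers; suspect trigger index %d\n", n, n-1)
	}
}
```

The linear search needs up to 42 deployments. If the failure is assumed to persist once the faulty trigger is included, a binary search over the prefix length would need about six, although the linear version is simpler to follow by hand.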

@martinmr
Author

@zroubalik I did as you suggested, but it's not our configuration. I could get it working again (I can query the metrics) after removing some triggers and redeploying, but as soon as I deploy the target Deployment it breaks again. This issue is on KEDA's end.

Getting it working again is not reliable either. It's currently stuck in a crash loop with this error:

fatal error: concurrent map writes

goroutine 1590 [running]:
runtime.throw({0x395dd78?, 0xc003b239c8?})
	/usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc003b23970 sp=0xc003b23940 pc=0x4384f1
runtime.mapassign_faststr(0x0?, 0x34538c0?, {0xc005530c00, 0x25})
	/usr/local/go/src/runtime/map_faststr.go:212 +0x39c fp=0xc003b239d8 sp=0xc003b23970 pc=0x4133dc
reflect.mapassign_faststr(0x32a6f80, 0xc00552e0a8?, {0xc005530c00?, 0x396be6d?}, 0x19?)
	/usr/local/go/src/runtime/map.go:1357 +0x28 fp=0xc003b23a10 sp=0xc003b239d8 pc=0x4627a8
reflect.Value.SetMapIndex({0x32a6f80?, 0xc00100cdd0?, 0x2d0b?}, {0x30f04e0, 0xc004bf10a0, 0x98}, {0x34538c0, 0xc00552e0a8, 0x199})
	/usr/local/go/src/reflect/value.go:2232 +0x225 fp=0xc003b23a98 sp=0xc003b23a10 pc=0x49c6c5
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).object(0xc000df08f0, {0x32a6f80?, 0xc00100cdd0?, 0x374cf60?})
	/workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:908 +0x1997 fp=0xc003b23dc8 sp=0xc003b23a98 pc=0x602037
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).value(0xc000df08f0, {0x32a6f80?, 0xc00100cdd0?, 0x6?})
	/workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:400 +0x45 fp=0xc003b23e38 sp=0xc003b23dc8 pc=0x5ff4e5
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).object(0xc000df08f0, {0x374cf60?, 0xc00100cd60?, 0x9fd8?})
	/workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:866 +0x1267 fp=0xc003b24168 sp=0xc003b23e38 pc=0x601907
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).value(0xc000df08f0, {0x374cf60?, 0xc00100cd60?, 0x6?})
	/workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:400 +0x45 fp=0xc003b241d8 sp=0xc003b24168 pc=0x5ff4e5
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).object(0xc000df08f0, {0x3873b80?, 0xc00100cc00?, 0x0?})
	/workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:866 +0x1267 fp=0xc003b24508 sp=0xc003b241d8 pc=0x601907
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).value(0xc000df08f0, {0x3873b80?, 0xc00100cc00?, 0xd0?})
	/workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:400 +0x45 fp=0xc003b24578 sp=0xc003b24508 pc=0x5ff4e5
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).unmarshal(0xc000df08f0, {0x3873b80?, 0xc00100cc00?})
	/workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:187 +0x1de fp=0xc003b245f0 sp=0xc003b24578 pc=0x5fee5e
sigs.k8s.io/json/internal/golang/encoding/json.Unmarshal({0xc005524000, 0x7879, 0xa000}, {0x3873b80, 0xc00100cc00}, {0xc003b246e0, 0x2, 0x7?})
	/workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:112 +0x159 fp=0xc003b24630 sp=0xc003b245f0 pc=0x5febf9
sigs.k8s.io/json.UnmarshalCaseSensitivePreserveInts(...)
	/workspace/vendor/sigs.k8s.io/json/json.go:62
k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).unmarshal(0xc004be0410?, {0x40a52f0?, 0xc00100cc00?}, {0xc005524000?, 0xc004be0420?, 0xc?}, {0xc005524000?, 0xc00007eaf0?, 0x409ca00?})
	/workspace/vendor/k8s.io/apimachinery/pkg/runtime/serializer/json/json.go:258 +0x3a5 fp=0xc003b24700 sp=0xc003b24630 pc=0xadd465
k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).Decode(0xc00013b900, {0xc005524000, 0x7879, 0xa000}, 0x0, {0x40a52f0, 0xc00100cc00?})
	/workspace/vendor/k8s.io/apimachinery/pkg/runtime/serializer/json/json.go:206 +0xa6a fp=0xc003b249f8 sp=0xc003b24700 pc=0xadcaea
k8s.io/apimachinery/pkg/runtime.WithoutVersionDecoder.Decode({{0x409cae0?, 0xc00013b900?}}, {0xc005524000?, 0x199?, 0xc00100cc00?}, 0x3596b80?, {0x40a52f0?, 0xc00100cc00?})
	/workspace/vendor/k8s.io/apimachinery/pkg/runtime/helper.go:252 +0x55 fp=0xc003b24a68 sp=0xc003b249f8 pc=0x89f655
k8s.io/apimachinery/pkg/runtime.(*WithoutVersionDecoder).Decode(0x3873b80?, {0xc005524000?, 0x0?, 0x0?}, 0x0?, {0x40a52f0?, 0xc00100cc00?})
	<autogenerated>:1 +0x69 fp=0xc003b24ab8 sp=0xc003b24a68 pc=0x8acd69
sigs.k8s.io/controller-runtime/pkg/client/apiutil.targetZeroingDecoder.Decode({{0x40a0940?, 0xc004bf02c0?}}, {0xc005524000, 0x7879, 0xa000}, 0x0?, {0x40a52f0?, 0xc00100cc00?})
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/client/apiutil/apimachinery.go:186 +0xb8 fp=0xc003b24b28 sp=0xc003b24ab8 pc=0x14ef958
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*targetZeroingDecoder).Decode(0xc005361e00?, {0xc005524000?, 0xc004e8bb00?, 0x392f2e4?}, 0xc003b24c10?, {0x40a52f0?, 0xc00100cc00?})
	<autogenerated>:1 +0x69 fp=0xc003b24b78 sp=0xc003b24b28 pc=0x14f29e9
k8s.io/client-go/rest.Result.Into({{0xc005524000, 0x7879, 0xa000}, {0x0, 0x0, 0x0}, {0xc00490a4c0, 0x10}, {0x0, 0x0}, ...}, ...)
	/workspace/vendor/k8s.io/client-go/rest/request.go:1307 +0xad fp=0xc003b24c48 sp=0xc003b24b78 pc=0x115146d
sigs.k8s.io/controller-runtime/pkg/client.(*typedClient).PatchStatus(0xc0002d6070, {0x40bfa10, 0xc004e8bb00}, {0x40db8b0?, 0xc00100cc00?}, {0x40acaa0, 0xc005339200}, {0x0, 0x0, 0x0})
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/client/typed_client.go:207 +0x4c5 fp=0xc003b24dc8 sp=0xc003b24c48 pc=0x1514f45
sigs.k8s.io/controller-runtime/pkg/client.(*statusWriter).Patch(0xc00058ae88, {0x40bfa10, 0xc004e8bb00}, {0x40db8b0?, 0xc00100cc00?}, {0x40acaa0, 0xc005339200}, {0x0, 0x0, 0x0})
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:325 +0x38f fp=0xc003b24f18 sp=0xc003b24dc8 pc=0x150be6f
github.com/kedacore/keda/v2/pkg/fallback.updateStatus({0x40bfa10, 0xc004e8bb00}, {0x40cd790, 0xc00013bae0}, {{0x40c6200?, 0xc000df44b0?}, 0xc000d7c400?}, 0xc00100cc00, 0xc003b25268, {{0x39380f4, ...}, ...})
	/workspace/pkg/fallback/fallback.go:124 +0x5a5 fp=0xc003b25188 sp=0xc003b24f18 pc=0x16c02c5
github.com/kedacore/keda/v2/pkg/fallback.GetMetricsWithFallback({0x40bfa10, 0xc004e8bb00}, {0x40cd790, 0xc00013bae0}, {{0x40c6200?, 0xc000df44b0?}, 0x24?}, {0xc00962d5f0, 0x1, 0x1}, ...)
	/workspace/pkg/fallback/fallback.go:58 +0x61e fp=0xc003b25308 sp=0xc003b25188 pc=0x16bf7be
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics(0xc000df6240, {0x40bfa10, 0xc004e8bb00}, {0xc004ce7f80, 0x14}, {0xc004ce7f98, 0x12}, {0xc004ee8e40, 0x24})
	/workspace/pkg/scaling/scale_handler.go:446 +0x12c5 fp=0xc003b25918 sp=0xc003b25308 pc=0x2dd9205
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics(0xc000df4d50, {0x40bfa10, 0xc004e8bb00}, 0xc00895b7a0)
	/workspace/pkg/metricsservice/server.go:45 +0xbc fp=0xc003b259d0 sp=0xc003b25918 pc=0x2dedabc
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler({0x348b5e0?, 0xc000df4d50}, {0x40bfa10, 0xc004e8bb00}, 0xc0002d5e30, 0x0)
	/workspace/pkg/metricsservice/api/metrics_grpc.pb.go:79 +0x170 fp=0xc003b25a28 sp=0xc003b259d0 pc=0x1785750
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0006581e0, {0x40cb6d0, 0xc009720000}, 0xc004ee67e0, 0xc000df4de0, 0x6033620, 0x0)
	/workspace/vendor/google.golang.org/grpc/server.go:1340 +0xd13 fp=0xc003b25e48 sp=0xc003b25a28 pc=0x17706f3
google.golang.org/grpc.(*Server).handleStream(0xc0006581e0, {0x40cb6d0, 0xc009720000}, 0xc004ee67e0, 0x0)
	/workspace/vendor/google.golang.org/grpc/server.go:1713 +0xa1b fp=0xc003b25f68 sp=0xc003b25e48 pc=0x17756fb
google.golang.org/grpc.(*Server).serveStreams.func1.2()
	/workspace/vendor/google.golang.org/grpc/server.go:965 +0x98 fp=0xc003b25fe0 sp=0xc003b25f68 pc=0x176e198
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc003b25fe8 sp=0xc003b25fe0 pc=0x468ee1
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/workspace/vendor/google.golang.org/grpc/server.go:963 +0x28a

@martinmr
Author

martinmr commented Mar 27, 2023

We updated to KEDA 2.10.0 and are seeing a new crash:

panic: runtime error: index out of range [15] with length 15

goroutine 2074 [running]:
github.com/kedacore/keda/v2/apis/keda/v1alpha1.(*ScaledObjectSpec).DeepCopyInto(0xc000d17508, 0xc004e94108)
	/workspace/apis/keda/v1alpha1/zz_generated.deepcopy.go:708 +0x71c
github.com/kedacore/keda/v2/apis/keda/v1alpha1.(*ScaledObject).DeepCopyInto(0xc000d17400, 0xc004e94000)
	/workspace/apis/keda/v1alpha1/zz_generated.deepcopy.go:597 +0xe8
github.com/kedacore/keda/v2/apis/keda/v1alpha1.(*ScaledObject).DeepCopy(...)
	/workspace/apis/keda/v1alpha1/zz_generated.deepcopy.go:607
github.com/kedacore/keda/v2/pkg/fallback.updateStatus({0x434be30, 0xc00644f4d0}, {0x4360950, 0xc000a2ac60}, 0xc000d17400, 0xc0075b3258, {{0x3b4b21d, 0x8}, 0x0, 0x0, ...})
	/workspace/pkg/fallback/fallback.go:116 +0x85
github.com/kedacore/keda/v2/pkg/fallback.GetMetricsWithFallback({0x434be30, 0xc00644f4d0}, {0x4360950, 0xc000a2ac60}, {0xc004158fc0, 0x1, 0x1}, {0x0?, 0x0}, {0xc0057eb230, ...}, ...)
	/workspace/pkg/fallback/fallback.go:59 +0x69d
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics(0xc000694ee0, {0x434be30, 0xc00644f4d0}, {0xc0057becc0, 0x14}, {0xc0057becd8, 0x12}, {0xc0057eb230, 0x23})
	/workspace/pkg/scaling/scale_handler.go:481 +0x128b
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics(0xc001015100, {0x434be30, 0xc00644f4d0}, 0xc004f28ea0)
	/workspace/pkg/metricsservice/server.go:48 +0xbc
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler({0x3674b80?, 0xc001015100}, {0x434be30, 0xc00644f4d0}, 0xc00049bf10, 0x0)
	/workspace/pkg/metricsservice/api/metrics_grpc.pb.go:98 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00089a3c0, {0x435af60, 0xc004dfc680}, 0xc0047d99e0, 0xc000a254d0, 0x63ef870, 0x0)
	/workspace/vendor/google.golang.org/grpc/server.go:1336 +0xd23
google.golang.org/grpc.(*Server).handleStream(0xc00089a3c0, {0x435af60, 0xc004dfc680}, 0xc0047d99e0, 0x0)
	/workspace/vendor/google.golang.org/grpc/server.go:1704 +0xa2f
google.golang.org/grpc.(*Server).serveStreams.func1.2()
	/workspace/vendor/google.golang.org/grpc/server.go:965 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/workspace/vendor/google.golang.org/grpc/server.go:963 +0x28a

All the crashes seem related to updating the status of the scaled object, which we do not control. The code is autogenerated so I can't even look at it.
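The v2.10.0 trace above fails inside the generated `DeepCopy`, which is called from `fallback.updateStatus`. One reading, not confirmed in this thread, is that several `GetMetrics` goroutines share the same in-memory `ScaledObject`, so one goroutine copies the trigger slice while another replaces it. An `index out of range [15] with length 15` inside a copy loop is consistent with a slice changing length mid-copy. Under that assumption, a common Go remedy is to take a deep copy under a lock and only mutate shared state while holding it. `ScaledObject`, `Trigger`, `HealthStatus` and `guardedObject` below are simplified hypothetical types, not KEDA's real API.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified, hypothetical stand-ins for KEDA's ScaledObject types.
type Trigger struct {
	Type     string
	Metadata map[string]string
}

type HealthStatus struct {
	NumberOfFailures int32
	Status           string
}

type ScaledObject struct {
	Name     string
	Triggers []Trigger
	Health   map[string]HealthStatus
}

// DeepCopy returns a copy that shares no slices or maps with the original.
func (so *ScaledObject) DeepCopy() *ScaledObject {
	out := &ScaledObject{Name: so.Name}
	out.Triggers = make([]Trigger, len(so.Triggers))
	for i, t := range so.Triggers {
		md := make(map[string]string, len(t.Metadata))
		for k, v := range t.Metadata {
			md[k] = v
		}
		out.Triggers[i] = Trigger{Type: t.Type, Metadata: md}
	}
	out.Health = make(map[string]HealthStatus, len(so.Health))
	for k, v := range so.Health {
		out.Health[k] = v
	}
	return out
}

// guardedObject (hypothetical) owns the shared object; every access goes through mu.
type guardedObject struct {
	mu sync.RWMutex
	so *ScaledObject
}

// Snapshot deep-copies under a read lock, so callers never see a half-updated object.
func (g *guardedObject) Snapshot() *ScaledObject {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return g.so.DeepCopy()
}

// RecordFailure mutates the shared object only while holding the write lock.
func (g *guardedObject) RecordFailure(metric string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.so.Health == nil {
		g.so.Health = map[string]HealthStatus{}
	}
	h := g.so.Health[metric]
	h.NumberOfFailures++
	h.Status = "Failing"
	g.so.Health[metric] = h
}

func main() {
	g := &guardedObject{so: &ScaledObject{Name: "example"}}
	// Setup happens before any goroutine starts, so no lock is needed here.
	for i := 0; i < 42; i++ {
		g.so.Triggers = append(g.so.Triggers, Trigger{Type: "prometheus", Metadata: map[string]string{"query": fmt.Sprint(i)}})
	}

	var wg sync.WaitGroup
	for i := 0; i < 42; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			g.RecordFailure(fmt.Sprintf("s%d-metric", i))
			_ = g.Snapshot() // concurrent reads are safe
		}(i)
	}
	wg.Wait()

	snap := g.Snapshot()
	snap.Health["s0-metric"] = HealthStatus{} // only changes the private copy
	fmt.Println(len(snap.Triggers), len(g.Snapshot().Health), g.Snapshot().Health["s0-metric"].NumberOfFailures)
}
```

Running it prints `42 42 1`: each goroutine's failure is recorded exactly once, and editing `snap` leaves the shared object untouched. The cost is an extra allocation per status update, in exchange for never sharing mutable slices or maps across goroutines.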

@zroubalik
Member

zroubalik commented Mar 28, 2023

Thanks for the update. I am not saying the problem is on your side; it is obviously on KEDA's side. What I am saying is that one of your triggers' configurations in the ScaledObject is incorrect (maybe a misspelling, a typo, wrong credentials...), which in the end causes this problem, and that should not happen!
So my suggestion was really about identifying the wrong trigger, both to mitigate the problem and to give us a hint about where the problem is. By any chance, do you see any errors or other strange messages in the Operator log before the crash?

Looking at the latest error, this is super weird. It is crashing in the autogenerated part of the code, as you mentioned. Could you please confirm that the crash on v2.10.0 that you pasted here is the very first one after you deploy the ScaledObject?

What number of triggers works correctly for you?
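One way to act on the suggestion to find the problematic trigger, and to answer how many triggers still work, is a binary search over the trigger list. This is a swapped-in technique, not something proposed in the thread. The Go sketch below assumes a hypothetical `works` probe (for example: apply a ScaledObject with only those triggers and watch the operator for a panic) and assumes monotonicity, meaning that if a prefix of the list works, every shorter prefix works too.

```go
package main

import "fmt"

// largestWorkingPrefix binary-searches for the largest n such that
// works(triggers[:n]) is true. It assumes the empty prefix works and that
// the property is monotonic. works is a hypothetical probe supplied by the caller.
func largestWorkingPrefix(triggers []string, works func([]string) bool) int {
	lo, hi := 0, len(triggers) // invariant: works(triggers[:lo]) holds
	for lo < hi {
		mid := (lo + hi + 1) / 2 // round up so the loop always makes progress
		if works(triggers[:mid]) {
			lo = mid
		} else {
			hi = mid - 1
		}
	}
	return lo
}

func main() {
	triggers := make([]string, 42)
	for i := range triggers {
		triggers[i] = fmt.Sprintf("trigger-%d", i)
	}
	// Simulated probe: pretend any configuration with more than 15 triggers crashes.
	probe := func(ts []string) bool { return len(ts) <= 15 }
	n := largestWorkingPrefix(triggers, probe)
	fmt.Println(n, "triggers work; first failing:", triggers[n])
}
```

With the simulated probe it prints `15 triggers work; first failing: trigger-15`. Because the smaller configuration described in the next comment also stopped working after fifteen to twenty minutes, a real probe would have to run at least that long before reporting success, and the monotonicity assumption may not hold at all if the crash is timing-dependent rather than configuration-dependent.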

@martinmr
Author

There's no wrong trigger. As part of the testing, I managed to get it working again with a smaller number of triggers, but it stopped working after a while (fifteen or twenty minutes). I didn't deploy any changes in the interim, so I don't think this has to do with the config.

The error I pasted above is the first one after the upgrade but I've seen others as well. @reynoldsme could you paste some more errors if you can? I don't have internet service today (typing this on my phone using roaming data).

@reynoldsme

reynoldsme commented Mar 28, 2023

ok, I have performed the following steps:

  1. deleted the greg-detector ScaledObject (the one with 41 triggers)
  2. deleted the keda-operator pod
  3. Immediately recreated the greg-detector ScaledObject
  4. retrieved the pod logs of the fresh keda-operator pod
k logs -f keda-operator-6db4695bf-s4cd2 -n keda
2023-03-28T19:38:52Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": ":8080"}
2023-03-28T19:38:52Z    INFO    setup   Starting manager
2023-03-28T19:38:52Z    INFO    setup   KEDA Version: 2.10.0
2023-03-28T19:38:52Z    INFO    setup   Git Commit: ee28bf69389bca7bf6d5edd83e2bc3940254114c
2023-03-28T19:38:52Z    INFO    setup   Go Version: go1.19.7
2023-03-28T19:38:52Z    INFO    setup   Go OS/Arch: linux/amd64
2023-03-28T19:38:52Z    INFO    setup   Running on Kubernetes 1.24+     {"version": "v1.24.10-eks-48e63af"}
2023-03-28T19:38:52Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
2023-03-28T19:38:52Z    INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
I0328 19:38:52.887249       1 leaderelection.go:248] attempting to acquire leader lease keda/operator.keda.sh...
I0328 19:39:11.550865       1 leaderelection.go:258] successfully acquired lease keda/operator.keda.sh
2023-03-28T19:39:11Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2023-03-28T19:39:11Z    INFO    Starting EventSource    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2023-03-28T19:39:11Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2023-03-28T19:39:11Z    INFO    Starting Controller     {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2023-03-28T19:39:11Z    INFO    Starting Controller     {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2023-03-28T19:39:11Z    INFO    Starting EventSource    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2023-03-28T19:39:11Z    INFO    Starting Controller     {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2023-03-28T19:39:11Z    INFO    cert-rotation   starting cert rotator controller
2023-03-28T19:39:11Z    INFO    Starting EventSource    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2023-03-28T19:39:11Z    INFO    Starting Controller     {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2023-03-28T19:39:11Z    INFO    Starting EventSource    {"controller": "cert-rotator", "source": "kind source: *v1.Secret"}
2023-03-28T19:39:11Z    INFO    Starting EventSource    {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2023-03-28T19:39:11Z    INFO    Starting EventSource    {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2023-03-28T19:39:11Z    INFO    Starting Controller     {"controller": "cert-rotator"}
2023-03-28T19:39:11Z    INFO    Starting workers        {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2023-03-28T19:39:11Z    INFO    Starting workers        {"controller": "cert-rotator", "worker count": 1}
2023-03-28T19:39:11Z    INFO    Starting workers        {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2023-03-28T19:39:11Z    INFO    cert-rotation   Ensuring CA cert        {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2023-03-28T19:39:11Z    INFO    Starting workers        {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2023-03-28T19:39:11Z    INFO    cert-rotation   no cert refresh needed
2023-03-28T19:39:11Z    INFO    cert-rotation   certs are ready in /certs
2023-03-28T19:39:11Z    INFO    Starting workers        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2023-03-28T19:39:11Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"poodle-locator-platform-events-consumption-dev","namespace":"applications"}, "namespace": "applications", "name": "poodle-locator-platform-events-consumption-dev", "reconcileID": "2bd52d2b-0e45-408e-823c-83e5e6c0f757"}
2023-03-28T19:39:11Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"greg-detector","namespace":"greg"}, "namespace": "greg", "name": "greg-detector", "reconcileID": "784eac37-35c5-405c-8d94-002fd6815360"}
2023-03-28T19:39:11Z    INFO    cert-rotation   Ensuring CA cert        {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2023-03-28T19:39:11Z    INFO    Detected resource targeted for scaling  {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"greg-detector","namespace":"greg"}, "namespace": "greg", "name": "greg-detector", "reconcileID": "784eac37-35c5-405c-8d94-002fd6815360", "resource": "apps/v1.Deployment", "name": "greg-detector-dev"}
2023-03-28T19:39:11Z    INFO    Creating a new HPA      {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"greg-detector","namespace":"greg"}, "namespace": "greg", "name": "greg-detector", "reconcileID": "784eac37-35c5-405c-8d94-002fd6815360", "HPA.Namespace": "greg", "HPA.Name": "keda-hpa-greg-detector"}
2023-03-28T19:39:12Z    INFO    Initializing Scaling logic according to ScaledObject Specification      {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"poodle-locator-platform-events-consumption-dev","namespace":"applications"}, "namespace": "applications", "name": "salesforcedlp-platform-events-consumption-dev", "reconcileID": "2bd52d2b-0e45-408e-823c-83e5e6c0f757"}
2023-03-28T19:39:12Z    INFO    cert-rotation   CA certs are injected to webhooks
2023-03-28T19:39:12Z    INFO    grpc_server     Starting Metrics Service gRPC Server    {"address": ":9666"}
2023-03-28T19:39:25Z    INFO    Initializing Scaling logic according to ScaledObject Specification      {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"greg-detector","namespace":"greg"}, "namespace": "greg", "name": "historical-throttler", "reconcileID": "784eac37-35c5-405c-8d94-002fd6815360"}
2023-03-28T19:39:25Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"greg-detector","namespace":"greg"}, "namespace": "greg", "name": "greg-detector", "reconcileID": "0774714d-2ca4-4efe-b257-6fcdbe914e9c"}
2023-03-28T19:39:25Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"greg-detector","namespace":"greg"}, "namespace": "greg", "name": "greg-detector", "reconcileID": "3783e86a-48d6-441a-acb0-b44bacd37c7e"}
2023-03-28T19:39:25Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"poodle-locator-platform-events-consumption-dev","namespace":"applications"}, "namespace": "applications", "name": "poodle-locator-platform-events-consumption-dev", "reconcileID": "eb071d6e-2e64-44a4-9bf1-358a7a18cf89"}
2023-03-28T19:39:25Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"poodle-locator-platform-events-consumption-dev","namespace":"applications"}, "namespace": "applications", "name": "poodle-locator-platform-events-consumption-dev", "reconcileID": "e1f771ab-6d8b-43b7-8a65-c67552fc1f63"}
2023-03-28T19:39:31Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"poodle-locator-platform-events-consumption-dev","namespace":"applications"}, "namespace": "applications", "name": "poodle-locator-platform-events-consumption-dev", "reconcileID": "d135f3ea-e6e2-4f04-a7da-d75a6d55f8ef"}
2023-03-28T19:39:32Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"greg-detector","namespace":"greg"}, "namespace": "greg", "name": "greg-detector", "reconcileID": "db345be5-175f-43a6-91af-af140cb8424a"}
panic: assignment to entry in nil map

goroutine 1826 [running]:
reflect.mapassign_faststr(0x3476f80, 0xc0055c98a8?, {0xc0083d1ce0?, 0x3b81273?}, 0x19?)
        /usr/local/go/src/runtime/map.go:1359 +0x28
reflect.Value.SetMapIndex({0x3476f80?, 0xc000138dd0?, 0x4826?}, {0x32aef00, 0xc008d1dae0, 0x98}, {0x363a6a0, 0xc0055c98a8, 0x199})
        /usr/local/go/src/reflect/value.go:2302 +0x225
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).object(0xc0003e9380, {0x3476f80?, 0xc000138dd0?, 0x394ea80?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:909 +0x1a97
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).value(0xc0003e9380, {0x3476f80?, 0xc000138dd0?, 0x6?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:401 +0x45
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).object(0xc0003e9380, {0x394ea80?, 0xc000138d60?, 0xdfd8?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:867 +0x1325
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).value(0xc0003e9380, {0x394ea80?, 0xc000138d60?, 0x6?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:401 +0x45
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).object(0xc0003e9380, {0x3a94540?, 0xc000138c00?, 0x0?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:867 +0x1325
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).value(0xc0003e9380, {0x3a94540?, 0xc000138c00?, 0xd0?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:401 +0x45
sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).unmarshal(0xc0003e9380, {0x3a94540?, 0xc000138c00?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:188 +0x1de
sigs.k8s.io/json/internal/golang/encoding/json.Unmarshal({0xc0022ce000, 0xa391, 0xe000}, {0x3a94540, 0xc000138c00}, {0xc0012f86e8, 0x2, 0x7?})
        /workspace/vendor/sigs.k8s.io/json/internal/golang/encoding/json/decode.go:113 +0x159
sigs.k8s.io/json.UnmarshalCaseSensitivePreserveInts(...)
        /workspace/vendor/sigs.k8s.io/json/json.go:62
k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).unmarshal(0xc0059f0ae0?, {0x432fc70?, 0xc000138c00?}, {0xc0022ce000?, 0xc0059f0af0?, 0xc?}, {0xc0022ce000?, 0xc0006cdea0?, 0x4326a00?})
        /workspace/vendor/k8s.io/apimachinery/pkg/runtime/serializer/json/json.go:258 +0x3a5
k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).Decode(0xc00052dd10, {0xc0022ce000, 0xa391, 0xe000}, 0x0, {0x432fc70, 0xc000138c00?})
        /workspace/vendor/k8s.io/apimachinery/pkg/runtime/serializer/json/json.go:206 +0xa6a
k8s.io/apimachinery/pkg/runtime.WithoutVersionDecoder.Decode({{0x4326ae0?, 0xc00052dd10?}}, {0xc0022ce000?, 0x199?, 0xc000138c00?}, 0x378ae80?, {0x432fc70?, 0xc000138c00?})
        /workspace/vendor/k8s.io/apimachinery/pkg/runtime/helper.go:252 +0x55
sigs.k8s.io/controller-runtime/pkg/client/apiutil.targetZeroingDecoder.Decode({{0x432abe0?, 0xc008d1c540?}}, {0xc0022ce000, 0xa391, 0xe000}, 0x0?, {0x432fc70?, 0xc000138c00?})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/client/apiutil/apimachinery.go:210 +0xb8
k8s.io/client-go/rest.Result.Into({{0xc0022ce000, 0xa391, 0xe000}, {0x0, 0x0, 0x0}, {0xc0059f0830, 0x10}, {0x0, 0x0}, ...}, ...)
        /workspace/vendor/k8s.io/client-go/rest/request.go:1332 +0xad
sigs.k8s.io/controller-runtime/pkg/client.(*typedClient).PatchSubResource(0xc0000b17a0, {0x434be30, 0xc008972c00}, {0x4374708?, 0xc000138c00}, {0x3b481dc, 0x6}, {0x4337970, 0xc008eba7b0}, {0x0, ...})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/client/typed_client.go:280 +0x4e5
sigs.k8s.io/controller-runtime/pkg/client.(*subResourceClient).Patch(0xc004e3a708, {0x434be30, 0xc008972c00}, {0x4374708?, 0xc000138c00?}, {0x4337970, 0xc008eba7b0}, {0x0, 0x0, 0x0})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:480 +0x3d0
github.com/kedacore/keda/v2/pkg/fallback.updateStatus({0x434be30, 0xc008972c00}, {0x4360950, 0xc000fa20c0}, 0xc000138c00, 0xc0012f9258, {{0x3b4b21d, 0x8}, 0x0, 0x0, ...})
        /workspace/pkg/fallback/fallback.go:125 +0x563
github.com/kedacore/keda/v2/pkg/fallback.GetMetricsWithFallback({0x434be30, 0xc008972c00}, {0x4360950, 0xc000fa20c0}, {0xc0084becf0, 0x1, 0x1}, {0x0?, 0x0}, {0xc0083d0300, ...}, ...)
        /workspace/pkg/fallback/fallback.go:59 +0x69d
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics(0xc000533f10, {0x434be30, 0xc008972c00}, {0xc0087a9a10, 0x14}, {0xc0087a9a40, 0x12}, {0xc0083d0300, 0x25})
        /workspace/pkg/scaling/scale_handler.go:481 +0x128b
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics(0xc000e786c0, {0x434be30, 0xc008972c00}, 0xc0084da780)
        /workspace/pkg/metricsservice/server.go:48 +0xbc
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler({0x3674b80?, 0xc000e786c0}, {0x434be30, 0xc008972c00}, 0xc000242850, 0x0)
        /workspace/pkg/metricsservice/api/metrics_grpc.pb.go:98 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000e7e780, {0x435af60, 0xc008ca7a00}, 0xc001794fc0, 0xc00888be60, 0x63ef870, 0x0)
        /workspace/vendor/google.golang.org/grpc/server.go:1336 +0xd23
google.golang.org/grpc.(*Server).handleStream(0xc000e7e780, {0x435af60, 0xc008ca7a00}, 0xc001794fc0, 0x0)
        /workspace/vendor/google.golang.org/grpc/server.go:1704 +0xa2f
google.golang.org/grpc.(*Server).serveStreams.func1.2()
        /workspace/vendor/google.golang.org/grpc/server.go:965 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /workspace/vendor/google.golang.org/grpc/server.go:963 +0x28a
[process exited here]
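This fresh-pod panic, `assignment to entry in nil map`, is raised while the PATCH response is decoded back into the `ScaledObject` (`PatchSubResource`, then `rest.Result.Into`, then the JSON decoder). In Go, reading from a nil map is allowed, but writing to one panics. The trace passes through controller-runtime's `targetZeroingDecoder`, which (going by its name) resets the target object before decoding. One plausible, unconfirmed explanation is that two goroutines were decoding into the same object and one zeroed a map the other had just allocated. The sketch below only demonstrates the language rule and a lazy-allocation guard. `writeNil` and `setHealth` are hypothetical helpers, and the guard alone does not make concurrent access safe: that still needs a lock or a private copy.

```go
package main

import "fmt"

// writeNil attempts an assignment into a nil map and reports whether it panicked.
func writeNil() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var health map[string]int // nil map: reads are fine, writes panic
	_ = health["m"]           // OK: returns the zero value
	health["m"] = 1           // panics: assignment to entry in nil map
	return false
}

// setHealth (hypothetical) allocates the map lazily before writing to it.
func setHealth(health map[string]int, key string, v int) map[string]int {
	if health == nil {
		health = make(map[string]int)
	}
	health[key] = v
	return health
}

func main() {
	fmt.Println("nil-map write panicked:", writeNil())
	var h map[string]int
	h = setHealth(h, "s0-metric", 1)
	fmt.Println(h["s0-metric"])
}
```

It prints `nil-map write panicked: true` followed by `1`.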

@tomkerkhove tomkerkhove moved this from To Triage to To Do in Roadmap - KEDA Core Mar 29, 2023
@djsly

djsly commented Apr 27, 2023

I work with @martinmr. Here is an example crash (the names of some namespaces and ScaledObjects have been replaced) after deploying the workload and the ScaledObject, then recreating the keda-operator pod via kubectl delete:

$ k logs -f keda-operator-7b5f8c4c44-qmhcz -n keda
2023-03-23T14:31:40Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": ":8080"}
2023-03-23T14:31:40Z    INFO    setup   Starting manager
2023-03-23T14:31:40Z    INFO    setup   KEDA Version: 2.9.2
2023-03-23T14:31:40Z    INFO    setup   Git Commit: 9bc3f66578a08cdfe084468ea3ef998fa6bf3bb0
2023-03-23T14:31:40Z    INFO    setup   Go Version: go1.18.8
2023-03-23T14:31:40Z    INFO    setup   Go OS/Arch: linux/amd64
2023-03-23T14:31:40Z    INFO    setup   Running on Kubernetes 1.24+     {"version": "v1.24.10-eks-48e63af"}
I0323 14:31:40.348184       1 leaderelection.go:248] attempting to acquire leader lease keda/operator.keda.sh...
2023-03-23T14:31:40Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
2023-03-23T14:31:40Z    INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
I0323 14:31:56.116193       1 leaderelection.go:258] successfully acquired lease keda/operator.keda.sh
2023-03-23T14:31:56Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2023-03-23T14:31:56Z    INFO    Starting EventSource    {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2023-03-23T14:31:56Z    INFO    Starting EventSource    {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2023-03-23T14:31:56Z    INFO    Starting Controller     {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2023-03-23T14:31:56Z    INFO    Starting Controller     {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2023-03-23T14:31:56Z    INFO    Starting EventSource    {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2023-03-23T14:31:56Z    INFO    Starting Controller     {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2023-03-23T14:31:56Z    INFO    grpc_server     Starting Metrics Service gRPC Server    {"address": ":9666"}
2023-03-23T14:31:56Z    INFO    Starting EventSource    {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2023-03-23T14:31:56Z    INFO    Starting Controller     {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2023-03-23T14:31:56Z    INFO    Starting workers        {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2023-03-23T14:31:56Z    INFO    Starting workers        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2023-03-23T14:31:56Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"costpro-platform-events-consumption-dev","namespace":"applications"}, "namespace": "applications", "name": "costpro-platform-events-consumption-dev", "reconcileID": "8532ab31-7025-48bf-991e-71c359c681d5"}
2023-03-23T14:31:56Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"boat-remover","namespace":"file-compressor"}, "namespace": "file-compressor", "name": "boat-remover", "reconcileID": "4dc467b9-cb6c-453b-9bcf-746594c87afe"}
2023-03-23T14:31:56Z    INFO    Starting workers        {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2023-03-23T14:31:56Z    INFO    Starting workers        {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2023-03-23T14:31:56Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"boat-remover","namespace":"file-compressor"}, "namespace": "file-compressor", "name": "boat-remover", "reconcileID": "4dc467b9-cb6c-453b-9bcf-746594c87afe"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2dcf3df]

goroutine 350 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x32ec900, 0x602e770})
        /usr/local/go/src/runtime/panic.go:838 +0x207
github.com/kedacore/keda/v2/pkg/scaling/resolver.ResolveScaleTargetPodSpec({0x40bfa10, 0xc0011baf30}, {0x40cd790, 0xc000d4cd20}, {{0x40c6200?, 0xc000ec76b0?}, 0x30f0760?}, {0x3873b80?, 0xc000722e00?})
        /workspace/pkg/scaling/resolver/scale_resolvers.go:71 +0x13f
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).performGetScalersCache(0xc000ee19e0, {0x40bfa10, 0xc0011baf30}, {0xc000ebf1c0, 0x34}, {0x3873b80, 0xc000722e00}, 0xc000e24f20, {0x0, 0x0}, ...)
        /workspace/pkg/scaling/scale_handler.go:264 +0x6db
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScalersCache(0xc000e24f98?, {0x40bfa10, 0xc0011baf30}, {0x3873b80, 0xc000722e00})
        /workspace/pkg/scaling/scale_handler.go:190 +0xf6
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).getScaledObjectMetricSpecs(0xc000d2e000, {0x40bfa10, 0xc0011baf30}, {{0x40c6200?, 0xc0011baf60?}, 0xc000d226f0?}, 0xc000722e00)
        /workspace/controllers/keda/hpa.go:200 +0x8c
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).newHPAForScaledObject(0xc000d2e000, {0x40bfa10?, 0xc0011baf30?}, {{0x40c6200?, 0xc0011baf60?}, 0x38183c0?}, 0xc000722e00, 0xc003a63608)
        /workspace/controllers/keda/hpa.go:74 +0x66
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).updateHPAIfNeeded(0xc000d2e000, {0x40bfa10, 0xc0011baf30}, {{0x40c6200?, 0xc0011baf60?}, 0xc0011baf30?}, 0xc000722e00, 0xc000d00e00, 0xc0009f40a0?)
        /workspace/controllers/keda/hpa.go:152 +0x7b
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).ensureHPAForScaledObjectExists(0xc000d2e000, {0x40bfa10, 0xc0011baf30}, {{0x40c6200?, 0xc0011baf60?}, 0x40c6200?}, 0xc000722e00, 0x0?)
        /workspace/controllers/keda/scaledobject_controller.go:427 +0x238
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).reconcileScaledObject(0xc000d2e000?, {0x40bfa10, 0xc0011baf30}, {{0x40c6200?, 0xc0011baf60?}, 0xc000d226f0?}, 0xc000722e00)
        /workspace/controllers/keda/scaledobject_controller.go:230 +0x1c9
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).Reconcile(0xc000d2e000, {0x40bfa10, 0xc0011baf30}, {{{0xc000d22708?, 0x10?}, {0xc000d226f0?, 0x40d787?}}})
        /workspace/controllers/keda/scaledobject_controller.go:176 +0x526
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x40bf968?, {0x40bfa10?, 0xc0011baf30?}, {{{0xc000d22708?, 0x370f080?}, {0xc000d226f0?, 0x4041f4?}}})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000ee2640, {0x40bf968, 0xc000d7f8c0}, {0x34329a0?, 0xc000150f00?})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000ee2640, {0x40bf968, 0xc000d7f8c0})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:230 +0x325

Please note that the ScaledObject boat-remover has 42 triggers defined on this single ScaledObject. I don't know how typical that is, but it stands out as potentially unusual. It may be exposing some kind of concurrency issue?
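The 42-trigger observation fits a concurrency explanation. Plausibly, each trigger's metric is requested separately, so more triggers means more `GetMetrics` calls and more goroutines touching the same ScaledObject status at once. Whether that is what happens inside KEDA is not established in this thread. As a general Go illustration only, the sketch below fans out one goroutine per trigger but lets a single goroutine own the result map, so concurrent map writes cannot occur. `queryTrigger`, `result` and `collect` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

type result struct {
	metric string
	value  int64
	err    error
}

// queryTrigger is a hypothetical stand-in for asking one scaler for its metric.
func queryTrigger(i int) result {
	return result{metric: fmt.Sprintf("s%d-metric", i), value: int64(i)}
}

// collect fans out one goroutine per trigger but funnels results through a
// channel, so only the calling goroutine ever writes to the map.
func collect(numTriggers int) map[string]int64 {
	results := make(chan result, numTriggers) // buffered: no sender ever blocks
	var wg sync.WaitGroup
	for i := 0; i < numTriggers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results <- queryTrigger(i)
		}(i)
	}
	wg.Wait()
	close(results)

	out := make(map[string]int64, numTriggers)
	for r := range results {
		if r.err == nil {
			out[r.metric] = r.value
		}
	}
	return out
}

func main() {
	m := collect(42)
	fmt.Println(len(m), m["s41-metric"])
}
```

`collect(42)` returns 42 entries, and the program prints `42 41`. Funnelling results through a channel avoids a mutex entirely. The buffer is sized to the number of triggers so that no sender blocks before `wg.Wait()` returns.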

It is also notable that the microservice targeted by the ScaledObject also queries these 42 metrics, but the KEDA metrics server itself is stable.

We are having the same issue on different AKS clusters, with different but similar ScaledObjects.

I'm running kedacore/keda:2.10.0

This ScaledObject is working
sie-spa-xla-4xx-krypton-scaledobject-prod1c.txt

This ScaledObject is causing the Panic (they are identical, except they have different names)
sie-spa-esp-4xx-krypton-scaledobject-prod1c.txt

2023-04-27T19:34:05Z    INFO    Reconciling ScaledObject        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"sie-spa-esp-4xx-krypton-scaledobject","namespace":"asraas-prod"}, "namespace": "asraas-prod", "name": "sie-spa-esp-4xx-krypton-scaledobject", "reconcileID": "22fb0d9e-61a7-4f09-8cf2-6ac5f8e9d355"}
2023-04-27T19:34:05Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"sie-spa-esp-4xx-krypton-scaledobject","namespace":"asraas-prod"}, "namespace": "asraas-prod", "name": "sie-spa-esp-4xx-krypton-scaledobject", "reconcileID": "22fb0d9e-61a7-4f09-8cf2-6ac5f8e9d355"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2f62194]

goroutine 592 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0x1fa
panic({0x34bf3a0, 0x63ea940})
        /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/kedacore/keda/v2/pkg/scaling/resolver.ResolveScaleTargetPodSpec({0x434be30, 0xc00f6898c0}, {0x4360950, 0xc001407a40}, {0x3a94540?, 0xc0000f1200})
        /workspace/pkg/scaling/resolver/scale_resolvers.go:73 +0xd4
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).performGetScalersCache(0xc001797110, {0x434be30, 0xc00f6898c0}, {0xc001d34d00, 0x3d}, {0x3a94540, 0xc0000f1200}, 0xc008c4af00, {0x0, 0x0}, ...)
        /workspace/pkg/scaling/scale_handler.go:347 +0x6e5
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScalersCache(0xc0000f0e00?, {0x434be30, 0xc00f6898c0}, {0x3a94540, 0xc0000f1200})
        /workspace/pkg/scaling/scale_handler.go:273 +0xf6
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).getScaledObjectMetricSpecs(0xc001484ba0, {0x434be30, 0xc00f6898c0}, {{0x4353fc0?, 0xc00f6898f0?}, 0xc009524960?}, 0xc0000f0e00)
        /workspace/controllers/keda/hpa.go:200 +0xda
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).newHPAForScaledObject(0xc001484ba0, {0x434be30?, 0xc00f6898c0?}, {{0x4353fc0?, 0xc00f6898f0?}, 0x3a221a0?}, 0xc0000f0e00, 0xc008c4b5f0)
        /workspace/controllers/keda/hpa.go:74 +0x66
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).updateHPAIfNeeded(0xc001484ba0, {0x434be30, 0xc00f6898c0}, {{0x4353fc0?, 0xc00f6898f0?}, 0xc00f6898c0?}, 0xc0000f0e00, 0xc008c46540, 0xc001d46cf0?)
        /workspace/controllers/keda/hpa.go:152 +0x7b
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).ensureHPAForScaledObjectExists(0xc001484ba0, {0x434be30, 0xc00f6898c0}, {{0x4353fc0?, 0xc00f6898f0?}, 0x4353fc0?}, 0xc0000f0e00, 0x0?)
        /workspace/controllers/keda/scaledobject_controller.go:431 +0x238
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).reconcileScaledObject(0xc001484ba0?, {0x434be30, 0xc00f6898c0}, {{0x4353fc0?, 0xc00f6898f0?}, 0xc009524960?}, 0xc0000f0e00)
        /workspace/controllers/keda/scaledobject_controller.go:229 +0x1c9
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).Reconcile(0xc001484ba0, {0x434be30, 0xc00f6898c0}, {{{0xc009523240?, 0x10?}, {0xc009524960?, 0x40da87?}}})
        /workspace/controllers/keda/scaledobject_controller.go:175 +0x526
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x434be30?, {0x434be30?, 0xc00f6898c0?}, {{{0xc009523240?, 0x32ae080?}, {0xc009524960?, 0x0?}}})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000f55a40, {0x434bd88, 0xc000ddc2c0}, {0x3619020?, 0xc0004d21c0?})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323 +0x38f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000f55a40, {0x434bd88, 0xc000ddc2c0})
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:231 +0x333

@djsly

djsly commented Apr 27, 2023

From the code, it seems that the ScaledObject received by resolver.ResolveScaleTargetPodSpec is nil, since the panic occurs when it tries to access a property after the object is cast. I'm not sure why the object would be nil or become nil, but I think nil protection/detection is missing nonetheless.

@zroubalik
Member

Thanks for the provided info. The failing SO is missing the Status.ScaleTargetGVKR property, which causes the runtime error.

There might be some race condition in processing the SO. I will try to investigate later; I will be away for the next two weeks, but will try to check it then.

If you happen to find any more details in the meantime, please attach them here.

@djsly

djsly commented May 15, 2023

@zroubalik I don't have anything more to share, except that we had the same issue again yesterday. This time it was a different ScaledObject. After we deleted it, everything went back to normal.

@timown

timown commented Jun 8, 2023

Just happened to us too; deleting the scaled object and recreating it solved the issue.

@saurabhvagrawal

I am working on the same project as @timown. This happened in production today. We tried 2.10.1 and 2.9.x and still got the same issue. Here are the logs from the operator for your reference:

"namespace": "", "name": "", "reconcileID": "973016f5-c838-44e9-82c6-f3d49afa52a7", "trigger.type": "external"}
2023-06-08T05:36:29Z INFO Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"","namespace":""}, "namespace": "", "name": "", "reconcileID": "973016f5-c838-44e9-82c6-f3d49afa52a7"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2f62594]

goroutine 398 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0x1fa
panic({0x34c03a0, 0x63ec950})
/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/kedacore/keda/v2/pkg/scaling/resolver.ResolveScaleTargetPodSpec({0x434cf50, 0xc003093410}, {0x4361a70, 0xc000f93260}, {0x3a95560?, 0xc000a84200})
/workspace/pkg/scaling/resolver/scale_resolvers.go:73 +0xd4
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).performGetScalersCache(0xc0001ae5b0, {0x434cf50, 0xc003093410}, {0xc004bc5ce0, 0x26}, {0x3a95560, 0xc000a84200}, 0xc0021aef00, {0x0, 0x0}, ...)
/workspace/pkg/scaling/scale_handler.go:347 +0x6e5
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScalersCache(0xc00488fa00?, {0x434cf50, 0xc003093410}, {0x3a95560, 0xc000a84200})
/workspace/pkg/scaling/scale_handler.go:273 +0xf6
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).getScaledObjectMetricSpecs(0xc000f708a0, {0x434cf50, 0xc003093410}, {{0x43550e0?, 0xc003093440?}, 0xc00236a4f0?}, 0xc00488fa00)
/workspace/controllers/keda/hpa.go:200 +0xda
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).newHPAForScaledObject(0xc000f708a0, {0x434cf50?, 0xc003093410?}, {{0x43550e0?, 0xc003093440?}, 0x3a231c0?}, 0xc00488fa00, 0xc0021af5f0)
/workspace/controllers/keda/hpa.go:74 +0x66
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).updateHPAIfNeeded(0xc000f708a0, {0x434cf50, 0xc003093410}, {{0x43550e0?, 0xc003093440?}, 0xc003093410?}, 0xc00488fa00, 0xc0005fe700, 0xc004468828?)
/workspace/controllers/keda/hpa.go:152 +0x7b
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).ensureHPAForScaledObjectExists(0xc000f708a0, {0x434cf50, 0xc003093410}, {{0x43550e0?, 0xc003093440?}, 0x43550e0?}, 0xc00488fa00, 0x0?)
/workspace/controllers/keda/scaledobject_controller.go:431 +0x238
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).reconcileScaledObject(0xc000f708a0?, {0x434cf50, 0xc003093410}, {{0x43550e0?, 0xc003093440?}, 0xc0023f35f0?}, 0xc00488fa00)
/workspace/controllers/keda/scaledobject_controller.go:229 +0x1d8
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).Reconcile(0xc000f708a0, {0x434cf50, 0xc003093410}, {{{0xc0023f3550?, 0x10?}, {0xc0023f35f0?, 0x40da87?}}})
/workspace/controllers/keda/scaledobject_controller.go:175 +0x526
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x434cf50?, {0x434cf50?, 0xc003093410?}, {{{0xc0023f3550?, 0x32af080?}, {0xc0023f35f0?, 0x0?}}})
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0010921e0, {0x434cea8, 0xc001296300}, {0x361a020?, 0xc0008bec00?})
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323 +0x38f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0010921e0, {0x434cea8, 0xc001296300})
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:231 +0x333

@zroubalik
Member

@saurabhvagrawal @timown @djsly @reynoldsme @martinmr et al.: could you please confirm that the failing ScaledObject uses an external trigger?

@zroubalik
Member

Is anybody here willing to test a patched version before the official release is out?

@timown

timown commented Jun 8, 2023

It does; in our case it uses cpu, memory and external triggers.

@reynoldsme

reynoldsme commented Jun 8, 2023

@zroubalik None of the ScaledObjects where we see this issue were using triggers of type external, as described in https://keda.sh/docs/2.10/concepts/external-scalers/

We were only seeing this on ScaledObjects with triggers of type datadog. @martinmr, it was just that, correct?

@zroubalik
Member

zroubalik commented Jun 8, 2023

The external type is not relevant. I probably have a fix ready, but I am not able to reproduce the original issue, so it would be great if anybody could give it a try. If anybody can test it out, I have a container image with the patch.

@zroubalik
Member

zroubalik commented Jun 8, 2023

In case anyone would like to try it out:

To apply the fix, the KEDA Operator image needs to be changed to quay.io/zroubalik/keda:crashfix. It is KEDA version 2.10.1 plus the potential fix.

@saurabhvagrawal

Sure @zroubalik. But before that, could we get the changelog/PR, please, to understand what is fixed?

@saurabhvagrawal

@zroubalik: kind ping on this.

@zroubalik
Member

@saurabhvagrawal it's the released KEDA 2.10.1 plus the following commit:
zroubalik@eaa8350

@sergiomacedo

sergiomacedo commented Jun 14, 2023

Will this fix be applied to KEDA 2.9.x too? We've been seeing this problem since yesterday...

EDIT: Situation update
We removed the offending ScaledObject, restarted KEDA and reapplied the ScaledObject. Everything worked as expected.
After that, we simply restarted the KEDA deployment and the reconcile worked as expected.

So it seems to be a race condition of some kind, and kind of hard to reproduce...
helm chart: keda-2.9.4
app version: 2.9.3

@zroubalik
Member

If we get confirmation that it resolves the issue, then we can think about backporting it to 2.9 as well.

@zroubalik
Member

@saurabhvagrawal have you got a chance to try the fix?

Or anybody else?

@timown

timown commented Jun 21, 2023

It happened to us once (and we have 200 clusters).

@zroubalik
Member

@timown with the fix?

@timown

timown commented Jun 21, 2023

No, sorry, with the latest official release, so we didn't really have a chance to test the fix without a reproducer.

@saurabhvagrawal

I agree with @timown. @zroubalik: we are unable to reproduce the issue and are not sure how to verify the fix. But looking at your commit, do you think the nil pointer occurs because it is unable to get the ScaledObject resource? The commit checks whether it is nil and, if so, attempts to retrieve the corresponding ScaledObject.

Projects
Archived in project
7 participants