Fallback is triggered without fallback.failureThreshold being taken into account #6053

s-shirayama · 2024-08-09T04:48:16Z

Report

When a scaler fails to get a metric with the fallback option enabled, we expect KEDA to scale the deployment to fallback.replicas only after the number of consecutive failures defined by fallback.failureThreshold. However, KEDA scaled the deployment to fallback.replicas immediately after the scaler's first failure.

This seems to differ from the behavior described in the official documentation.

Expected Behavior

KEDA scales the target deployment to fallback.replicas after the number of consecutive failures defined by fallback.failureThreshold.
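To make the expectation concrete, here is a minimal Go sketch of a consecutive-failure counter. It is not KEDA's implementation: fallbackTracker, observe, and desiredReplicas are hypothetical names, and treating the threshold as inclusive (fallback on the Nth consecutive failure) is an assumption.

```go
package main

// fallbackTracker is a hypothetical helper for illustration only,
// not a type from KEDA.
type fallbackTracker struct {
	failureThreshold int // mirrors spec.fallback.failureThreshold
	fallbackReplicas int // mirrors spec.fallback.replicas
	failures         int // consecutive failed metric fetches so far
}

// observe records the outcome of one metric fetch and reports whether
// the fallback should be active afterwards.
func (t *fallbackTracker) observe(fetchOK bool) bool {
	if fetchOK {
		// Any successful fetch resets the consecutive-failure count.
		t.failures = 0
		return false
	}
	t.failures++
	// Assumption: the threshold is inclusive.
	return t.failures >= t.failureThreshold
}

// desiredReplicas returns fallbackReplicas only once the threshold has
// been reached; otherwise the current replica count is left unchanged.
func (t *fallbackTracker) desiredReplicas(current int, fetchOK bool) int {
	if t.observe(fetchOK) {
		return t.fallbackReplicas
	}
	return current
}
```

With failureThreshold: 10 and replicas: 5 as in the spec used in this report, the first nine failed fetches would leave the replica count untouched, and only the tenth would switch to 5.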

Actual Behavior

KEDA scales the deployment to fallback.replicas immediately after the scaler's first failure.

Steps to Reproduce the Problem

1. Set up a ScaledObject with the fallback option enabled.
2. Set fallback.failureThreshold to a high number.
3. Make the scaler fail to get the metric (e.g. set a wrong URL for the Metrics API scaler).
4. Check the HPA's desired replicas and the scaler's Number Of Failures.

This is the ScaledObject spec used to reproduce the issue.

kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fallback-test
spec:
  minReplicaCount: 1
  maxReplicaCount: 10
  fallback:
    failureThreshold: 10
    replicas: 5
  scaleTargetRef:
    name: nginx
  triggers:
  - type: metrics-api
    metadata:
      targetValue: "1"
      url: "http://dummy/"
      valueLocation: "dummy"
EOF

When checking the HPA's desired replicas, the deployment had been scaled to fallback.replicas immediately.

❯ k get hpa
NAME                     REFERENCE          TARGETS             MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-fallback-test   Deployment/nginx   <unknown>/1 (avg)   1         10        5          19s

Number Of Failures was less than fallback.failureThreshold, which was unexpected.

❯ k describe so
Name:         fallback-test
Namespace:    default
API Version:  keda.sh/v1alpha1
Kind:         ScaledObject
Spec:
  Fallback:
    Failure Threshold:  10
    Replicas:           5
  Max Replica Count:    10
  Min Replica Count:    1
  Scale Target Ref:
    Name:  nginx
  Triggers:
    Metadata:
      Target Value:    1
      URL:             http://dummy/
      Value Location:  dummy
    Type:              metrics-api
Status:
  Conditions:
    Status:   Unknown
    Type:     Ready
    Message:  Scaling is not performed because triggers are not active
    Reason:   ScalerNotActive
    Status:   False
    Type:     Active
    Message:  No fallbacks are active on this scaled object
    Reason:   NoFallbackFound
    Status:   False
    Type:     Fallback
    Status:   Unknown
    Type:     Paused
  External Metric Names:
    s0-metric-api-dummy
  Health:
    s0-metric-api-dummy:
      Number Of Failures:  1
      Status:              Failing
  Hpa Name:                keda-hpa-fallback-test
  Original Replica Count:  1
:
Events:
  Type     Reason              Age                From           Message
  ----     ------              ----               ----           -------
  Normal   KEDAScalersStarted  52s                keda-operator  Started scalers watch
  Normal   ScaledObjectReady   52s                keda-operator  ScaledObject is ready for scaling
  Warning  KEDAScalerFailed    22s (x2 over 52s)  keda-operator  error requesting metrics endpoint: Get "http://dummy/": dial tcp: lookup dummy on 192.168.194.138:53: no such host
  Normal   KEDAScalersStarted  7s (x3 over 52s)   keda-operator  Scaler metrics-api is built.

Logs from KEDA operator

The logs say "Successfully set ScaleTarget replicas count to ScaledObject fallback.replicas" with "New Replicas Count": 5.

2024-08-09T04:05:29Z	INFO	Initializing Scaling logic according to ScaledObject Specification	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"fallback-test","namespace":"default"}, "namespace": "default", "name": "fallback-test", "reconcileID": "dcff18d3-3077-4ef0-a235-b6166e8f8748"}
2024-08-09T04:05:29Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"fallback-test","namespace":"default"}, "namespace": "default", "name": "fallback-test", "reconcileID": "2e67c111-cb43-4c60-a372-37bc117c9a76"}
2024-08-09T04:05:29Z	INFO	Detected resource targeted for scaling	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"fallback-test","namespace":"default"}, "namespace": "default", "name": "fallback-test", "reconcileID": "2e67c111-cb43-4c60-a372-37bc117c9a76", "resource": "apps/v1.Deployment", "name": "nginx"}
2024-08-09T04:05:29Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "default", "scaledObject.Name": "fallback-test", "scaler": "metricsAPIScaler", "error": "error requesting metrics endpoint: Get \"http://dummy/\": dial tcp: lookup dummy on 192.168.194.138:53: no such host"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScalerState
	/workspace/pkg/scaling/scale_handler.go:780
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState.func1
	/workspace/pkg/scaling/scale_handler.go:633
2024-08-09T04:05:29Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject fallback.replicas	{"scaledobject.Name": "fallback-test", "scaledObject.Namespace": "default", "scaleTarget.Name": "nginx", "Original Replicas Count": 1, "New Replicas Count": 5}

KEDA Version

2.15.0

Kubernetes Version

1.29

Platform

Other

Scaler Details

Any, but I used metrics-api for testing.

Anything else?

According to scale_scaledobjects.go, this behavior (scaling to fallback.replicas when there are no active scalers and a scaler responds with an error) seems to be intentional. However, it differs from the behavior described in the official documentation.

We set a high value for fallback.failureThreshold to avoid triggering frequent fallbacks on temporary, short-lived failures of external metrics retrieval. As described above, this does not work as expected.
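The difference between what I observed and what I expected can be modelled roughly as follows. This is only a sketch based on the description above, not code copied from scale_scaledobjects.go; scalerState, fallbackSpec, observedReplicas, and expectedReplicas are hypothetical names, and the inclusive threshold comparison is an assumption.

```go
package main

// scalerState is a hypothetical summary of one evaluation loop.
type scalerState struct {
	anyActive bool // at least one trigger reported activity
	anyError  bool // at least one scaler returned an error
	failures  int  // consecutive failures recorded so far
}

// fallbackSpec mirrors the two spec.fallback fields used in this report.
type fallbackSpec struct {
	failureThreshold int
	replicas         int
}

// observedReplicas models the reported behavior: with no active scalers
// and an error, the target goes straight to the fallback replicas.
func observedReplicas(s scalerState, fb fallbackSpec, current int) int {
	if !s.anyActive && s.anyError {
		return fb.replicas
	}
	return current
}

// expectedReplicas adds the threshold check that the report expected.
// Assumption: the threshold comparison is inclusive.
func expectedReplicas(s scalerState, fb fallbackSpec, current int) int {
	if !s.anyActive && s.anyError && s.failures >= fb.failureThreshold {
		return fb.replicas
	}
	return current
}
```

For the state shown in the describe output (no active triggers, one error, Number Of Failures: 1) and failureThreshold: 10, the first function returns 5 while the second keeps the current replica count.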


s-shirayama · 2024-08-28T08:19:27Z

Hi, is there any update on this? I can provide more details if needed.

Thanks!

stale · 2024-10-28T23:15:46Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

zroubalik · 2024-11-05T22:40:07Z

Thanks for reporting, we should definitely check this.

s-shirayama added the bug Something isn't working label Aug 9, 2024

keda-automation added this to Roadmap - KEDA Core Aug 9, 2024

github-project-automation bot moved this to To Triage in Roadmap - KEDA Core Aug 9, 2024

stale bot added the stale All issues that are marked as stale due to inactivity label Oct 28, 2024

zroubalik removed the stale All issues that are marked as stale due to inactivity label Nov 5, 2024
