Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus: Scaling to zero not working in KEDA 1.4.0 #770

Closed
hmoravec opened this issue Apr 22, 2020 · 13 comments · Fixed by #772
Closed

Prometheus: Scaling to zero not working in KEDA 1.4.0 #770

hmoravec opened this issue Apr 22, 2020 · 13 comments · Fixed by #772
Labels
bug Something isn't working scaler-prometheus
Milestone

Comments

@hmoravec
Copy link

Scaling to and from zero is not working with Prometheus scaler after KEDA upgrade from 1.3.0 to 1.4.0.

Expected Behavior

KEDA should scale up deployment from zero when Prometheus metric increases from zero and scale down to zero when the metric decreases to zero.

Actual Behavior

KEDA scales deployment from zero to one replica even when Prometheus metric is zero and never scales it down to zero.

Steps to Reproduce the Problem

  1. Deploy KEDA in GKE cluster using KEDA Helm chart in version 1.4.0.
  2. Deploy any deployment and corresponding ScaledObject with minReplicaCount=0 and use Prometheus scaler with query that returns constant zero.
  3. Watch that KEDA never scales the deployment down to zero.
  4. Upgrade KEDA using the same Helm chart with KEDA downgraded to 1.3.0 using values:
image:
  keda: docker.io/kedacore/keda:1.3.0
  metricsAdapter: docker.io/kedacore/keda-metrics-adapter:1.3.0

and watch that KEDA scales the deployment to zero and again up from zero when the metric becomes positive as expected.

Specifications

  • KEDA Version: 1.4.0
  • Platform & Version: linux/amd64
  • Kubernetes Version: 1.15 (1.15.9-gke.24)
  • Scaler(s): Prometheus
@hmoravec hmoravec added the bug Something isn't working label Apr 22, 2020
@zroubalik
Copy link
Member

zroubalik commented Apr 22, 2020

Could you please rerun 1.4.0 with debug log level on KEDA operator? https://github.com/kedacore/keda#setting-log-levels (or set it in the chart).

@zroubalik
Copy link
Member

And paste here your ScaledObject please.

@hmoravec
Copy link
Author

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaling
  labels:
    deploymentName: worker
spec:
  scaleTargetRef:
    deploymentName: worker
  pollingInterval: 30
  cooldownPeriod:  60
  minReplicaCount: 0
  maxReplicaCount: 2
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-server.monitoring.svc.cluster.local:9090
      metricName: queue_length
      threshold: '120'
      query: avg_over_time(worker_queue_length[5m])

I don't see anything suspicious in keda-logs.txt, except the last line
I0423 05:18:21.328041 1 wrap.go:47] GET /openapi/v2: (3.086429ms) 404 [ 172.16.0.23:33094]

KEDA scales replica to 1 from 0 even though the Prometheus metric is zero:
{"level":"info","ts":1587618899.730634,"logger":"scalehandler","msg":"Successfully updated deployment","ScaledObject.Namespace":"staging","ScaledObject.Name":"prometheus-scaling","ScaledObject.ScaleType":"deployment","Deployment.Namespace":"staging","Deployment.Name":"worker","Original Replicas Count":0,"New Replicas Count":1}

@zroubalik zroubalik changed the title Scaling to zero not working in KEDA 1.4.0 Prometheus: Scaling to zero not working in KEDA 1.4.0 Apr 23, 2020
@zroubalik
Copy link
Member

It seems like this regression was brought by this change: https://github.com/kedacore/keda/pull/695/files#diff-a63ae5a2f6036b9f3bc750d5fe46437cR105

@droessmj what did you do this particular change? ie. Set scaler to active even for value 0?

@droessmj
Copy link
Contributor

@zroubalik That commit updated a zero result to be non-error inducing. Based on the behavior described above I'm assuming since it's now not throwing errors, the scaler ensures 1 replica is up regardless. The "fix" I introduced just returns zero as a valid metric. If we need a special case where zero is non-error inducing but not a valid metric that can be introduced, but as-is the referenced commit just allows this code to now run for zero results:

	metric := external_metrics.ExternalMetricValue{
		MetricName: metricName,
		Value:      *resource.NewQuantity(int64(val), resource.DecimalSI),
		Timestamp:  metav1.Now(),
	}

@zroubalik
Copy link
Member

zroubalik commented Apr 23, 2020

@droessmj I see, but the scaler marks itself as Active even when there is zero result. So my only concern is the change in isActive() function: return val > -1, nil. This should be on the first sight reverted back to return val > 0, nil. WDYT? As your change forces the scaler to be Active all the time, thus scaling 0<->1 doesn't work.

@droessmj
Copy link
Contributor

I believe you're correct. Reverting the isActive check while retaining the other part should resolve.

@zroubalik
Copy link
Member

@hmoravec are you able to retest the change please if I send you link to dev image later today?

@zroubalik
Copy link
Member

Would be great if anybody with Prometheus instance could check that the fix helped, just replace the images for KEDA Operator and KEDA Metrics Server. Thanks!
docker.io/zroubalik/keda:promFix
docker.io/zroubalik/keda-metrics-adapter:promFix

@hmoravec
Copy link
Author

hmoravec commented Apr 23, 2020

@zroubalik Sure, I'll test it. Btw automatic tests are planned? :-)

@tomkerkhove
Copy link
Member

We are always open for PRs ;)

But you are right, this shouldn't have slipped through. Sorry about this.

@hmoravec
Copy link
Author

hmoravec commented Apr 23, 2020

@zroubalik Working, thanks! It scaled down to zero when the metric became zero and scaled up when the metric became positive.

@zroubalik
Copy link
Member

@hmoravec great thanks. Yeah we do have some tests but it doesn't cover everything, so this is an area we definitely need to improve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working scaler-prometheus
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants