This repository has been archived by the owner on Sep 24, 2021. It is now read-only.

Doesn't work with multi-dimensions metrics #11

Closed
hposca opened this issue Sep 23, 2019 · 7 comments

@hposca

hposca commented Sep 23, 2019

Hi there,

On CloudWatch we had a metric named queuedepth with dimensions env, app and queue in the namespace Sidekiq. env describes the environment (staging, production or development), app holds the application name, and queue is the queue the data came from. We have a Lambda that gathers the data and sends it to CloudWatch.

If we try to use this metric as an ExternalMetric, as in the example below, it doesn't work.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: queue-depth
spec:
  name: queue-depth
  resource:
    resource: "deployment"
  queries:
    - id: queue_depth
      metricStat:
        metric:
          namespace: "Sidekiq"
          metricName: "queuedepth"
          dimensions:
            - name: env
              value: staging
            - name: app
              value: appname
            - name: queue
              value: queuename
        period: 60
        stat: Average
        unit: Count
      returnData: true
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: appname
  metrics:
  - type: External
    external:
      metric:
        name: queue-depth
        selector:
          matchLabels:
            env: staging
            app: appname
            queue: queuename
      target:
        type: Value
        value: 40

If we kubectl logs -f the cloudwatch adapter pod we can see that it cannot find the metric :/

To make it work, we had to change our Lambda to create another metric (depth) with a single dimension (queue).

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: queue-depth
spec:
  name: queue-depth
  resource:
    resource: "deployment"
  queries:
    - id: queue_depth
      metricStat:
        metric:
          namespace: "StagingSidekiq"
          metricName: "depth"
          dimensions:
            - name: queue
              value: queuename
        period: 60
        stat: Average
        unit: Count
      returnData: true

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: appname
  metrics:
  - type: External
    external:
      metric:
        name: queue-depth
      target:
        type: AverageValue
        averageValue: 40

And, as soon as we applied this new configuration, the metrics were fetched and the HPA began scaling immediately.

Is this expected? Since the field is named dimensions (plural) and accepts a list, we assumed multi-dimension metrics were supported. We also noticed that all the examples use single-dimension metrics only.
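
For context, CloudWatch identifies a metric by its namespace, its name and its complete set of dimensions, so a query has to name every dimension the data was published with. The following is a minimal sketch of the GetMetricData query entry that the first ExternalMetric above corresponds to. The helper build_query is hypothetical and is not the adapter's code; it only builds the request structure and makes no AWS call.

```python
import json


def build_query(query_id, namespace, metric_name, dimensions, period=60, stat="Average"):
    """Build one GetMetricData query entry (hypothetical helper, illustration only)."""
    return {
        "Id": query_id,
        "MetricStat": {
            "Metric": {
                "Namespace": namespace,
                "MetricName": metric_name,
                # Every dimension the metric was published with must be listed.
                "Dimensions": [{"Name": n, "Value": v} for n, v in dimensions.items()],
            },
            "Period": period,
            "Stat": stat,
        },
        "ReturnData": True,
    }


# Values taken from the multi-dimension ExternalMetric in this issue.
query = build_query(
    "queue_depth",
    "Sidekiq",
    "queuedepth",
    {"env": "staging", "app": "appname", "queue": "queuename"},
)
print(json.dumps(query, indent=2))
```

If the adapter forwards only part of this dimension list, CloudWatch would look up a different metric and return no datapoints, which would match the symptom described here.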

Our cluster is on EKS 1.14 and using chankh/k8s-cloudwatch-adapter:v0.6.0.

Thanks

@willianantunes

willianantunes commented Oct 23, 2019

Just to confirm, I get the same result here. My ExternalMetric:

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: aws-mq-propileu-destination-externalmetric
  namespace: production
spec:
  name: aws-mq-propileu-destination-externalmetric
  resource:
    resource: deployment
  queries:
    - id: "mq_1_propileu_destination_length"
      metricStat:
        metric:
          dimensions:
            - name: "Broker"
              value: "jsm-amq-prd2-1"
            - name: "Queue"
              value: "propileu-destination"
          metricName: "QueueSize"
          namespace: "AWS/AmazonMQ"
        period: 60
        stat: Sum
        unit: Count
      returnData: true
    - id: "mq_2_propileu_destination_length"
      metricStat:
        metric:
          dimensions:
            - name: "Broker"
              value: "jsm-amq-prd2-2"
            - name: "Queue"
              value: "propileu-destination"
          metricName: "QueueSize"
          namespace: "AWS/AmazonMQ"
        period: 60
        stat: Sum
        unit: Count
      returnData: true

And the HPA:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: propileu-pubsub-consumer-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: propileu-from-tower-consumer-deployment
  minReplicas: 1
  maxReplicas: 100
  metrics:
    - type: External
      external:
        metricName: aws-mq-propileu-destination-externalmetric
        targetValue: 1

When I do kubectl -n custom-metrics logs -f --tail=100 k8s-cloudwatch-adapter-79cbf445b-vzslb, it outputs no error at all.

If I configure a setup with SQS, as in the sample usage, it works properly.

@chankh
Contributor

chankh commented Oct 24, 2019

Pull requests are welcome.

@arunbhagyanath
Contributor

@chankh Can you take a look at the PR?

@rahulttn

rahulttn commented Nov 25, 2019

@chankh Tested 0.7.0; it didn't work.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: rest-api-cpu
spec:
  name: rest-api-cpu
  resource:
    resource: "deployment"
  queries:
    - id: respapicpu
      metricStat:
        metric:
          namespace: "ContainerInsights"
          metricName: "pod_cpu_utilization"
          dimensions:
            - name: "PodName"
              value: "rest-api"
            - name: "ClusterName"
              value: "non-prod_eks"
            - name: "Namespace"
              value: "qa"
        period: 10
        stat: Sum
        unit: Count
      returnData: true

The HPA:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: rest-api-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: rest-api
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: External
      external:
        metricName: rest-api-cpu
        targetValue: 4

The HPA shows this:
rest-api-cpu Deployment/rest-api 0/4 1 3 1 14m

@arunbhagyanath
Contributor

arunbhagyanath commented Nov 25, 2019

@rahulttn
While testing, I saw that CloudWatch was not returning data for the API calls. Checking the metric statistics showed that the unit is Percent.

Metrics Statistics API

aws cloudwatch get-metric-statistics --metric-name pod_cpu_utilization --start-time 09:40:00 --end-time 09:45:00  --period 300 --namespace ContainerInsights --statistics Sum --dimensions Name=PodName,Value=httpd Name=ClusterName,Value=eks Name=Namespace,Value=default

Can you try setting the unit to "Percent" instead of "Count" (unit: Count), or removing it entirely? (The GetMetricStatistics API will be used to get the unit.)
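
The reason the unit matters is that CloudWatch treats a unit in a query as a filter: only datapoints published with that unit come back, and a mismatch yields an empty result rather than an error. The following is a minimal local simulation of that behaviour. The function filter_by_unit and the sample datapoints are hypothetical and do not call AWS.

```python
def filter_by_unit(datapoints, unit=None):
    """Mimic CloudWatch's unit filter: no unit means no filtering (illustration only)."""
    if unit is None:
        return list(datapoints)
    return [dp for dp in datapoints if dp["Unit"] == unit]


# Hypothetical datapoints, published with the unit Percent.
datapoints = [
    {"Sum": 20.0, "Unit": "Percent"},
    {"Sum": 30.0, "Unit": "Percent"},
]

print(len(filter_by_unit(datapoints, "Count")))    # mismatched unit
print(len(filter_by_unit(datapoints, "Percent")))  # matching unit
print(len(filter_by_unit(datapoints)))             # unit omitted
```

With the mismatched unit nothing is returned, which is consistent with an HPA target that stays at 0 and a log with no error.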

Below are my test files

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: httpd-cpu
spec:
  name: httpd-cpu
  resource:
    resource: "deployment"
  queries:
    - id: httpdcpu
      metricStat:
        metric:
          namespace: "ContainerInsights"
          metricName: "pod_cpu_utilization"
          dimensions:
            - name: PodName
              value: "httpd"
            - name: ClusterName
              value: "eks"
            - name: Namespace
              value: "default"
        period: 10
        stat: Sum
        unit: Percent
      returnData: true
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: httpd-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: httpd
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: External
    external:
      metricName: httpd-cpu
      targetValue: 10

kubectl get hpa/httpd-cpu -w

NAME        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
httpd-cpu   Deployment/httpd   0/10      1         3         3          4m1s
httpd-cpu   Deployment/httpd   20/10     1         3         3          5m21s
httpd-cpu   Deployment/httpd   30/10     1         3         3          5m52s
httpd-cpu   Deployment/httpd   19/10     1         3         3          6m23s
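
To read this output: for a plain value target, the Kubernetes documentation gives the replica calculation as ceil(currentReplicas * currentValue / targetValue), clamped to the minReplicas and maxReplicas bounds. The following is a rough sketch of that formula. The function desired_replicas is hypothetical, the 10% tolerance is assumed to be the controller default, and stabilization windows are ignored.

```python
import math


def desired_replicas(current, metric_value, target, min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica calculation for a Value target (illustration only)."""
    ratio = metric_value / target
    # Within the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# Rows from the output above: 3 replicas, target 10, bounds 1 to 3.
for value in (20, 30, 19):
    print(value, desired_replicas(3, value, 10, 1, 3))
```

Every non-zero reading above is over the target, so under these assumptions the deployment stays pinned at maxReplicas (3).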

@rahulttn

@arunbhagyanath Yes, removing the unit makes it work; the value appears to be a percentage. This works for CPU, memory and so on, but ContainerInsights also provides pod-level and service-level metrics such as network TX, and those would be more useful as counts, in my opinion. It would be great if the value could be retrieved as a count.

@willianantunes

Now it's working 100% as expected for me! Thank you to whoever applied #14!

chankh closed this as completed Nov 28, 2019