
CloudWatch Alarm ignores Threshold when comparison op is LessThan* #2281

Closed
bhgames opened this issue Nov 14, 2017 · 8 comments
Labels
bug Addresses a defect in current functionality. service/cloudwatch Issues and PRs that pertain to the cloudwatch service.

Comments


bhgames commented Nov 14, 2017

If you try to create a CloudWatch alarm that steps down on, say, CPUReservation for an ECS cluster, it creates something like this in AWS:

https://screencast.com/t/iNySsTAjcG

As compared to the GreaterThan version:

https://screencast.com/t/BZrmNmV4yQkB

Terraform Version

10.8.0

Affected Resource(s)


  • aws_cloudwatch_metric_alarm


Terraform Configuration Files

variable "name" {
  description = "The name of the ECS Cluster."
}


variable "min_size" {
  description = "The min number of EC2 Instances to run in the ECS Cluster."
  default = 1
}

variable "max_size" {
  description = "The max number of EC2 Instances to run in the ECS Cluster."
  default = 2
}


variable "instance_type" {
  description = "The type of EC2 Instance to deploy in the ECS Cluster (e.g. t2.micro)."
}

variable "vpc_id" {
  description = "The ID of the VPC in which to deploy the ECS Cluster."
}

variable "subnet_ids" {
  description = "The subnet IDs in which to deploy the EC2 Instances of the ECS Cluster."
}



variable "key_pair_name" {
  description = "The name of an EC2 Key Pair to associate with each EC2 Instance in the ECS Cluster. Leave blank to not associate a Key Pair."
  default = "amdirent-aws"
}

variable "allow_ssh_from_cidr_blocks" {
  description = "The list of CIDR-formatted IP address ranges from which the EC2 Instances in the ECS Cluster should accept SSH connections."
  type = "list"
  default = ["0.0.0.0/0"]
}
# ---------------------------------------------------------------------------------------------------------------------
# CREATE AN ECS CLUSTER
# ---------------------------------------------------------------------------------------------------------------------

resource "aws_ecs_cluster" "example_cluster" {
  name = "${var.name}"
}

# ---------------------------------------------------------------------------------------------------------------------
# DEPLOY AN AUTO SCALING GROUP (ASG)
# Each EC2 Instance in the ASG will register as an ECS Cluster Instance.
# ---------------------------------------------------------------------------------------------------------------------

resource "aws_autoscaling_policy" "high_cpu_policy" {
  name                   = "${var.name}-high-cpu-asg-policy"
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = "${aws_autoscaling_group.ecs_cluster_instances.name}"
  policy_type            = "StepScaling"

  step_adjustment {
    scaling_adjustment          = -1
    metric_interval_lower_bound = 0
  }
}



resource "aws_cloudwatch_metric_alarm" "unused-cpu-trigger" {
  alarm_name          = "${var.name}-too-much-cpu-asg-alarm"
  comparison_operator = "LessThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUReservation"
  namespace           = "AWS/ECS"
  period              = "60"
  statistic           = "Average"
  threshold           = "60"

  dimensions {
    ClusterName = "${var.name}"
  }

  alarm_description = "This metric monitors too much CPU for new service tasks"
  alarm_actions     = ["${aws_autoscaling_policy.high_cpu_policy.arn}"]
}

resource "aws_autoscaling_group" "ecs_cluster_instances" {
  name = "${var.name}-ecs-asg"
  min_size = "${var.min_size}"
  max_size = "${var.max_size}"
  launch_configuration = "${aws_launch_configuration.ecs_instance.name}"
  vpc_zone_identifier = ["${split(",", var.subnet_ids)}"]

  tag {
    key = "Name"
    value = "${var.name}"
    propagate_at_launch = true
  }

  enabled_metrics = ["GroupMinSize", "GroupMaxSize", "GroupDesiredCapacity", "GroupInServiceInstances", "GroupPendingInstances", "GroupTerminatingInstances", "GroupStandbyInstances", "GroupTotalInstances"]
}

# Fetch the AWS ECS Optimized Linux AMI. Note that if you've never launched this AMI before, you have to accept the
# terms and conditions on this webpage or the EC2 instances will fail to launch:
# https://aws.amazon.com/marketplace/pp/B00U6QTYI2
data "aws_ami" "ecs" {
  most_recent = true
  owners = ["amazon"]
  filter {
    name = "name"
    values = ["amzn-ami-*-amazon-ecs-optimized"]
  }
}


resource "aws_launch_configuration" "ecs_instance" {
  name_prefix = "${var.name}-ec2"
  instance_type = "${var.instance_type}"
  key_name = "${var.key_pair_name}"
  iam_instance_profile = "${aws_iam_instance_profile.ecs_instance.name}"
  security_groups = ["${aws_security_group.ecs_instance.id}"]
  image_id = "${data.aws_ami.ecs.id}"
  associate_public_ip_address = false
  enable_monitoring = true

  # A shell script that will execute when on each EC2 instance when it first boots to configure the ECS Agent to talk
  # to the right ECS cluster
  user_data = "${data.template_file.user_data.rendered}"

  # https://terraform.io/docs/configuration/resources.html
  lifecycle {
    create_before_destroy = true
  }
}

data "template_file" "user_data" {
  template = <<EOF
#!/bin/bash
echo "ECS_CLUSTER=${var.name}" >> /etc/ecs/ecs.config
EOF
}

# ---------------------------------------------------------------------------------------------------------------------
# CREATE AN IAM ROLE FOR EACH INSTANCE IN THE CLUSTER
# We export the IAM role ID as an output variable so users of this module can attach custom policies.
# ---------------------------------------------------------------------------------------------------------------------

resource "aws_iam_role" "ecs_instance" {
  name = "${var.name}_ecs_instance_role"
  assume_role_policy = "${data.aws_iam_policy_document.ecs_instance.json}"

  # aws_iam_instance_profile.ecs_instance sets create_before_destroy to true, which means every resource it depends on,
  # including this one, must also set the create_before_destroy flag to true, or you'll get a cyclic dependency error.
  lifecycle {
    create_before_destroy = true
  }
}

data "aws_iam_policy_document" "ecs_instance" {
  statement {
    effect = "Allow"
    actions = ["sts:AssumeRole"]
    principals {
      type = "Service"
      identifiers = ["ec2.amazonaws.com", "application-autoscaling.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy" "ecr_pull" {
  name = "${var.name}-ecr-reader-for-ecs-instance-policy"
  role     = "${aws_iam_role.ecs_instance.id}"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:GetRepositoryPolicy",
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:DescribeImages",
      "ecr:BatchGetImage"
    ],
    "Resource": "*"
  }]
}
EOF
}

# To attach an IAM Role to an EC2 Instance, you use an IAM Instance Profile
resource "aws_iam_instance_profile" "ecs_instance" {
  name = "${var.name}"
  role = "${aws_iam_role.ecs_instance.name}"

  # aws_launch_configuration.ecs_instance sets create_before_destroy to true, which means every resource it depends on,
  # including this one, must also set the create_before_destroy flag to true, or you'll get a cyclic dependency error.
  lifecycle {
    create_before_destroy = true
  }
}


# ---------------------------------------------------------------------------------------------------------------------
# ATTACH IAM POLICIES TO THE IAM ROLE
# The IAM policy allows an ECS Agent running on each EC2 Instance to communicate with the ECS scheduler.
# ---------------------------------------------------------------------------------------------------------------------

resource "aws_iam_role_policy" "ecs_cluster_permissions" {
  name = "${var.name}-ecs-cluster-permissions"
  role = "${aws_iam_role.ecs_instance.id}"
  policy = "${data.aws_iam_policy_document.ecs_cluster_permissions.json}"
}

data "aws_iam_policy_document" "ecs_cluster_permissions" {
  statement {
    effect = "Allow"
    resources = ["*"]
    actions = [
      "ecs:CreateCluster",
      "ecs:DeregisterContainerInstance",
      "ecs:DiscoverPollEndpoint",
      "ecs:Poll",
      "ecs:RegisterContainerInstance",
      "ecs:StartTelemetrySession",
      "ecs:Submit*",
      "ecs:UpdateService",
      "cloudwatch:DescribeAlarms",
      "ecs:DescribeServices"
    ]
  }
}

# ---------------------------------------------------------------------------------------------------------------------
# CREATE A SECURITY GROUP THAT CONTROLS WHAT TRAFFIC CAN GO IN AND OUT OF THE CLUSTER
# We export the ID of the group as an output variable so users of this module can attach custom rules.
# ---------------------------------------------------------------------------------------------------------------------

resource "aws_security_group" "ecs_instance" {
  name = "${var.name} ECS Cluster"
  description = "Security group for the EC2 instances in the ECS cluster ${var.name}"
  vpc_id = "${var.vpc_id}"

  # aws_launch_configuration.ecs_instance sets create_before_destroy to true, which means every resource it depends on,
  # including this one, must also set the create_before_destroy flag to true, or you'll get a cyclic dependency error.
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group_rule" "all_outbound_all" {
  type = "egress"
  from_port = 0
  to_port = 0
  protocol = "-1"
  cidr_blocks = ["0.0.0.0/0"]
  security_group_id = "${aws_security_group.ecs_instance.id}"
}

resource "aws_security_group_rule" "all_inbound_all" {
  type = "ingress"
  from_port = 0
  to_port = 0
  protocol = "-1"
  cidr_blocks = ["0.0.0.0/0"]
  security_group_id = "${aws_security_group.ecs_instance.id}"
}

Debug Output

https://gist.github.com/bhgames/ca2fd82dac7ef64d08c37a1451689d53

Panic Output

Nope

Expected Behavior

Terraform should have set "60" as the proper upper threshold, so I had 60 >= CPUReservation >= -Infinity.

Actual Behavior

Puts in an alarm with no upper threshold (>= CPUReservation >= -Infinity), which doesn't work. I have to go in and add the threshold by hand.

Steps to Reproduce


  1. terraform apply



@paddycarver paddycarver added the bug Addresses a defect in current functionality. label Nov 21, 2017
@toddlucas

Hi @bhgames, I just came across your issue looking for related information. This may not be an issue with the provider, but rather a side-effect of the way step scaling works (it's not intuitive). When you use a scale-in policy, the unbounded side of the step should be the lower bound (-inf), which means you need to switch your metric_interval_lower_bound to metric_interval_upper_bound. This discrepancy may account for the strange presentation in the console.
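Applied to the policy from the original report, the swap described above looks roughly like this (a sketch based on the reporter's config; the bound value is illustrative):

```hcl
# Scale-in policy: with a LessThan* alarm, the step should carry an
# explicit *upper* bound, leaving the lower bound implicit (-infinity).
resource "aws_autoscaling_policy" "high_cpu_policy" {
  name                   = "${var.name}-high-cpu-asg-policy"
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = "${aws_autoscaling_group.ecs_cluster_instances.name}"
  policy_type            = "StepScaling"

  step_adjustment {
    scaling_adjustment          = -1
    # Swapped from metric_interval_lower_bound: the step now covers
    # (-infinity, threshold], matching LessThanOrEqualToThreshold.
    metric_interval_upper_bound = 0
  }
}
```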

The Simple and Step Scaling Policies documentation has an example, and the StepAdjustment API reference has some additional details.

@toddlucas

This issue can probably be closed.

I've confirmed that this is an issue related to the bounds, as mentioned above, and how they interact with the AWS console UI. The UI will show a step with +infinity or one with -infinity, but not both, depending on the associated alarm settings.

The AWS console will show steps including one with +infinity when the associated alarm is set to a comparison_operator of GreaterThanOrEqualToThreshold. In this scenario, the policy expects one step to have a lower bound only (and an implicit upper bound of +infinity).

If you switch the operator to LessThanOrEqualToThreshold, the steps will include one with -infinity, assuming you have a step with an upper bound only and an implicit lower bound of -infinity. It appears that the UI was designed around having two policies--one for scale out and one for scale in--and an associated pair of alarms.

Although it's possible to use two policies with one alarm (as shown in the Simple and Step Scaling Policies documentation linked above), doing so will result in the strange UI observed by @bhgames.
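Following that reasoning, the arrangement the console UI expects looks roughly like this (a sketch with illustrative names and thresholds, not taken from the issue; dimensions omitted for brevity):

```hcl
# Scale-out pair: GreaterThan* alarm + step with an explicit lower bound
# (implicit upper bound of +infinity).
resource "aws_autoscaling_policy" "scale_out" {
  name                   = "example-scale-out"
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = "${aws_autoscaling_group.example.name}"
  policy_type            = "StepScaling"

  step_adjustment {
    scaling_adjustment          = 1
    metric_interval_lower_bound = 0
  }
}

resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "example-high-cpu"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUReservation"
  namespace           = "AWS/ECS"
  period              = "60"
  statistic           = "Average"
  threshold           = "80"
  alarm_actions       = ["${aws_autoscaling_policy.scale_out.arn}"]
}

# Scale-in pair: LessThan* alarm + step with an explicit upper bound
# (implicit lower bound of -infinity).
resource "aws_autoscaling_policy" "scale_in" {
  name                   = "example-scale-in"
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = "${aws_autoscaling_group.example.name}"
  policy_type            = "StepScaling"

  step_adjustment {
    scaling_adjustment          = -1
    metric_interval_upper_bound = 0
  }
}

resource "aws_cloudwatch_metric_alarm" "low_cpu" {
  alarm_name          = "example-low-cpu"
  comparison_operator = "LessThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUReservation"
  namespace           = "AWS/ECS"
  period              = "60"
  statistic           = "Average"
  threshold           = "40"
  alarm_actions       = ["${aws_autoscaling_policy.scale_in.arn}"]
}
```

With this pairing, each alarm drives exactly one policy, and the console renders each policy's single unbounded side without the confusing missing-threshold display.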

@bflad bflad added the service/cloudwatch Issues and PRs that pertain to the cloudwatch service. label Jan 18, 2018

bflad commented Jan 18, 2018

@toddlucas / @bhgames do you think any documentation improvements could be made here?

@toddlucas

Hi @bflad, unless @bhgames has some ideas I think doing so would require too much explanation and would muddy the docs. This is really an AWS under-documentation issue WRT guidance on how to combine alarms and policies. They allow a few different approaches but it seems to work more seamlessly with two separate alarms and two separate policies.


bflad commented Oct 30, 2018

Closing this old issue. 👍

@bflad bflad closed this as completed Oct 30, 2018

bhgames commented Oct 30, 2018 via email

@nicholasserra

For those googling, swapping around metric_interval_lower_bound and metric_interval_upper_bound as stated above did indeed fix this issue. Just ran across the same thing.


ghost commented Apr 1, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Apr 1, 2020