
CloudWatch Alarm ignores Threshold when comparison op is LessThan* #2281

Closed
bhgames opened this issue Nov 14, 2017 · 8 comments
Labels
bug Addresses a defect in current functionality. service/cloudwatch Issues and PRs that pertain to the cloudwatch service.

Comments


bhgames commented Nov 14, 2017

If you try to create a CloudWatch alarm that steps down on, say, CPUReservation for an ECS cluster, it creates something like this in AWS:

https://screencast.com/t/iNySsTAjcG

As compared to the GreaterThan version:

https://screencast.com/t/BZrmNmV4yQkB

Terraform Version

10.8.0

Affected Resource(s)


  • aws_cloudwatch_metric_alarm


Terraform Configuration Files

variable "name" {
  description = "The name of the ECS Cluster."
}


variable "min_size" {
  description = "The min number of EC2 Instances to run in the ECS Cluster."
  default = 1
}

variable "max_size" {
  description = "The max number of EC2 Instances to run in the ECS Cluster."
  default = 2
}


variable "instance_type" {
  description = "The type of EC2 Instance to deploy in the ECS Cluster (e.g. t2.micro)."
}

variable "vpc_id" {
  description = "The ID of the VPC in which to deploy the ECS Cluster."
}

variable "subnet_ids" {
  description = "The subnet IDs in which to deploy the EC2 Instances of the ECS Cluster."
}



variable "key_pair_name" {
  description = "The name of an EC2 Key Pair to associate with each EC2 Instance in the ECS Cluster. Leave blank to not associate a Key Pair."
  default = "amdirent-aws"
}

variable "allow_ssh_from_cidr_blocks" {
  description = "The list of CIDR-formatted IP address ranges from which the EC2 Instances in the ECS Cluster should accept SSH connections."
  type = "list"
  default = ["0.0.0.0/0"]
}
# ---------------------------------------------------------------------------------------------------------------------
# CREATE AN ECS CLUSTER
# ---------------------------------------------------------------------------------------------------------------------

resource "aws_ecs_cluster" "example_cluster" {
  name = "${var.name}"
}

# ---------------------------------------------------------------------------------------------------------------------
# DEPLOY AN AUTO SCALING GROUP (ASG)
# Each EC2 Instance in the ASG will register as an ECS Cluster Instance.
# ---------------------------------------------------------------------------------------------------------------------

resource "aws_autoscaling_policy" "high_cpu_policy" {
  name                   = "${var.name}-high-cpu-asg-policy"
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = "${aws_autoscaling_group.ecs_cluster_instances.name}"
  policy_type            = "StepScaling"

  step_adjustment {
    scaling_adjustment          = -1
    metric_interval_lower_bound = 0
  }
}



resource "aws_cloudwatch_metric_alarm" "unused-cpu-trigger" {
  alarm_name          = "${var.name}-too-much-cpu-asg-alarm"
  comparison_operator = "LessThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUReservation"
  namespace           = "AWS/ECS"
  period              = "60"
  statistic           = "Average"
  threshold           = "60"

  dimensions {
    ClusterName = "${var.name}"
  }

  alarm_description = "This metric monitors too much CPU for new service tasks"
  alarm_actions     = ["${aws_autoscaling_policy.high_cpu_policy.arn}"]
}

resource "aws_autoscaling_group" "ecs_cluster_instances" {
  name = "${var.name}-ecs-asg"
  min_size = "${var.min_size}"
  max_size = "${var.max_size}"
  launch_configuration = "${aws_launch_configuration.ecs_instance.name}"
  vpc_zone_identifier = ["${split(",", var.subnet_ids)}"]

  tag {
    key = "Name"
    value = "${var.name}"
    propagate_at_launch = true
  }

  enabled_metrics = ["GroupMinSize", "GroupMaxSize", "GroupDesiredCapacity", "GroupInServiceInstances", "GroupPendingInstances", "GroupTerminatingInstances", "GroupStandbyInstances", "GroupTotalInstances"]
}

# Fetch the AWS ECS Optimized Linux AMI. Note that if you've never launched this AMI before, you have to accept the
# terms and conditions on this webpage or the EC2 instances will fail to launch:
# https://aws.amazon.com/marketplace/pp/B00U6QTYI2
data "aws_ami" "ecs" {
  most_recent = true
  owners = ["amazon"]
  filter {
    name = "name"
    values = ["amzn-ami-*-amazon-ecs-optimized"]
  }
}


resource "aws_launch_configuration" "ecs_instance" {
  name_prefix = "${var.name}-ec2"
  instance_type = "${var.instance_type}"
  key_name = "${var.key_pair_name}"
  iam_instance_profile = "${aws_iam_instance_profile.ecs_instance.name}"
  security_groups = ["${aws_security_group.ecs_instance.id}"]
  image_id = "${data.aws_ami.ecs.id}"
  associate_public_ip_address = false
  enable_monitoring = true

  # A shell script that will execute when on each EC2 instance when it first boots to configure the ECS Agent to talk
  # to the right ECS cluster
  user_data = "${data.template_file.user_data.rendered}"

  # https://terraform.io/docs/configuration/resources.html
  lifecycle {
    create_before_destroy = true
  }
}

data "template_file" "user_data" {
  template = <<EOF
#!/bin/bash
echo "ECS_CLUSTER=${var.name}" >> /etc/ecs/ecs.config
EOF
}

# ---------------------------------------------------------------------------------------------------------------------
# CREATE AN IAM ROLE FOR EACH INSTANCE IN THE CLUSTER
# We export the IAM role ID as an output variable so users of this module can attach custom policies.
# ---------------------------------------------------------------------------------------------------------------------

resource "aws_iam_role" "ecs_instance" {
  name = "${var.name}_ecs_instance_role"
  assume_role_policy = "${data.aws_iam_policy_document.ecs_instance.json}"

  # aws_iam_instance_profile.ecs_instance sets create_before_destroy to true, which means every resource it depends on,
  # including this one, must also set the create_before_destroy flag to true, or you'll get a cyclic dependency error.
  lifecycle {
    create_before_destroy = true
  }
}

data "aws_iam_policy_document" "ecs_instance" {
  statement {
    effect = "Allow"
    actions = ["sts:AssumeRole"]
    principals {
      type = "Service"
      identifiers = ["ec2.amazonaws.com", "application-autoscaling.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy" "ecr_pull" {
  name = "${var.name}-ecr-reader-for-ecs-instance-policy"
  role     = "${aws_iam_role.ecs_instance.id}"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ecr:GetAuthorizationToken",
      "ecr:BatchCheckLayerAvailability",
      "ecr:GetDownloadUrlForLayer",
      "ecr:GetRepositoryPolicy",
      "ecr:DescribeRepositories",
      "ecr:ListImages",
      "ecr:DescribeImages",
      "ecr:BatchGetImage"
    ],
    "Resource": "*"
  }]
}
EOF
}

# To attach an IAM Role to an EC2 Instance, you use an IAM Instance Profile
resource "aws_iam_instance_profile" "ecs_instance" {
  name = "${var.name}"
  role = "${aws_iam_role.ecs_instance.name}"

  # aws_launch_configuration.ecs_instance sets create_before_destroy to true, which means every resource it depends on,
  # including this one, must also set the create_before_destroy flag to true, or you'll get a cyclic dependency error.
  lifecycle {
    create_before_destroy = true
  }
}


# ---------------------------------------------------------------------------------------------------------------------
# ATTACH IAM POLICIES TO THE IAM ROLE
# The IAM policy allows an ECS Agent running on each EC2 Instance to communicate with the ECS scheduler.
# ---------------------------------------------------------------------------------------------------------------------

resource "aws_iam_role_policy" "ecs_cluster_permissions" {
  name = "${var.name}-ecs-cluster-permissions"
  role = "${aws_iam_role.ecs_instance.id}"
  policy = "${data.aws_iam_policy_document.ecs_cluster_permissions.json}"
}

data "aws_iam_policy_document" "ecs_cluster_permissions" {
  statement {
    effect = "Allow"
    resources = ["*"]
    actions = [
      "ecs:CreateCluster",
      "ecs:DeregisterContainerInstance",
      "ecs:DiscoverPollEndpoint",
      "ecs:Poll",
      "ecs:RegisterContainerInstance",
      "ecs:StartTelemetrySession",
      "ecs:Submit*",
      "ecs:UpdateService",
      "cloudwatch:DescribeAlarms",
      "ecs:DescribeServices"
    ]
  }
}

# ---------------------------------------------------------------------------------------------------------------------
# CREATE A SECURITY GROUP THAT CONTROLS WHAT TRAFFIC CAN GO IN AND OUT OF THE CLUSTER
# We export the ID of the group as an output variable so users of this module can attach custom rules.
# ---------------------------------------------------------------------------------------------------------------------

resource "aws_security_group" "ecs_instance" {
  name = "${var.name} ECS Cluster"
  description = "Security group for the EC2 instances in the ECS cluster ${var.name}"
  vpc_id = "${var.vpc_id}"

  # aws_launch_configuration.ecs_instance sets create_before_destroy to true, which means every resource it depends on,
  # including this one, must also set the create_before_destroy flag to true, or you'll get a cyclic dependency error.
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group_rule" "all_outbound_all" {
  type = "egress"
  from_port = 0
  to_port = 0
  protocol = "-1"
  cidr_blocks = ["0.0.0.0/0"]
  security_group_id = "${aws_security_group.ecs_instance.id}"
}

resource "aws_security_group_rule" "all_inbound_all" {
  type = "ingress"
  from_port = 0
  to_port = 0
  protocol = "-1"
  cidr_blocks = ["0.0.0.0/0"]
  security_group_id = "${aws_security_group.ecs_instance.id}"
}

Debug Output

https://gist.github.com/bhgames/ca2fd82dac7ef64d08c37a1451689d53

Panic Output

Nope

Expected Behavior

Terraform should have set "60" as the proper upper threshold, so I had 60 >= CPUReservation >= -Infinity.

Actual Behavior

Puts in an alarm with no upper threshold (>= CPUReservation >= -Infinity), which doesn't work. I have to go in and add the threshold by hand.

Steps to Reproduce


  1. terraform apply



@paddycarver paddycarver added the bug Addresses a defect in current functionality. label Nov 21, 2017
@toddlucas

Hi @bhgames, I just came across your issue looking for related information. This may not be an issue with the provider, but rather a side-effect of the way step scaling works (it's not intuitive). When you use a scale-in policy, the unbounded side of the step should be the lower bound (-inf), which means you need to switch your metric_interval_lower_bound to metric_interval_upper_bound. This discrepancy may account for the strange presentation in the console.
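Applied to the policy from the original report, the swap described above looks roughly like this (a sketch based on the reporter's config; the bound value is illustrative):

```hcl
# Scale-in policy: with a LessThan* alarm, the step should carry an
# explicit *upper* bound, leaving the lower bound implicit (-infinity).
resource "aws_autoscaling_policy" "high_cpu_policy" {
  name                   = "${var.name}-high-cpu-asg-policy"
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = "${aws_autoscaling_group.ecs_cluster_instances.name}"
  policy_type            = "StepScaling"

  step_adjustment {
    scaling_adjustment          = -1
    # Swapped from metric_interval_lower_bound: the step now covers
    # (-infinity, threshold], matching LessThanOrEqualToThreshold.
    metric_interval_upper_bound = 0
  }
}
```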

The Simple and Step Scaling Policies documentation has an example, and the StepAdjustment API reference has some additional details.

@toddlucas

This issue can probably be closed.

I've confirmed that this is an issue related to the bounds, as mentioned above, and how they interact with the AWS console UI. The UI will show a step with +infinity or one with -infinity, but not both, depending on the associated alarm settings.

The AWS console will show steps including one with +infinity when the associated alarm is set to a comparison_operator of GreaterThanOrEqualToThreshold. In this scenario, the policy expects one step to have a lower bound only (and an implicit upper bound of +infinity).

If you switch the operator to LessThanOrEqualToThreshold, the steps will include one with -infinity, assuming you have a step with an upper bound only and an implicit lower bound of -infinity. It appears that the UI was designed around having two policies--one for scale out and one for scale in--and an associated pair of alarms.

Although it's possible to use two policies with one alarm (as shown in the Simple and Step Scaling Policies documentation linked above), doing so will result in the strange UI observed by @bhgames.
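Following that reasoning, the arrangement the console UI expects looks roughly like this (a sketch with illustrative names and thresholds, not taken from the issue; dimensions omitted for brevity):

```hcl
# Scale-out pair: GreaterThan* alarm + step with an explicit lower bound
# (implicit upper bound of +infinity).
resource "aws_autoscaling_policy" "scale_out" {
  name                   = "example-scale-out"
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = "${aws_autoscaling_group.example.name}"
  policy_type            = "StepScaling"

  step_adjustment {
    scaling_adjustment          = 1
    metric_interval_lower_bound = 0
  }
}

resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "example-high-cpu"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUReservation"
  namespace           = "AWS/ECS"
  period              = "60"
  statistic           = "Average"
  threshold           = "80"
  alarm_actions       = ["${aws_autoscaling_policy.scale_out.arn}"]
}

# Scale-in pair: LessThan* alarm + step with an explicit upper bound
# (implicit lower bound of -infinity).
resource "aws_autoscaling_policy" "scale_in" {
  name                   = "example-scale-in"
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = "${aws_autoscaling_group.example.name}"
  policy_type            = "StepScaling"

  step_adjustment {
    scaling_adjustment          = -1
    metric_interval_upper_bound = 0
  }
}

resource "aws_cloudwatch_metric_alarm" "low_cpu" {
  alarm_name          = "example-low-cpu"
  comparison_operator = "LessThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUReservation"
  namespace           = "AWS/ECS"
  period              = "60"
  statistic           = "Average"
  threshold           = "40"
  alarm_actions       = ["${aws_autoscaling_policy.scale_in.arn}"]
}
```

With this pairing, each alarm drives exactly one policy, and the console renders each policy's single unbounded side without the confusing missing-threshold display.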

@bflad bflad added the service/cloudwatch Issues and PRs that pertain to the cloudwatch service. label Jan 18, 2018

bflad commented Jan 18, 2018

@toddlucas / @bhgames do you think any documentation improvements could be made here?

@toddlucas

Hi @bflad, unless @bhgames has some ideas I think doing so would require too much explanation and would muddy the docs. This is really an AWS under-documentation issue WRT guidance on how to combine alarms and policies. They allow a few different approaches but it seems to work more seamlessly with two separate alarms and two separate policies.


bflad commented Oct 30, 2018

Closing this old issue. 👍

@bflad bflad closed this as completed Oct 30, 2018

bhgames commented Oct 30, 2018 via email

@nicholasserra

For those googling, swapping around metric_interval_lower_bound and metric_interval_upper_bound as stated above did indeed fix this issue. Just ran across the same thing.


ghost commented Apr 1, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Apr 1, 2020