This Terraform module implements a serverless observability stack that forwards EventBridge events to an SQS queue and can optionally create CloudWatch alarms.
This module works in conjunction with the Terraform AWS Observability Receiver module.
The alarm definition file contains the alarms per service. The example below shows the EC2 service with a CPUUtilization alarm, which creates that alarm for every EC2 instance.
"EC2" : { <- Service
"CPUUtilization": { <- Alarmname
"AlarmThresholds" : {
"priority": ["P1", "P2", "P3"], <- for every priority there needs to be a threshold and vice versa
"alarm_threshold": ["90", "80", "75"]
},
"ComparisonOperator" : "GreaterThanThreshold",
"Description" : { <- Description is used for naming the alarm in cloudwatch
"Operatorsymbol" : ">",
"ThresholdUnit" : "%"
},
"EvaluationPeriods" : 2,
"MetricName" : "CPUUtilization",
"Namespace" : "AWS/EC2",
"Period" : 300,
"Statistic" : "Average",
"TreatMissingData" : "breaching",
"Dimensions" : "InstanceId"
}
},
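To illustrate the pairing rule, a hypothetical variation of the same alarm that only uses two priorities must also list exactly two thresholds (the values below are made up for this example):

```json
"AlarmThresholds" : {
  "priority" : ["P1", "P2"],
  "alarm_threshold" : ["95", "85"]
}
```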
When applying the module there is a chance you run into an error caused by the AWS API not being able to handle all the requests at once. If this occurs, do one of the following:
- Rerun `terraform apply`; the module should then complete the creation of the remaining resources.
- Run `terraform apply` with the `-parallelism=n` flag, choosing `n` lower than the default of 10 to reduce the number of concurrent requests.
module "observability_sender" {
source = "git@github.com:TechNative-B-V/terraform-aws-observability-sender.git?ref=v0.0.1"
monitoring_account_configuration = {
sqs_name = string
sqs_region = string
sqs_account = number
}
sqs_dlq_arn = string
kms_key_arn = string
sns_notification_receiver_topic_arn = string
eventbridge_rules = {
"aws-backup-notification-rule" : {
"description" : "Monitor state changes of aws backup service.",
"enabled" : true,
"event_pattern" : jsonencode({
"source" : ["aws.backup"],
"detail-type" : ["Backup Job State Change"]
})
}
}
}
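As a minimal sketch of what a filled-in call could look like: the queue name, region, account ID and ARNs below are hypothetical placeholders, so substitute the values from your own observability hub and workload accounts:

```hcl
module "observability_sender" {
  source = "git@github.com:TechNative-B-V/terraform-aws-observability-sender.git?ref=v0.0.1"

  # Hypothetical details of the monitoring (hub) account.
  monitoring_account_configuration = {
    sqs_name    = "observability-hub-queue"
    sqs_region  = "eu-west-1"
    sqs_account = 111111111111
  }

  # Hypothetical ARNs; replace with the resources in your own accounts.
  sqs_dlq_arn                         = "arn:aws:sqs:eu-west-1:222222222222:observability-sender-dlq"
  kms_key_arn                         = "arn:aws:kms:eu-west-1:222222222222:key/11111111-2222-3333-4444-555555555555"
  sns_notification_receiver_topic_arn = "arn:aws:sns:eu-west-1:222222222222:observability-notifications"

  eventbridge_rules = {
    "aws-backup-notification-rule" : {
      "description" : "Monitor state changes of aws backup service.",
      "enabled" : true,
      "event_pattern" : jsonencode({
        "source" : ["aws.backup"],
        "detail-type" : ["Backup Job State Change"]
      })
    }
  }
}
```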
On the first run you might hit a put-rate-exceeded error because too many alarms are being created at once.
Rerun the Lambda alarm creator a few times, possibly with a shorter alarm list, so that you stay below the rate limit set by AWS.
Also clean up the SQS queue in the observability hub account, because the error can remain in the queue even after the underlying problem is resolved.
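If old messages are still stuck in that queue, they can be removed with the AWS CLI; the queue URL below is a placeholder for the SQS queue in your observability hub account:

```sh
# Placeholder queue URL; substitute the queue created in the observability hub account.
aws sqs purge-queue \
  --queue-url "https://sqs.eu-west-1.amazonaws.com/111111111111/observability-hub-queue"
```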
Name | Version |
---|---|
archive | n/a |
aws | > 4.3.0 |
Name | Source | Version |
---|---|---|
iam_role_lambda_cw_alarm_creator | git@github.com:TechNative-B-V/modules-aws.git//identity_and_access_management/iam_role | v1.1.7 |
iam_role_lambda_payload_forwarder | git@github.com:TechNative-B-V/modules-aws.git//identity_and_access_management/iam_role | v1.1.7 |
lambda_cw_alarm_creator | git@github.com:wearetechnative/terraform-aws-lambda.git | 13eda5f9e8ae40e51f66a45837cd41a6b35af988 |
lambda_payload_forwarder | git@github.com:TechNative-B-V/modules-aws.git//lambda | v1.1.7 |
Name | Description | Type | Default | Required |
---|---|---|---|---|
eventbridge_rules | EventBridge rule settings. | map(object({…})) | {} | no |
kms_key_arn | ARN of the KMS key. | string | n/a | yes |
lambda_timeout | Lambda function timeout. | number | 60 | no |
monitoring_account_configuration | Configuration settings of the monitoring account. | object({…}) | n/a | yes |
source_directory_location | Source directory location for the custom alarm creator actions.py. | string | null | no |
sqs_dlq_arn | ARN of the Dead Letter Queue. | string | n/a | yes |
Name | Description |
---|---|
lambda_cloudwatch_alarm_creator_arn | n/a |
lambda_cloudwatch_alarm_creator_name | n/a |
lambda_payload_forwarder_arn | n/a |
lambda_payload_forwarder_name | n/a |
sns_topic_arn | n/a |
sns_topic_id | n/a |