CloudWatch Metrics Firehose Method

Paul Reeves edited this page Dec 9, 2024 · 8 revisions

CloudWatch Metrics Firehose Method Overview

AWS CloudWatch Metrics data contains useful information about the services you run, such as CPU usage on EC2 instances, load balancer error counts, and S3 bucket size. This data can be ingested into Splunk for analysis and long-term storage. It is streamed directly from CloudWatch Metrics to Firehose, then sent on to Splunk from Firehose over the HTTP Event Collector (HEC). If Firehose cannot reach Splunk, events are sent to an S3 backsplash bucket for later retrieval. The data can be sent either as JSON-formatted events to a traditional event index, or as metrics to a metrics index.

Visual overview:

Mermaid code of visual overview:

graph TB;
	cwm[CloudWatch metrics]
	amazon[Amazon Data Firehose]
	splunk[Splunk]
	s3Backsplash[(Firehose Backsplash Bucket)]
	cwm-->amazon
	amazon-->|HEC|splunk
	amazon-->|On failure sending to Splunk|s3Backsplash

Splunk Dashboard

Code for Splunk dashboards is available at https://github.com/splunk/splunk-aws-gdi-toolkit/blob/main/CloudWatchmetrics-Firehose-Resources/splunkDashboard-metric.json and https://github.com/splunk/splunk-aws-gdi-toolkit/blob/main/CloudWatchmetrics-Firehose-Resources/splunkDashboard-event.json. These dashboards can be used to check the health of the events coming in through this method.

Deployment Instructions

These instructions are for configuring Amazon CloudWatch Metrics Data to be sent to Splunk. As part of the deployment, a CloudWatch Metrics Stream is created so that CloudWatch Metrics data is streamed to Splunk. This can be significantly cheaper than using a pull-based method where the CloudWatch API is called to retrieve this data.

  1. Install the Splunk Add-on for Amazon Web Services (AWS).
    • If you use Splunk Cloud, install the add-on on the ad-hoc search head or ad-hoc search head cluster.
    • If you do not use Splunk Cloud, install this add-on on the HEC endpoint (probably either the indexer(s) or heavy forwarder), your indexer(s), and your search head(s).
  2. Configure any firewall rules in front of Splunk to receive data from Amazon Data Firehose.
    • Reference the AWS documentation for the IP ranges required. Make sure to add the IP ranges from the region you'll be deploying the CloudFormation to.
    • If you use Splunk Cloud, you'll want to add the relevant IP range(s) to the HEC feature in the IP allowlist.
    • If you do not use Splunk Cloud, you'll need to consult with your Splunk Architect, Splunk Admin, and/or network team to determine which firewall rules to change and where.
  3. Create a HEC token in Splunk to ingest the events, following these specific instructions:
    • Make sure to enable indexer acknowledgment.
    • Leave the sourcetype set to Automatic. The Lambda function will set the sourcetype before sending it to Firehose.
    • Select the index(es) you want the data to be sent to.
    • Amazon Data Firehose does check the format of the tokens, so we recommend letting Splunk generate this rather than setting it manually through inputs.conf.
    • If you use Splunk Cloud, follow these instructions
    • If you do not use Splunk Cloud, the HEC token will need to be created on the Splunk instance that will be receiving this data (probably either the indexer(s) or a heavy forwarder). Instructions for this can be found here.
  4. Deploy the CloudWatchmetrics-Firehose-Resources/cwMetricsToSplunk.yml CloudFormation template in each region of each account you want to collect CloudWatch Metrics data from. This may mean you need to deploy this CloudFormation template multiple times. This template will create the necessary resources to stream CloudWatch Metrics data to Firehose, transform the events so they are formatted correctly, then send them on to Splunk. These parameters need to be changed from the default values:
    • splunkEventType: Select event if you want to send event-style messages to Splunk, or metric to send metric-style messages to Splunk.
    • splunkHECEndpoint: https://{{url}}:{{port}}
      • For Splunk Cloud, this will be https://http-inputs-firehose-{{stackName}}.splunkcloud.com:443, where {{stackName}} is the name of your Splunk Cloud stack.
      • For non-Splunk Cloud deployments, consult with your Splunk Architect or Splunk Admin.
    • splunkHECToken: The value of the HEC token from step 3
    • splunkHost: the host field setting on CloudWatch Metrics data
    • splunkIndex: the index CloudWatch Metrics data will be sent to. If you selected event for the splunkEventType, this needs to be a traditional event index. If you selected metric for the splunkEventType, this needs to be a metric index.
    • splunkSource: the source field setting on CloudWatch Metrics data
  5. Verify the data is being ingested. The easiest way to do this is to wait a few minutes, then run a search like index={{splunkIndex}} sourcetype=aws:cloudwatch | head 100 for event-style data or |mpreview index={{splunkIndex}} sourcetype=aws:cloudwatch:metric | head 100 for metric-style data, where {{splunkIndex}} is the destination index set through the splunkIndex parameter in step 4.
  • Deploy cwMetricsToSplunk.yml:
aws cloudformation create-stack \
  --region us-west-2 \
  --stack-name cwmetrics-to-splunk \
  --capabilities CAPABILITY_NAMED_IAM \
  --template-body file://cwMetricsToSplunk.yml \
  --parameters \
    ParameterKey=splunkEventType,ParameterValue=event \
    ParameterKey=splunkHECEndpoint,ParameterValue=https://http-inputs-firehose-contoso.splunkcloud.com:443 \
    ParameterKey=splunkHECToken,ParameterValue=01234567-89ab-cdef-0123-456789abcdef \
    ParameterKey=splunkHost,ParameterValue=cwMetrics \
    ParameterKey=splunkIndex,ParameterValue=aws \
    ParameterKey=splunkSource,ParameterValue=cwMetrics \
    ParameterKey=stage,ParameterValue=prod \
    ParameterKey=cloudWatchAlertEmail,ParameterValue=jsmith@contoso.com \
    ParameterKey=contact,ParameterValue=jsmith@contoso.com
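
Because a mistyped endpoint or token is a common reason for a failed deployment, it can help to sanity-check the parameter values before running create-stack. The Python sketch below is illustrative only and is not part of the toolkit. The helper names check_parameters and to_cli_parameters are hypothetical, and the token check assumes that Splunk-generated HEC tokens are GUIDs (8-4-4-4-12 hexadecimal digits), as in the example command above.

```python
import re
from urllib.parse import urlparse

# Parameter names taken from the create-stack example above.
REQUIRED = (
    "splunkEventType", "splunkHECEndpoint", "splunkHECToken",
    "splunkHost", "splunkIndex", "splunkSource",
)

# ASSUMPTION: Splunk-generated HEC tokens are GUIDs (8-4-4-4-12 hex digits).
TOKEN_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def check_parameters(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing parameter: {name}" for name in REQUIRED if not params.get(name)]

    if params.get("splunkEventType") not in ("event", "metric"):
        problems.append("splunkEventType must be 'event' or 'metric'")

    endpoint = urlparse(params.get("splunkHECEndpoint", ""))
    try:
        port = endpoint.port
    except ValueError:  # raised for a non-numeric or out-of-range port
        port = None
    if endpoint.scheme != "https" or not endpoint.hostname or port is None:
        problems.append("splunkHECEndpoint should look like https://{{url}}:{{port}}")

    if not TOKEN_RE.match(params.get("splunkHECToken", "")):
        problems.append("splunkHECToken does not look like a Splunk-generated GUID")

    return problems


def to_cli_parameters(params):
    """Render a dict as ParameterKey=...,ParameterValue=... arguments."""
    return [f"ParameterKey={key},ParameterValue={value}" for key, value in params.items()]
```

If check_parameters returns an empty list, the strings from to_cli_parameters can be passed after --parameters. The checks only cover the shape of the values; they cannot confirm that the token exists in Splunk or that the endpoint is reachable.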

Sample Events

  • Event-style data:
{"Average": 1.0, "Maximum": 1.0, "Minimum": 1.0, "SampleCount": 1.0, "Sum": 1.0, "Unit": "Count", "account_id": "0123456789012", "metric_name": "Invocations", "Namespace": "AWS/Lambda", "timestamp": "2022-08-16T17:52:00Z", "metric_dimensions": ""}
{"Average": 39.0, "Maximum": 39.0, "Minimum": 39.0, "SampleCount": 1.0, "Sum": 39.0, "Unit": "Count", "account_id": "0123456789012", "metric_name": "IncomingRecords", "Namespace": "AWS/Firehose", "timestamp": "2022-08-16T17:51:00Z", "metric_dimensions": "DeliveryStreamName=[0123456789012-us-east-2-cloudtrail-firehose]"}
{"Average": 0.0, "Maximum": 0.0, "Minimum": 0.0, "SampleCount": 1.0, "Sum": 0.0, "Unit": "Count", "account_id": "0123456789012", "metric_name": "NumberOfMessagesDeleted", "Namespace": "AWS/SQS", "timestamp": "2022-08-16T17:52:00Z", "metric_dimensions": "QueueName=[0123456789012-us-east-2-cloudtrail-sqs-queue]"}
{"Average": 552.0, "Maximum": 552.0, "Minimum": 552.0, "SampleCount": 1.0, "Sum": 552.0, "Unit": "Bytes", "account_id": "0123456789012", "metric_name": "IncomingBytes", "Namespace": "AWS/Logs", "timestamp": "2022-08-16T17:52:00Z", "metric_dimensions": "LogGroupName=[/aws/lambda/0123456789012-us-east-2-cloudtrail-lambda-function]"}
{"Average": 1.0, "Maximum": 1.0, "Minimum": 1.0, "SampleCount": 1.0, "Sum": 1.0, "Unit": "Count", "account_id": "0123456789012", "metric_name": "IncomingPutRequests", "Namespace": "AWS/Firehose", "timestamp": "2022-08-16T17:51:00Z", "metric_dimensions": "DeliveryStreamName=[0123456789012-us-east-2-cloudtrail-firehose]"}
{"Average": 1.0, "Maximum": 1.0, "Minimum": 1.0, "SampleCount": 1.0, "Sum": 1.0, "Unit": "Count", "account_id": "0123456789012", "metric_name": "ConcurrentExecutions", "Namespace": "AWS/Lambda", "timestamp": "2022-08-16T17:52:00Z", "metric_dimensions": "FunctionName=[0123456789012-us-east-2-cwmetrics-metric-lambda-function],Resource=[0123456789012-us-east-2-cwmetrics-metric-lambda-function]"}
{"Average": 3.0, "Maximum": 3.0, "Minimum": 3.0, "SampleCount": 1.0, "Sum": 3.0, "Unit": "None", "account_id": "0123456789012", "metric_name": "IncomingLogEvents", "Namespace": "AWS/Logs", "timestamp": "2022-08-16T17:52:00Z", "metric_dimensions": "LogGroupName=[/aws/lambda/0123456789012-us-east-2-cwmetrics-metric-lambda-function]"}
{"Average": 0.0, "Maximum": 0.0, "Minimum": 0.0, "SampleCount": 64.0, "Sum": 0.0, "Unit": "Count", "account_id": "0123456789012", "metric_name": "KMSKeyAccessDenied", "Namespace": "AWS/Firehose", "timestamp": "2022-08-16T17:52:00Z", "metric_dimensions": "DeliveryStreamName=[0123456789012-us-east-2-cwmetrics-metric-firehose]"}
{"Average": 0.0, "Maximum": 0.0, "Minimum": 0.0, "SampleCount": 1.0, "Sum": 0.0, "Unit": "Count", "account_id": "0123456789012", "metric_name": "Errors", "Namespace": "AWS/Lambda", "timestamp": "2022-08-16T17:52:00Z", "metric_dimensions": ""}
{"Average": 1.0, "Maximum": 2.0, "Minimum": 2.0, "SampleCount": 2.0, "Sum": 2.0, "Unit": "Count", "account_id": "0123456789012", "metric_name": "DeliveryToSplunk.Success", "Namespace": "AWS/Firehose", "timestamp": "2022-08-16T17:52:00Z", "metric_dimensions": "DeliveryStreamName=[0123456789012-us-east-2-cwmetrics-metric-firehose]"}
{"Average": 1.0, "Maximum": 1.0, "Minimum": 1.0, "SampleCount": 1.0, "Sum": 1.0, "Unit": "Count", "account_id": "0123456789012", "metric_name": "ConcurrentExecutions", "Namespace": "AWS/Lambda", "timestamp": "2022-08-16T17:52:00Z", "metric_dimensions": ""}
  • Metric-style data:
{"AccountID":"0123456789012","MetricName":"MetricUpdate","MetricStreamName":"0123456789012-us-west-1-cwmetrics-event-stream","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":95,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":98,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":92,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":2,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":190}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","MetricStreamName":"0123456789012-us-west-1-cwmetrics-event-stream","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":88.5,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":97,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":80,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":2,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":177}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","MetricStreamName":"0123456789012-us-west-1-cwmetrics-event-stream","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":111.5,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":120,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":103,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":2,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":223}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","MetricStreamName":"0123456789012-us-west-1-cwmetrics-event-stream","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":80.5,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":83,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":78,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":2,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":161}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","MetricStreamName":"0123456789012-us-west-1-cwmetrics-event-stream","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":98.5,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":105,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":92,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":2,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":197}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","MetricStreamName":"0123456789012-us-west-1-cwmetrics-metric-stream","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":94.5,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":97,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":92,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":2,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":189}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","MetricStreamName":"0123456789012-us-west-1-cwmetrics-metric-stream","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":89.5,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":99,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":80,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":2,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":179}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","MetricStreamName":"0123456789012-us-west-1-cwmetrics-metric-stream","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":113.5,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":121,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":106,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":2,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":227}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","MetricStreamName":"0123456789012-us-west-1-cwmetrics-metric-stream","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":82,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":83,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":81,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":2,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":164}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","MetricStreamName":"0123456789012-us-west-1-cwmetrics-metric-stream","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":99,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":102,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":96,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":2,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":198}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":94.75,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":98,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":92,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":4,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":379}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":112.5,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":121,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":103,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":4,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":450}
{"AccountID":"0123456789012","MetricName":"MetricUpdate","Namespace":"AWS/CloudWatch/MetricStreams","Region":"us-west-1","Unit":"None","metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Average":84.25,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Maximum":91,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Minimum":78,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.SampleCount":4,"metric_name:AWS/CloudWatch/MetricStreams.MetricUpdate.Sum":337}
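
The two sample formats above can be unpacked with a few lines of Python when you want to inspect events outside of Splunk. This is a minimal sketch based only on the samples shown here. The function names are hypothetical, and the parsing assumes that metric_dimensions always uses the Name=[value] form and that metric-style measurements always use the metric_name: key prefix.

```python
import re

# Matches one Name=[value] pair, as seen in the metric_dimensions field above.
DIMENSION_RE = re.compile(r"([^=,\[\]]+)=\[([^\]]*)\]")

METRIC_PREFIX = "metric_name:"


def parse_dimensions(raw):
    """Turn 'Name=[value],Other=[value]' into {'Name': 'value', 'Other': 'value'}."""
    return dict(DIMENSION_RE.findall(raw or ""))


def split_metric_event(event):
    """Split a metric-style event (a dict) into (measurements, dimensions)."""
    measurements = {
        key[len(METRIC_PREFIX):]: value
        for key, value in event.items()
        if key.startswith(METRIC_PREFIX)
    }
    dimensions = {
        key: value
        for key, value in event.items()
        if not key.startswith(METRIC_PREFIX)
    }
    return measurements, dimensions
```

For an event-style sample, load the line with json.loads and pass its metric_dimensions value to parse_dimensions. An empty string, as in the events with no dimensions, yields an empty dict.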

FAQ

  • What namespaces are sent to Splunk? By default, every metric in every namespace is sent to Splunk.
  • How can I customize the data being sent to Splunk? To customize the data being sent to Splunk, you'll want to modify the resource named cwMetricsStream in the CloudFormation template to define either ExcludeFilters or IncludeFilters. We recommend referencing the AWS documentation for assistance with defining these.
  • How can I see what resources are being deployed? You can see the resources that are going to be deployed in the CloudFormation template.
  • Do I have to create 1 HEC token per sourcetype + index combination, or can I use the same HEC token for multiple sourcetype + index combinations? You can re-use an existing HEC token as long as the HEC token is configured to send to all of the indexes you want to send data to using that HEC token.
  • Why is the CAPABILITY_NAMED_IAM capability required to deploy cwMetricsToSplunk.yml? The template defines IAM roles and permissions with custom names. The IAM roles/permissions are required to grant resources access to other resources (e.g., Lambda being able to pull the log files from the S3 bucket), and custom names were set for uniform naming across all the resources deployed in the stack.
  • I found a typo. What should I do? Feel free to message me on the community Slack or submit a PR. I'm really bad at typos. Thank you!
  • How can I see statistics (event latency, license usage, event count, etc.) about data coming in through this method of sending data to Splunk? The dashboard code referenced in the Splunk Dashboard section reports on this type of information.
  • What if I want to send data to Edge Processor? The recommended way to get the data to Edge Processor is to set up the architecture defined in the EP SVA, with Firehose acting as the HEC source. You will need to configure a signed certificate on the load balancer, since Amazon Data Firehose requires its destination to have a signed certificate and an appropriate DNS entry. The DNS Round Robin architecture could also be used.
  • What if I want to send data to Ingest Processor? To get data to Ingest Processor, first send it to the Splunk Cloud environment as normal. Then, when creating the pipeline to interact with the data, specify a partition that will apply to the data being sent from the Splunk AWS GDI Toolkit. For more information on how to do this, refer to the Ingest Processor documentation.
  • I have another question. What should I do? Feel free to message me on the community Slack! I'd love to help.
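
For the question above about customizing the data sent to Splunk, the following Python sketch shows the general shape of a namespace filter for the cwMetricsStream resource. It is an illustration rather than the toolkit's own code. The function name metric_stream_filters is hypothetical, the namespaces are examples, and the sketch assumes that a metric stream accepts either IncludeFilters or ExcludeFilters but not both. Check the AWS documentation before editing the template.

```python
import json


def metric_stream_filters(include=None, exclude=None):
    """Build an IncludeFilters or ExcludeFilters property from namespace names."""
    if include and exclude:
        # ASSUMPTION: a metric stream cannot define both filter types at once.
        raise ValueError("use either include or exclude, not both")
    if include:
        return {"IncludeFilters": [{"Namespace": ns} for ns in include]}
    if exclude:
        return {"ExcludeFilters": [{"Namespace": ns} for ns in exclude]}
    # No filters: every metric in every namespace is streamed (the default).
    return {}


if __name__ == "__main__":
    # Example: only stream EC2 and Lambda metrics.
    print(json.dumps(metric_stream_filters(include=["AWS/EC2", "AWS/Lambda"]), indent=2))
```

The printed structure mirrors the property you would add under the cwMetricsStream resource; since the template is YAML, translate it to the template's style when you edit it.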

Troubleshooting

  • The CloudFormation template fails to deploy.
    • Verify that the role you are using to deploy the CloudFormation template has the appropriate permissions to deploy the resources you're trying to deploy.
    • Verify that the parameters are set correctly on the stack.
    • Also, in the CloudFormation console, check the events on the failed stack for hints to where it failed.
    • Also see the official Troubleshooting CloudFormation documentation.
  • Events aren't getting to Splunk. What should I check? In this order, check the following:
    1. That the CloudFormation template deployed without error.
    2. That the parameters (especially splunkHECEndpoint and splunkHECToken) are correct.
    3. That the log files are being put into the S3 bucket. The easiest way to do this is just to check the bucket through the AWS Console to see if objects (files) are being put (copied) to it.
    4. That the Lambda transformation function is executing by going to the Lambda function in the AWS Console, clicking the "Monitor" tab, then the "Metrics" sub-tab and looking at the "Invocations" pane.
    5. That the Lambda function is executing successfully by going to the Lambda function in the AWS Console, clicking the "Monitor" tab, then the "Metrics" sub-tab and looking at the "Error count and success rate (%)" pane.
    6. That the Lambda function isn't producing errors by going to the Lambda function in the AWS Console, clicking the "Monitor" tab, clicking "View logs in CloudWatch", then checking the events in the Log streams.
    7. That the Amazon Data Firehose is receiving records by going to the Firehose delivery stream in the AWS Console, clicking the "Monitoring" tab if it's not selected, and viewing the "Incoming records" pane.
    8. That the Amazon Data Firehose is sending records to Splunk by going to the Firehose delivery stream in the AWS Console, clicking the "Monitoring" tab if it's not selected, and viewing the "Delivery to Splunk success" pane. You can also view the "Destination error logs" pane on that same page.
    9. That there are no errors related to ingestion in Splunk.
    10. That any firewall ports are open from Firehose to Splunk.
  • Not all of the events are getting to Splunk or Amazon Data Firehose is being throttled.
    • Amazon Data Firehose has a number of quotas associated with each Firehose. You can check whether you're being throttled by navigating to the monitoring tab in the AWS console for the Firehose and checking if the "Throttled records (Count)" value is greater than zero. If the Firehose is being throttled, you can use the Kinesis Firehose Service quota increase form to request that quota be increased.
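
When checking that splunkHECEndpoint and splunkHECToken are correct, one option is to send a single test event to HEC yourself. The Python sketch below only builds the request so you can review it before sending. The function name build_hec_test_request is hypothetical. It assumes an event index and uses the /services/collector/event path with a channel header, which HEC expects when indexer acknowledgment is enabled. Your network or IP allowlist may only accept traffic from Firehose, so a failure from your own machine does not necessarily mean the values are wrong.

```python
import json
import urllib.request
import uuid


def build_hec_test_request(endpoint, token, index, channel=None):
    """Build (but do not send) a POST request carrying one test event to HEC."""
    # A channel identifier is needed when indexer acknowledgment is enabled.
    channel = channel or str(uuid.uuid4())
    body = json.dumps({"event": "HEC connectivity test", "index": index}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/services/collector/event",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Splunk {token}",
            "X-Splunk-Request-Channel": channel,
            "Content-Type": "application/json",
        },
    )


# To actually send it (requires network access to the HEC endpoint):
#   with urllib.request.urlopen(request, timeout=10) as response:
#       print(response.status, response.read().decode("utf-8"))
```

A successful response suggests the endpoint and token are valid, which narrows the problem down to the Firehose, Lambda, or firewall steps in the checklist above.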