
S3 SQS Lambda Firehose Method

S3-SQS-Lambda-Firehose Method Overview

Many services, including internal AWS services such as CloudTrail, can send logs to S3 buckets. With this method, S3 sends object (file) metadata to SQS when an object containing events lands in the bucket, a Lambda function retrieves the object from S3 and sends its events to Amazon Data Firehose, and Firehose delivers them to Splunk. This can be a cost-effective, scalable, serverless, event-driven way to send events to Splunk.

Visual overview:

Mermaid code of visual overview:

graph TD;
	s3Bucket[(S3 Bucket)]
    s3Backsplash[(Firehose Backsplash Bucket)]
	sqs[SQS Queue]
	lambda[Lambda Function]
	adf[Amazon Data Firehose]
	s3Bucket-->|Message on object Put|sqs
	sqs-->|Lambda retrieves SQS messages|lambda
	lambda-->|Lambda deletes SQS messages when processed|sqs
	s3Bucket-->|Lambda retrieves files|lambda
	lambda-->|PutRecordBatch|adf
	splunk[Splunk]
	logSource-->|S3 Put|s3Bucket
	adf-->|HEC|splunk
    adf-->|On failure sending to Splunk|s3Backsplash

S3-SQS-Lambda-Firehose Method Detailed Steps

Detailed understanding of how this method works isn't necessary to implement it.

  1. An object is placed into the log S3 bucket.
  2. An SQS message containing the object (file) metadata is sent from S3 to the SQS queue.
  3. The Lambda function polls for the SQS messages.
  4. The Lambda function parses the SQS messages, and for each message (each object placed into the S3 bucket):
    4a. Retrieves the object metadata (bucket name and key)
    4b. Validates that the file type is supported by the function
    4c. Downloads the object (file) from the S3 bucket
    4d. Uncompresses the downloaded file if it is compressed
    4e. Reads the file contents into memory
    4f. Breaks up the file contents into separate events
    4g. For each event:
      4g-1. Gets the timestamp of the event
      4g-2. Constructs the record to send to Firehose from the event and timestamp
      4g-3. Sends the events to Firehose
    4h. Deletes the file with the events
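
To make the flow above concrete, here is a heavily simplified, illustrative Python sketch of steps 3-4. It is not the actual function: the stream name, sourcetype, and helper logic are placeholder assumptions, and the real code linked below handles more file types, timestamp parsing, batching limits, and error handling.

# Illustrative sketch only -- see lambda.py in the repository for the real implementation.
import gzip
import json
import time

import boto3

s3 = boto3.client("s3")
firehose = boto3.client("firehose")

FIREHOSE_STREAM = "example-delivery-stream"  # placeholder; the real function reads this from its environment


def handler(event, context):
    # Each invocation receives a batch of SQS messages (step 3).
    for message in event["Records"]:
        body = json.loads(message["body"])  # S3 event notification payload (steps 4 and 4a)
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]

            raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()  # 4c
            if key.endswith(".gz"):
                raw = gzip.decompress(raw)  # 4d

            events = [line for line in raw.decode("utf-8").splitlines() if line]  # 4e, 4f
            records = [
                {
                    "Data": json.dumps(
                        {
                            "time": time.time(),  # placeholder; the real code extracts the event timestamp (4g-1)
                            "sourcetype": "aws:cloudtrail",  # placeholder; driven by the stack parameters
                            "event": e,
                        }
                    ).encode("utf-8")
                }
                for e in events
            ]
            # Firehose accepts at most 500 records per PutRecordBatch call (4g-3).
            for i in range(0, len(records), 500):
                firehose.put_record_batch(DeliveryStreamName=FIREHOSE_STREAM, Records=records[i:i + 500])
    # Once processing succeeds, the corresponding SQS messages are deleted so they are not handled again.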

The source code for the Lambda function can be found at https://github.com/splunk/splunk-aws-gdi-toolkit/blob/main/S3-SQS-Lambda-Firehose-Resources/lambda.py.

Splunk Dashboard

A Splunk Dashboard Studio dashboard is available at https://github.com/splunk/splunk-aws-gdi-toolkit/blob/main/S3-SQS-Lambda-Firehose-Resources/splunkDashboard.json; it can be used to check the health of the different events coming in through this method.

Deployment Instructions

Instructions for deploying this ingestion method depend on the type of data (essentially, the sourcetype) you want to send to Splunk.

CloudTrail Deployment Instructions

These instructions are for configuring AWS CloudTrail logs to be sent to Splunk. CloudTrail is a service designed to track and log events made in AWS accounts. It can be helpful from a security perspective to monitor for unwanted or unauthorized access, from a compliance perspective to monitor for inappropriate changes, and from an operational perspective to troubleshoot AWS-based issues.

  1. Install the Splunk Add-on for Amazon Web Services (AWS).
    • If you use Splunk Cloud, install the add-on on the ad-hoc search head or ad-hoc search head cluster.
    • If you do not use Splunk Cloud, install this add-on on the HEC endpoint (probably either the indexer(s) or heavy forwarder), your indexer(s), and your search head(s).
  2. Configure any firewall rules in front of Splunk to receive data from Amazon Data Firehose.
    • Reference the AWS documentation for the IP ranges required. Make sure to add the IP ranges from the region you'll be deploying the CloudFormation to.
    • If you use Splunk Cloud, you'll want to add the relevant IP range(s) to the HEC feature in the IP allowlist.
    • If you do not use Splunk Cloud, you'll need to consult with your Splunk Architect, Splunk Admin, and/or network team to determine which firewall rules to change and where.
  3. Create a HEC token in Splunk, with indexer acknowledgment turned on, to ingest the events. Follow these specific instructions:
    • Make sure to enable indexer acknowledgment.
    • Leave the sourcetype set to Automatic. The Lambda function will set the sourcetype before sending events to Firehose.
    • Select the index(es) you want the data to be sent to.
    • Amazon Data Firehose does check the format of the tokens, so we recommend letting Splunk generate this rather than setting it manually through inputs.conf.
    • If you use Splunk Cloud, follow these instructions
    • If you do not use Splunk Cloud, the HEC token will need to be created on the Splunk instance that will be receiving this data (probably either the indexer(s) or a heavy forwarder). Instructions for this can be found here.
  4. Deploy the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml CloudFormation Template. This will create the necessary resources to pick up the CloudTrail events in an S3 bucket and send them to Splunk. Specifically for CloudTrail events to be sent to Splunk, these parameters need to be changed from the default values:
    • existingS3BucketName: If you have CloudTrail events already being sent to an existing S3 bucket, set this parameter to the name of that bucket. If you set this parameter, you'll need to create the S3 event notification so that when a new object (file) is put (sent) to the S3 bucket, an SQS message is generated and sent to the queue. An example of this in CloudFormation is in the "NotificationConfiguration" properties of the s3Bucket resource in the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml file; a boto3 sketch is also shown after this list.
    • logType: cloudtrail
    • splunkHECEndpoint: https://{{url}}:{{port}}
      • For Splunk Cloud, this will be https://http-inputs-firehose-{{stackName}}.splunkcloud.com:443, where {{stackName}} is the name of your Splunk Cloud stack.
      • For non-Splunk Cloud deployments, consult with your Splunk Architect or Splunk Admin.
    • splunkHECToken: The value of the HEC token from step 3
    • splunkHost: the host field setting on CloudTrail events
    • splunkIndex: the index CloudTrail events will be sent to
    • splunkJSONFormat: eventsInRecords
    • splunkSource: the source field setting on CloudTrail events
    • splunkSourcetype: aws:cloudtrail
    • splunkTimePrefix: eventTime
    • stage: If this is going into production, set this to something like prod
  5. If you are not already sending CloudTrail log files to an S3 bucket, deploy the S3-SQS-Lambda-Firehose-Resources/cloudTrailToS3.yml CloudFormation Template. We recommend deploying this in the AWS Security Account. Deploying this will create the necessary CloudTrail (for the whole Organization, in every Region by default) to send events to S3. The following parameters need to be changed from their default values:
    • cloudTrailS3BucketName: {{accountId}}-{{region}}-cloudtrail, where {{accountId}} is the AWS Account ID where the previous stack was deployed to, and where {{region}} is the AWS region that the previous stack was deployed to.
    • stage: If this is going into production, set this to something like prod
  6. Verify the data is being ingested. The easiest way to do this is to wait a few minutes, then run a search like index={{splunkIndex}} sourcetype=aws:cloudtrail | head 100, where {{splunkIndex}} is the destination index selected in step 3.
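
If you set existingS3BucketName in step 4, the bucket's object-created notifications have to be wired to the SQS queue the stack creates. The CloudFormation property referenced above is the canonical example; the following boto3 sketch does the same thing outside CloudFormation (the bucket name and queue ARN are placeholders):

import boto3

s3 = boto3.client("s3")

EXISTING_BUCKET = "my-existing-cloudtrail-bucket"  # placeholder: your existing log bucket
QUEUE_ARN = "arn:aws:sqs:us-west-2:012345678901:example-queue"  # placeholder: the queue ARN created by the stack

# Send an SQS message to the stack's queue every time a new object is written to the bucket.
s3.put_bucket_notification_configuration(
    Bucket=EXISTING_BUCKET,
    NotificationConfiguration={
        "QueueConfigurations": [
            {"QueueArn": QUEUE_ARN, "Events": ["s3:ObjectCreated:*"]}
        ]
    },
)

The SQS queue's access policy must also allow S3 to send messages from that bucket; if notifications never arrive, check the queue policy first.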

Example CloudTrail Deployment via AWS CLI:

  • Deploy eventsInS3ToSplunk.yml:
aws cloudformation create-stack --region us-west-2 --stack-name splunk-cloudtrail-to-splunk --capabilities CAPABILITY_NAMED_IAM --template-body file://eventsInS3ToSplunk.yml --parameters ParameterKey=logType,ParameterValue=cloudtrail ParameterKey=splunkHECEndpoint,ParameterValue=https://http-inputs-firehose-contoso.splunkcloud.com:443 ParameterKey=splunkHECToken,ParameterValue=01234567-89ab-cdef-0123-456789abcdef ParameterKey=splunkHost,ParameterValue=aws ParameterKey=splunkIndex,ParameterValue=main ParameterKey=splunkJSONFormat,ParameterValue=eventsInRecords ParameterKey=splunkSource,ParameterValue=aws ParameterKey=splunkSourcetype,ParameterValue=aws:cloudtrail ParameterKey=splunkTimePrefix,ParameterValue=eventTime ParameterKey=stage,ParameterValue=prod ParameterKey=cloudWatchAlertEmail,ParameterValue=jsmith@contoso.com
  • Deploy cloudTrailToS3.yml:
aws cloudformation create-stack --region us-west-2 --stack-name splunk-cloudtrail-to-s3 --template-body file://cloudTrailToS3.yml --parameters ParameterKey=stage,ParameterValue=prod ParameterKey=contact,ParameterValue=jsmith ParameterKey=cloudTrailS3BucketName,ParameterValue=012345678901-us-west-2-cloudtrail

VPC Flow Log Deployment Instructions

These instructions are for configuring AWS VPC Flow Logs to be sent to Splunk. VPC Flow Logs capture network traffic metadata from AWS VPCs, subnets, and/or ENIs; they can be thought of as a cloud-native combination of netflow and basic firewall logs. They can be helpful from a security perspective to monitor for unwanted or unauthorized network access, from a compliance perspective to record network traffic, and from an operational perspective to troubleshoot networking issues.

  1. Install the Splunk Add-on for Amazon Web Services (AWS).
    • If you use Splunk Cloud, install the add-on on the ad-hoc search head or ad-hoc search head cluster.
    • If you do not use Splunk Cloud, install this add-on on the HEC endpoint (probably either the indexer(s) or heavy forwarder), your indexer(s), and your search head(s).
  2. Configure any firewall rules in front of Splunk to receive data from Amazon Data Firehose.
    • Reference the AWS documentation for the IP ranges required. Make sure to add the IP ranges from the region you'll be deploying the CloudFormation to.
    • If you use Splunk Cloud, you'll want to add the relevant IP range(s) to the HEC feature in the IP allowlist.
    • If you do not use Splunk Cloud, you'll need to consult with your Splunk Architect, Splunk Admin, and/or network team to determine which firewall rules to change and where.
  3. Create a HEC token in Splunk, with indexer acknowledgment turned on, to ingest the events. Follow these specific instructions:
    • Make sure to enable indexer acknowledgment.
    • Leave the sourcetype set to Automatic. The Lambda function will set the sourcetype before sending events to Firehose.
    • Select the index(es) you want the data to be sent to.
    • Amazon Data Firehose does check the format of the tokens, so we recommend letting Splunk generate this rather than setting it manually through inputs.conf.
    • If you use Splunk Cloud, follow these instructions
    • If you do not use Splunk Cloud, the HEC token will need to be created on the Splunk instance that will be receiving this data (probably either the indexer(s) or a heavy forwarder). Instructions for this can be found here.
  4. Deploy the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml CloudFormation Template. This will create the necessary resources to pick up the VPC Flow Logs in an S3 bucket and send them to Splunk. Specifically for VPC Flow Logs to be sent to Splunk, these parameters need to be changed from the default values:
    • existingS3BucketName: If you have VPC Flow Logs already being sent to an existing S3 bucket, set this parameter to the name of that bucket. If you set this parameter, you'll need to create the S3 event notification so that when a new object (file) is put (sent) to the S3 bucket, an SQS message is generated and sent to the queue. An example of this in CloudFormation is in the "NotificationConfiguration" properties of the s3Bucket resource in the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml file.
    • logType: vpcflowlog
    • splunkEventDelimiter: space
    • splunkHECEndpoint: https://{{url}}:{{port}}
      • For Splunk Cloud, this will be https://http-inputs-firehose-{{stackName}}.splunkcloud.com:443, where {{stackName}} is the name of your Splunk Cloud stack.
      • For non-Splunk Cloud deployments, consult with your Splunk Architect or Splunk Admin.
    • splunkHECToken: The value of the HEC token from step 3
    • splunkIgnoreFirstLine: true
    • splunkHost: the host field setting on VPC Flow Log events
    • splunkIndex: the index VPC Flow Log events will be sent to
    • splunkSource: the source field setting on VPC Flow Log events
    • splunkSourcetype: aws:cloudwatchlogs:vpcflow
    • splunkTimeDelineatedField: 10
    • splunkTimeFormat: delineated-epoch
    • stage: If this is going into production, set this to something like prod
  5. If you are not already sending VPC Flow Log files to an S3 bucket, deploy the S3-SQS-Lambda-Firehose-Resources/vpcFlowLogToS3.yml CloudFormation Template for each flow log you'd like to create. This means you may need to deploy this template in multiple accounts, and/or multiple times per account. See the FAQ for information on where flow logs can be created. The following parameters need to be changed from their default values:
    • vpcFlowLogResourceId: The ID of the subnet, network interface, or VPC for which you want to create a flow log. Taken straight from the parameter named resourceId in the AWS documentation.
    • vpcFlowLogResourceType: The type of resource for which to create the flow log. Select "NetworkInterface" if you specified an ENI for vpcFlowLogResourceId, "Subnet" if you specified a subnet ID, or "VPC" if you specified a VPC ID.
    • vpcFlowLogS3BucketName: {{accountId}}-{{region}}-vpcflowlog, where {{accountId}} is the AWS Account ID where the previous stack was deployed to, and where {{region}} is the AWS region that the previous stack was deployed to.
    • stage: If this is going into production, set this to something like prod
  6. Verify the data is being ingested. The easiest way to do this is to wait a few minutes, then run a search like index={{splunkIndex}} sourcetype=aws:cloudwatchlogs:vpcflow | head 100, where {{splunkIndex}} is the destination index selected in step 3.

Example VPC Flow Log Deployment via AWS CLI:

  • Deploy eventsInS3ToSplunk.yml:
aws cloudformation create-stack --region us-west-2 --stack-name splunk-vpcflowlog-to-splunk --capabilities CAPABILITY_NAMED_IAM --template-body file://eventsInS3ToSplunk.yml --parameters ParameterKey=logType,ParameterValue=vpcflowlog ParameterKey=splunkEventDelimiter,ParameterValue=space ParameterKey=splunkHECEndpoint,ParameterValue=https://http-inputs-firehose-contoso.splunkcloud.com:443 ParameterKey=splunkHECToken,ParameterValue=01234567-89ab-cdef-0123-456789abcdef ParameterKey=splunkIgnoreFirstLine,ParameterValue=true ParameterKey=splunkHost,ParameterValue=aws ParameterKey=splunkIndex,ParameterValue=main ParameterKey=splunkSource,ParameterValue=aws ParameterKey=splunkSourcetype,ParameterValue=aws:cloudwatchlogs:vpcflow ParameterKey=splunkTimeDelineatedField,ParameterValue=10 ParameterKey=splunkTimeFormat,ParameterValue=delineated-epoch ParameterKey=stage,ParameterValue=prod ParameterKey=cloudWatchAlertEmail,ParameterValue=jsmith@contoso.com
  • Deploy vpcFlowLogToS3.yml:
aws cloudformation create-stack --region us-west-2 --stack-name splunk-vpcflow-to-s3 --template-body file://vpcFlowLogToS3.yml --parameters  ParameterKey=vpcFlowLogResourceId,ParameterValue=vpc-0123456789abcdef0 ParameterKey=vpcFlowLogResourceType,ParameterValue=VPC ParameterKey=contact,ParameterValue=jsmith ParameterKey=vpcFlowLogS3BucketName,ParameterValue=012345678901-vpcflowlog-bucket ParameterKey=stage,ParameterValue=prod 

Sample VPC Flow Log Events

2 012345678912 eni-01239807b6ff2ff22 172.21.11.151 172.21.11.60 47221 8472 17 1 134 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 172.21.11.60 172.21.11.151 55787 8472 17 1 227 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 172.21.11.60 172.21.11.151 6443 46364 6 1 52 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 172.21.11.151 172.21.11.60 43384 8472 17 16 20339 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 172.21.11.151 172.21.11.60 46488 6443 6 4 247 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 172.21.11.151 172.21.11.60 46468 6443 6 49 7609 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 20.193.143.177 172.21.11.151 80 37662 6 7 427 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 172.21.11.60 172.21.11.151 51573 8472 17 18 3072 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 172.21.11.60 172.21.11.151 6443 46464 6 2 339 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 172.21.11.151 159.89.86.140 35052 123 17 1 76 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 172.21.11.151 172.21.11.60 10250 61847 6 20 36607 1653342947 1653342975 ACCEPT OK
2 012345678912 eni-01239807b6ff2ff22 45.76.22.189 172.21.11.151 123 57486 17 1 76 1653342947 1653342975 ACCEPT OK
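
These sample events also show why step 4 uses splunkEventDelimiter: space, splunkTimeDelineatedField: 10, and splunkTimeFormat: delineated-epoch (and splunkIgnoreFirstLine: true, since the first line of each delivered file is a header). Splitting an event on spaces puts the epoch start time at zero-based index 10, as this small illustrative snippet shows:

from datetime import datetime, timezone

sample = ("2 012345678912 eni-01239807b6ff2ff22 172.21.11.151 172.21.11.60 "
          "47221 8472 17 1 134 1653342947 1653342975 ACCEPT OK")

fields = sample.split(" ")   # splunkEventDelimiter: space
epoch = int(fields[10])      # splunkTimeDelineatedField: 10 (zero-based)
print(datetime.fromtimestamp(epoch, tz=timezone.utc))  # 2022-05-23 21:55:47+00:00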

ELB Access Log Deployment Instructions

These instructions are for configuring AWS ELB Access Logs to be sent to Splunk. AWS ELB Access Logs contain events related to requests sent to a load balancer. These logs can be useful from a security perspective to monitor for unwanted or unauthorized access, from a compliance perspective to log access, and from an operational perspective to troubleshoot AWS-based issues.

  1. Install the Splunk Add-on for Amazon Web Services (AWS).
    • If you use Splunk Cloud, install the add-on on the ad-hoc search head or ad-hoc search head cluster.
    • If you do not use Splunk Cloud, install this add-on on the HEC endpoint (probably either the indexer(s) or heavy forwarder), your indexer(s), and your search head(s).
  2. Configure any firewall rules in front of Splunk to receive data from Amazon Data Firehose.
    • Reference the AWS documentation for the IP ranges required. Make sure to add the IP ranges from the region you'll be deploying the CloudFormation to.
    • If you use Splunk Cloud, you'll want to add the relevant IP range(s) to the HEC feature in the IP allowlist.
    • If you do not use Splunk Cloud, you'll need to consult with your Splunk Architect, Splunk Admin, and/or network team to determine which firewall rules to change and where.
  3. Create a HEC token in Splunk, with indexer acknowledgment turned on, to ingest the events. Follow these specific instructions:
    • Make sure to enable indexer acknowledgment.
    • Leave the sourcetype set to Automatic. The Lambda function will set the sourcetype before sending events to Firehose.
    • Select the index(es) you want the data to be sent to.
    • Amazon Data Firehose does check the format of the tokens, so we recommend letting Splunk generate this rather than setting it manually through inputs.conf.
    • If you use Splunk Cloud, follow these instructions
    • If you do not use Splunk Cloud, the HEC token will need to be created on the Splunk instance that will be receiving this data (probably either the indexer(s) or a heavy forwarder). Instructions for this can be found here.
  4. Deploy the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml CloudFormation Template, once per region (but not per account per region) that you want to ingest logs from. This will create the necessary resources to pick up the ELB Access Logs in an S3 bucket and send them to Splunk. Specifically for ELB Access Logs to be sent to Splunk, these parameters need to be changed from the default values:
    • existingS3BucketName: If you have ELB Access Logs already being sent to an existing S3 bucket, set this parameter to the name of that bucket. If you set this parameter, you'll need to create the S3 event notification so that when a new object (file) is put (sent) to the S3 bucket, an SQS message is generated and sent to the queue. An example of this in CloudFormation is in the "NotificationConfiguration" properties of the s3Bucket resource in the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml file.
    • logType: elblog
    • splunkEventDelimiter: space
    • splunkHECEndpoint: https://{{url}}:{{port}}
      • For Splunk Cloud, this will be https://http-inputs-firehose-{{stackName}}.splunkcloud.com:443, where {{stackName}} is the name of your Splunk Cloud stack.
      • For non-Splunk Cloud deployments, consult with your Splunk Architect or Splunk Admin.
    • splunkHECToken: The value of the HEC token from step 3
    • splunkHost: the host field setting on ELB Access Log events
    • splunkIndex: the index ELB Access Log events will be sent to
    • splunkSource: the source field setting on ELB Access Log events
    • splunkSourcetype: aws:elb:accesslogs
    • splunkTimeDelineatedField: For application load balancer logs use 1, for network load balancer logs use 2, or for classic load balancer logs use 0
    • splunkTimeFormat: delineated-ISO8601
    • stage: If this is going into production, set this to something like prod
  5. On each ELB (ALB or NLB) you want to collect logs from, enable access logging and configure the logs to be written to the S3 bucket created in step 4 (or the bucket that's already in use for these logs). A boto3 sketch of this follows this list.
  6. Verify the data is being ingested. The easiest way to do this is to wait a few minutes, then run a search like index={{splunkIndex}} sourcetype=aws:elb:accesslogs | head 100, where {{splunkIndex}} is the destination index selected in step 3.
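
The sketch below shows one way to do step 5 with boto3 for an ALB or NLB (the load balancer ARN and bucket name are placeholders); the same attributes can be set in the console or in CloudFormation.

import boto3

elbv2 = boto3.client("elbv2")

LB_ARN = ("arn:aws:elasticloadbalancing:us-west-2:012345678901:"
          "loadbalancer/app/example-alb/0123456789abcdef")       # placeholder
LOG_BUCKET = "012345678901-us-west-2-elblog"                      # placeholder: the bucket from step 4

# Enable access logging and point it at the S3 bucket the stack is watching.
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn=LB_ARN,
    Attributes=[
        {"Key": "access_logs.s3.enabled", "Value": "true"},
        {"Key": "access_logs.s3.bucket", "Value": LOG_BUCKET},
    ],
)

The target bucket also has to allow ELB log delivery to write to it; if logs never appear, check the bucket policy first.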

Example ELB Access Log Deployment via AWS CLI

  • Deploy eventsInS3ToSplunk.yml:
aws cloudformation create-stack --region us-west-2 --stack-name splunk-elblog-to-splunk --capabilities CAPABILITY_NAMED_IAM --template-body file://eventsInS3ToSplunk.yml --parameters ParameterKey=logType,ParameterValue=elblog ParameterKey=splunkEventDelimiter,ParameterValue=space ParameterKey=splunkHECEndpoint,ParameterValue=https://http-inputs-firehose-contoso.splunkcloud.com:443 ParameterKey=splunkHECToken,ParameterValue=01234567-89ab-cdef-0123-456789abcdef  ParameterKey=splunkHost,ParameterValue=aws ParameterKey=splunkIndex,ParameterValue=main ParameterKey=splunkSource,ParameterValue=aws ParameterKey=splunkSourcetype,ParameterValue=aws:elb:accesslogs ParameterKey=splunkTimeDelineatedField,ParameterValue=1 ParameterKey=splunkTimeFormat,ParameterValue=delineated-ISO8601 ParameterKey=stage,ParameterValue=prod ParameterKey=cloudWatchAlertEmail,ParameterValue=jsmith@contoso.com

Sample ELB Application Load Balancer Access Log Events

http 2022-05-23T18:14:27.251617Z app/gditoolkit-elbv2-14MI7OWL3PLKA/5d0bd31f3e9aecb8 180.129.199.58:59892 172.21.11.125:80 0.001 0.001 0.000 200 200 548 3428 "GET http://gditoolkit-elbv2-14mi7owl3plka-1413486595.us-west-2.elb.amazonaws.com:80/ HTTP/1.1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36" - - arn:aws:elasticloadbalancing:us-west-2:012345678901:targetgroup/gdiDemo/6206628e2424a61e "Root=1-628bcf03-5c8fb6455464990b1c7ae840" "-" "-" 0 2022-05-23T18:14:27.248000Z "forward" "-" "-" "172.21.11.125:80" "200" "-" "-"
http 2022-05-23T18:14:27.425920Z app/gditoolkit-elbv2-14MI7OWL3PLKA/5d0bd31f3e9aecb8 180.129.199.58:59892 172.21.11.125:80 0.000 0.001 0.000 200 200 525 3576 "GET http://gditoolkit-elbv2-14mi7owl3plka-1413486595.us-west-2.elb.amazonaws.com:80/icons/ubuntu-logo.png HTTP/1.1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36" - - arn:aws:elasticloadbalancing:us-west-2:012345678901:targetgroup/gdiDemo/6206628e2424a61e "Root=1-628bcf03-329f756e3e2f6eb75a563aa4" "-" "-" 0 2022-05-23T18:14:27.424000Z "forward" "-" "-" "172.21.11.125:80" "200" "-" "-"

Billing Cost and Usage Report Deployment Instructions

These instructions are for configuring AWS Cost and Usage (CUR) reports to be sent to Splunk after being placed in an S3 bucket. Cost and Usage Reports contain very detailed usage information for an AWS account. These reports can be useful for estimating, tracking, and reporting on cloud spend.

Even though Cost and Usage Reports are written to S3 in CSV format, when you use the parameters recommended below, the Lambda function will rewrite each event as JSON and remove any null fields.
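
As a rough illustration of that conversion (not the toolkit's exact code; the column names here are example CUR columns), each CSV row becomes one JSON event with empty columns dropped:

import csv
import io
import json

cur_csv = io.StringIO(
    "identity/LineItemId,lineItem/UsageStartDate,lineItem/UnblendedCost,product/region\n"
    "abc123,2024-11-01T00:00:00Z,0.0135,\n"
)

for row in csv.DictReader(cur_csv):
    event = {k: v for k, v in row.items() if v not in ("", None)}  # drop empty/null columns
    print(json.dumps(event))
# {"identity/LineItemId": "abc123", "lineItem/UsageStartDate": "2024-11-01T00:00:00Z", "lineItem/UnblendedCost": "0.0135"}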

  1. Configure any firewall rules in front of Splunk to receive data from Amazon Data Firehose.
    • Reference the AWS documentation for the IP ranges required. Make sure to add the IP ranges from the region you'll be deploying the CloudFormation to.
    • If you use Splunk Cloud, you'll want to add the relevant IP range(s) to the HEC feature in the IP allowlist.
    • If you do not use Splunk Cloud, you'll need to consult with your Splunk Architect, Splunk Admin, and/or network team to determine which firewall rules to change and where.
  2. Create a HEC token in Splunk, with indexer acknowledgment turned on, to ingest the events. Follow these specific instructions:
    • Make sure to enable indexer acknowledgment.
    • Leave the sourcetype set to Automatic. The Lambda function will set the sourcetype before sending events to Firehose.
    • Select the index(es) you want the data to be sent to.
    • Amazon Data Firehose does check the format of the tokens, so we recommend letting Splunk generate this rather than setting it manually through inputs.conf.
    • If you use Splunk Cloud, follow these instructions
    • If you do not use Splunk Cloud, the HEC token will need to be created on the Splunk instance that will be receiving this data (probably either the indexer(s) or a heavy forwarder). Instructions for this can be found here.
  3. Deploy the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml CloudFormation Template, once per region (but not per account per region) that you want to ingest logs from. This will create the necessary resources to pick up the cost and usage reports in an S3 bucket and send them to Splunk. Specifically for the CUR files to be sent to Splunk, these parameters need to be changed from the default values:
    • existingS3BucketName: If you have cost and usage reports already being sent to an existing S3 bucket, set this parameter to the name of that bucket. If you set this parameter, you'll need to create the S3 event notification so that when a new object (file) is put (sent) to the S3 bucket, an SQS message is generated and sent to the queue. An example of this in CloudFormation is in the "NotificationConfiguration" properties of the s3Bucket resource in the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml file.
    • lambdaProcessorBatchSize: 1
    • lambdaProcessorMemorySize: 8192
    • lambdaProcessorTimeout: 900
    • logType: billingcur
    • splunkCSVToJSON: true
    • splunkHECEndpoint: https://{{url}}:{{port}}
      • For Splunk Cloud, this will be https://http-inputs-firehose-{{stackName}}.splunkcloud.com:443, where {{stackName}} is the name of your Splunk Cloud stack.
      • For non-Splunk Cloud deployments, consult with your Splunk Architect or Splunk Admin.
    • splunkHECToken: The value of the HEC token from step 2
    • splunkHost: the host field setting on billing events
    • splunkIndex: the index billing events will be sent to
    • splunkRemoveEmptyCSVToJsonFields: true
    • splunkSource: the source field setting on billing events
    • splunkSourcetype: aws:billing:cur
    • splunkTimeFormat: prefix-ISO8601
    • splunkTimePrefix: UsageStartDate
    • sqsQueueVisibilityTimeoutInSecond: 930 (this keeps the SQS visibility timeout longer than the 900-second Lambda timeout above, so messages aren't redelivered while a report is still being processed)
    • stage: If this is going into production, set this to something like prod
  4. If you are not already sending AWS cost and usage reports to an S3 bucket, deploy the S3-SQS-Lambda-Firehose-Resources/billingCURToS3.yml CloudFormation Template to the us-east-1 region. If you're using AWS Organizations, this CloudFormation template should be deployed to the billing account. Otherwise, deploy this CloudFormation template to each AWS account you want to retrieve events from. The following parameters need to be changed from their default values:
    • billingCURS3BucketName: {{accountId}}-{{region}}-billingcur, where {{accountId}} is the AWS Account ID where the previous stack was deployed to, and where {{region}} is the AWS region that the previous stack was deployed to.
    • billingCURS3BucketRegion: The AWS Region the previous stack was deployed to.
    • stage: If this is going into production, set this to something like prod
  5. Verify the data is being ingested. The easiest way to do this is to wait a few minutes, then run a search like index={{splunkIndex}} sourcetype=aws:billing:cur | head 100, where {{splunkIndex}} is the destination index selected in step 2.

Example Cost and Usage Report Deployment via AWS CLI

  • Deploy eventsInS3ToSplunk.yml:
aws cloudformation create-stack --region us-west-2 --stack-name splunk-billingcur-to-splunk --capabilities CAPABILITY_NAMED_IAM --template-body file://eventsInS3ToSplunk.yml --parameters ParameterKey=lambdaProcessorBatchSize,ParameterValue=1 ParameterKey=lambdaProcessorMemorySize,ParameterValue=8192 ParameterKey=lambdaProcessorTimeout,ParameterValue=900 ParameterKey=splunkCSVToJSON,ParameterValue=true ParameterKey=logType,ParameterValue=billingcur ParameterKey=splunkHECEndpoint,ParameterValue=https://http-inputs-firehose-contoso.splunkcloud.com:443 ParameterKey=splunkHECToken,ParameterValue=01234567-89ab-cdef-0123-456789abcdef  ParameterKey=splunkHost,ParameterValue=aws ParameterKey=splunkIndex,ParameterValue=main ParameterKey=splunkRemoveEmptyCSVToJsonFields,ParameterValue=true ParameterKey=splunkSource,ParameterValue=aws ParameterKey=splunkSourcetype,ParameterValue=aws:billing:cur ParameterKey=splunkTimeFormat,ParameterValue=prefix-ISO8601 ParameterKey=splunkTimePrefix,ParameterValue=UsageStartDate ParameterKey=stage,ParameterValue=prod ParameterKey=sqsQueueVisibilityTimeoutInSecond,ParameterValue=930 ParameterKey=cloudWatchAlertEmail,ParameterValue=jsmith@contoso.com 
  • Deploy billingCURToS3.yml:
aws cloudformation create-stack --region us-east-1 --stack-name splunk-billingcur-to-s3 --template-body file://billingCURToS3.yml --parameters  ParameterKey=billingCURS3BucketName,ParameterValue="012345678901-us-west-2-billingcur" ParameterKey=billingCURS3BucketRegion,ParameterValue=us-west-2 ParameterKey=contact,ParameterValue=jsmith ParameterKey=stage,ParameterValue=prod 

Route 53 Resolver Query Logging Deployment Instructions

These instructions are for configuring Route 53 resolver query logs to be sent to Splunk after being placed in an S3 bucket. Route 53 query logs show DNS requests and responses made from within a VPC by sources such as EC2 instances, ECS/EKS containers, and Lambda functions. These logs can be useful for identifying malicious activity associated with DNS events.

  1. Configure any firewall rules in front of Splunk to receive data from Amazon Data Firehose.
    • Reference the AWS documentation for the IP ranges required. Make sure to add the IP ranges from the region you'll be deploying the CloudFormation to.
    • If you use Splunk Cloud, you'll want to add the relevant IP range(s) to the HEC feature in the IP allowlist.
    • If you do not use Splunk Cloud, you'll need to consult with your Splunk Architect, Splunk Admin, and/or network team to determine which firewall rules to change and where.
  2. Create a HEC token in Splunk, with indexer acknowledgment turned on, to ingest the events. Follow these specific instructions:
    • Make sure to enable indexer acknowledgment.
    • Leave the sourcetype set to Automatic. The Lambda function will set the sourcetype before sending events to Firehose.
    • Select the index(es) you want the data to be sent to.
    • Amazon Data Firehose does check the format of the tokens, so we recommend letting Splunk generate this rather than setting it manually through inputs.conf.
    • If you use Splunk Cloud, follow these instructions
    • If you do not use Splunk Cloud, the HEC token will need to be created on the Splunk instance that will be receiving this data (probably either the indexer(s) or a heavy forwarder). Instructions for this can be found here.
  3. Package and deploy the ta_route53 add-on that is bundled in this repository.
    1. If you use Splunk Cloud, you'll want to deploy this as a private application following these instructions.
    2. If you do not use Splunk Cloud, you will need to package and deploy this app to any search heads searching this data, indexers ingesting and indexing this data, and heavy forwarders receiving this data.
  4. Deploy the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml CloudFormation Template. This will create the necessary resources to pick up the query log files in an S3 bucket and send them to Splunk. Specifically for the Route 53 query log files to be sent to Splunk, these parameters need to be changed from the default values:
    • existingS3BucketName: If you have Route 53 query log files already being sent to an existing S3 bucket, set this parameter to the name of that bucket. If you set this parameter, you'll need to create the S3 event notification so that when a new object (file) is put (sent) to the S3 bucket, an SQS message is generated and sent to the queue. An example of this in CloudFormation is in the "NotificationConfiguration" properties of the s3Bucket resource in the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml file.
    • lambdaProcessorBatchSize: 1
    • logType: route53
    • splunkHECEndpoint: https://{{url}}:{{port}}
      • For Splunk Cloud, this will be https://http-inputs-firehose-{{stackName}}.splunkcloud.com:443, where {{stackName}} is the name of your Splunk Cloud stack.
      • For non-Splunk Cloud deployments, consult with your Splunk Architect or Splunk Admin.
    • splunkHECToken: The value of the HEC token from step 2
    • splunkHost: the host field setting on Route 53 query log events
    • splunkIndex: the index Route 53 query log events will be sent to
    • splunkSource: the source field setting on Route 53 query log events
    • splunkSourcetype: aws:route53
    • splunkTimeFormat: prefix-ISO8601
    • splunkTimePrefix: query_timestamp
    • stage: If this is going into production, set this to something like prod
  5. If you are not already sending Route 53 query logs to an S3 bucket, deploy the S3-SQS-Lambda-Firehose-Resources/route53QueryLogsToS3.yml CloudFormation Template for each VPC you'd like to collect Route 53 query logs from. This means you may need to deploy this template in multiple accounts, and/or multiple times per account. The following parameters need to be changed from their default values:
    • destinationARN: The ARN where logs will be sent to. This should be the s3BucketArn output from the template deployed in step 4.
    • sourceVPCId: The VPC ID of the VPC you want to capture DNS events from. Taken straight from the parameter named resourceId in the AWS documentation.
    • stage: If this is going into production, set this to something like prod
  6. Verify the data is being ingested. The easiest way to do this is to wait a few minutes, then run a search like index={{splunkIndex}} sourcetype=aws:route53 | head 100, where {{splunkIndex}} is the destination index selected in step 2.

Example Route 53 Resolver Query Logging Deployment via AWS CLI

  • Deploy eventsInS3ToSplunk.yml:
aws cloudformation create-stack --region us-west-2 --stack-name route53-to-splunk --capabilities CAPABILITY_NAMED_IAM --template-body file://eventsInS3ToSplunk.yml --parameters ParameterKey=lambdaProcessorBatchSize,ParameterValue=1   ParameterKey=logType,ParameterValue=route53 ParameterKey=splunkHECEndpoint,ParameterValue=https://http-inputs-firehose-contoso.splunkcloud.com:443 ParameterKey=splunkHECToken,ParameterValue=01234567-89ab-cdef-0123-456789abcdef  ParameterKey=splunkHost,ParameterValue=aws ParameterKey=splunkIndex,ParameterValue=main ParameterKey=splunkSource,ParameterValue=aws ParameterKey=splunkSourcetype,ParameterValue=aws:route53 ParameterKey=splunkTimeFormat,ParameterValue=prefix-ISO8601 ParameterKey=splunkTimePrefix,ParameterValue=query_timestamp ParameterKey=stage,ParameterValue=prod ParameterKey=cloudWatchAlertEmail,ParameterValue=jsmith@contoso.com ParameterKey=contact,ParameterValue=jsmith@contoso.com 
  • Deploy route53QueryLogsToS3.yml:
aws cloudformation create-stack --region us-west-2 --stack-name route53-to-s3 --template-body file://route53QueryLogsToS3.yml --parameters  ParameterKey=destinationARN,ParameterValue="arn:aws:s3:::012345678901-us-west-2-route53" ParameterKey=sourceVPCId,ParameterValue=vpc-a1b2c3d4e5f601234 ParameterKey=contact,ParameterValue=jsmith@contoso.com ParameterKey=stage,ParameterValue=prod 

Sample Route 53 Query Log Events:

{"version":"1.100000","account_id":"0123456789012","region":"us-west-2","vpc_id":"vpc-a1b2c3d4e5f601234","query_timestamp":"2022-08-05T20:39:37Z","query_name":"ec2messages.us-west-2.amazonaws.com.","query_type":"AAAA","query_class":"IN","rcode":"NOERROR","answers":[],"srcaddr":"172.22.12.101","srcport":"35086","transport":"UDP","srcids":{"instance":"i-0b48683bd9f23a535"}}
{"version":"1.100000","account_id":"0123456789012","region":"us-west-2","vpc_id":"vpc-a1b2c3d4e5f601234","query_timestamp":"2022-08-05T20:39:58Z","query_name":"ec2messages.us-west-2.amazonaws.com.","query_type":"A","query_class":"IN","rcode":"NOERROR","answers":[{"Rdata":"52.94.177.105","Type":"A","Class":"IN"}],"srcaddr":"172.21.12.101","srcport":"44682","transport":"UDP","srcids":{"instance":"i-0b48683bd9f23a535"}}
{"version":"1.100000","account_id":"0123456789012","region":"us-west-2","vpc_id":"vpc-a1b2c3d4e5f601234","query_timestamp":"2022-08-05T20:39:37Z","query_name":"ec2messages.us-west-2.amazonaws.com.","query_type":"A","query_class":"IN","rcode":"NOERROR","answers":[{"Rdata":"52.94.177.107","Type":"A","Class":"IN"}],"srcaddr":"172.21.12.101","srcport":"59186","transport":"UDP","srcids":{"instance":"i-0b48683bd9f23a535"}}
{"version":"1.100000","account_id":"0123456789012","region":"us-west-2","vpc_id":"vpc-a1b2c3d4e5f601234","query_timestamp":"2022-08-05T20:40:32Z","query_name":"splunk.greentangent.net.","query_type":"AAAA","query_class":"IN","rcode":"NOERROR","answers":[],"srcaddr":"172.21.12.101","srcport":"46168","transport":"UDP","srcids":{"instance":"i-0b48683bd9f23a535"}}
{"version":"1.100000","account_id":"0123456789012","region":"us-west-2","vpc_id":"vpc-a1b2c3d4e5f601234","query_timestamp":"2022-08-05T20:43:38Z","query_name":"ssm.us-west-2.amazonaws.com.","query_type":"AAAA","query_class":"IN","rcode":"NOERROR","answers":[],"srcaddr":"172.21.12.101","srcport":"57284","transport":"UDP","srcids":{"instance":"i-0b48683bd9f23a535"}}

S3 Server Access Log Deployment Instructions

These instructions are for configuring S3 Server Access logs to be sent to Splunk after being placed in an S3 bucket. S3 Server Access logs show object requests made to an S3 bucket. These logs can be useful for monitoring object-level access to objects in a bucket.

  1. Install the Splunk Add-on for Amazon Web Services (AWS).
    • If you use Splunk Cloud, install the add-on on the ad-hoc search head or ad-hoc search head cluster.
    • If you do not use Splunk Cloud, install this add-on on the HEC endpoint (probably either the indexer(s) or heavy forwarder), your indexer(s), and your search head(s).
  2. Configure any firewall rules in front of Splunk to receive data from Amazon Data Firehose.
    • Reference the AWS documentation for the IP ranges required. Make sure to add the IP ranges from the region you'll be deploying the CloudFormation to.
    • If you use Splunk Cloud, you'll want to add the relevant IP range(s) to the HEC feature in the IP allowlist.
    • If you do not use Splunk Cloud, you'll need to consult with your Splunk Architect, Splunk Admin, and/or network team to determine which firewall rules to change and where.
  3. Create a HEC token in Splunk, with indexer acknowledgment turned on, to ingest the events. Follow these specific instructions:
    • Make sure to enable indexer acknowledgment.
    • Leave the sourcetype set to Automatic. The Lambda function will set the sourcetype before sending events to Firehose.
    • Select the index(es) you want the data to be sent to.
    • Amazon Data Firehose does check the format of the tokens, so we recommend letting Splunk generate this rather than setting it manually through inputs.conf.
    • If you use Splunk Cloud, follow these instructions
    • If you do not use Splunk Cloud, the HEC token will need to be created on the Splunk instance that will be receiving this data (probably either the indexer(s) or a heavy forwarder). Instructions for this can be found here.
  4. Deploy the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml CloudFormation Template, per account and per region you want to collect the logs from. This will create the necessary resources to pick up the log files in an S3 bucket and send them to Splunk. Specifically for the S3 server access log files to be sent to Splunk, these parameters need to be changed from the default values:
    • existingS3BucketName: If you have S3 server access logs already being sent to an existing S3 bucket, set this parameter to the name of that bucket. If you set this parameter, you'll need to create the S3 event notification so that when a new object (file) is put (sent) to the S3 bucket, an SQS message is generated and sent to the queue. An example of this in CloudFormation is in the "NotificationConfiguration" properties of the s3Bucket resource in the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml file.
    • logType: s3serveraccesslog
    • splunkEventDelimiter: space
    • splunkHECEndpoint: https://{{url}}:{{port}}
      • For Splunk Cloud, this will be https://http-inputs-firehose-{{stackName}}.splunkcloud.com:443, where {{stackName}} is the name of your Splunk Cloud stack.
      • For non-Splunk Cloud deployments, consult with your Splunk Architect or Splunk Admin.
    • splunkHECToken: The value of the HEC token from step 3
    • splunkHost: the host field setting on S3 server access log events
    • splunkIndex: the index S3 server access log events will be sent to
    • splunkSource: the source field setting on S3 server access log events
    • splunkSourcetype: aws:s3:accesslogs
    • splunkStrfTimeFormat: [%d/%b/%Y:%H:%M:%S
    • splunkTimeDelineatedField: 2
    • splunkTimeFormat: delineated-strftime
    • stage: If this is going into production, set this to something like prod
  5. If you are not already sending S3 server access logs to an S3 bucket, configure these events to be sent to the S3 bucket created from step 4. The name of the S3 bucket will be {{accountId}}-{{region}}-s3serveraccesslogs, where {{accountId}} is the AWS Account ID where the previous stack was deployed to, and where {{region}} is the region the previous stack was deployed to. This can be done through a variety of methods such as the AWS console, REST APIs, SDKs, CLI, or CloudFormation; a boto3 sketch follows this list.
  6. Verify the data is being ingested. The easiest way to do this is to wait a few minutes, then run a search like index={{splunkIndex}} sourcetype=aws:s3:accesslogs | head 100, where {{splunkIndex}} is the destination index selected in step 3.
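
One way to do step 5 with boto3 is sketched below (bucket names are placeholders; the same setting is available in the console, CLI, and CloudFormation):

import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "my-application-bucket"                    # placeholder: the bucket whose access you want logged
LOG_BUCKET = "012345678901-us-west-2-s3serveraccesslogs"   # placeholder: the bucket created in step 4

# Turn on server access logging for SOURCE_BUCKET and deliver the logs to LOG_BUCKET.
s3.put_bucket_logging(
    Bucket=SOURCE_BUCKET,
    BucketLoggingStatus={
        "LoggingEnabled": {"TargetBucket": LOG_BUCKET, "TargetPrefix": ""}
    },
)

The target bucket must also permit the S3 log delivery service to write to it; if logs never show up, check that permission first.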

Example S3 Server Access Log Deployment via AWS CLI

  • Deploy eventsInS3ToSplunk.yml:
aws cloudformation create-stack --region us-west-2 --stack-name s3serveraccesslog-to-splunk --capabilities CAPABILITY_NAMED_IAM --template-body file://eventsInS3ToSplunk.yml --parameters ParameterKey=logType,ParameterValue=s3serveraccesslog ParameterKey=splunkEventDelimiter,ParameterValue=space ParameterKey=splunkHECEndpoint,ParameterValue=https://http-inputs-firehose-contoso.splunkcloud.com:443 ParameterKey=splunkHECToken,ParameterValue=01234567-89ab-cdef-0123-456789abcdef  ParameterKey=splunkHost,ParameterValue=aws ParameterKey=splunkIndex,ParameterValue=main ParameterKey=splunkSource,ParameterValue=aws ParameterKey=splunkSourcetype,ParameterValue=aws:s3:accesslogs ParameterKey=splunkStrfTimeFormat,ParameterValue="'[%d/%b/%Y:%H:%M:%S'" ParameterKey=splunkTimeDelineatedField,ParameterValue=2 ParameterKey=splunkTimeFormat,ParameterValue=delineated-strftime ParameterKey=stage,ParameterValue=prod ParameterKey=cloudWatchAlertEmail,ParameterValue=jsmith@contoso.com ParameterKey=contact,ParameterValue=jsmith@contoso.com 

Custom Log Deployment Instructions

These instructions are for configuring events to be sent to Splunk from a log type that is not covered by the other instructions in this document. These instructions are longer and more complex to account for different log types. Several additional log types were tested while building the related Lambda function and CloudFormation, but it's possible that the Lambda function and CloudFormation cannot support the log types you want to send.

  1. Configure any settings associated with the indexing phase of ingestion; input and parsing are taken care of in the Lambda function.
    • Information on the different ingestion phases can be found in the Splunk Wiki
    • If there is a TA for these logs, install the TA from Splunkbase.
    • If you use Splunk Cloud, install the add-on on the ad-hoc search head or ad-hoc search head cluster.
    • If you do not use Splunk Cloud, install this add-on on the HEC endpoint (probably either the indexer(s) or heavy forwarder), your indexer(s), and your search head(s).
  2. Configure any firewall rules in front of Splunk to receive data from Amazon Data Firehose.
    • Reference the AWS documentation for the IP ranges required. Make sure to add the IP ranges from the region you'll be deploying the CloudFormation to.
    • If you use Splunk Cloud, you'll want to add the relevant IP range(s) to the HEC feature in the IP allowlist.
    • If you do not use Splunk Cloud, you'll need to consult with your Splunk Architect, Splunk Admin, and/or network team to determine which firewall rules to change and where.
  3. Create a HEC token in Splunk, with indexer acknowledgment turned on, to ingest the events. Follow these specific instructions:
    • Make sure to enable indexer acknowledgment.
    • Leave the sourcetype set to Automatic. The Lambda function will set the sourcetype before sending events to Firehose.
    • Select the index(es) you want the data to be sent to.
    • Amazon Data Firehose does check the format of the tokens, so we recommend letting Splunk generate this rather than setting it manually through inputs.conf.
    • If you use Splunk Cloud, follow these instructions
    • If you do not use Splunk Cloud, the HEC token will need to be created on the Splunk instance that will be receiving this data (probably either the indexer(s) or a heavy forwarder). Instructions for this can be found here.
  4. Deploy the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml CloudFormation Template. This will create the necessary resources to pick up the logs in an S3 bucket and send them to Splunk. Specifically for custom log types to be sent to Splunk, these parameters need to be changed from the default values:
    • existingS3BucketName: If you have the logs already being sent to an existing S3 bucket, set this parameter to the name of that bucket. If you set this parameter, you'll need to create the S3 event notification so that when a new object (file) is put (sent) to the S3 bucket, an SQS message is generated and sent to the queue. An example of this in CloudFormation is in the "NotificationConfiguration" properties of the s3Bucket resource in the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml file.
    • logType: set this to something descriptive for the log type. It must match the normal S3 bucket naming rules but also be less than 23 characters.
    • roleAccessToS3BucketPut: If you are creating a new bucket, enter the ARN of the role that will be putting (copying) files to the new bucket.
    • splunkEventDelimiter: If the logs are delimited based on a character, enter that here. Currently supported values are space for a space (U+0020 in Unicode, \s in Python), tab for a tab (U+0009 in Unicode, \t in Python), comma for a comma (U+002C in Unicode, , in Python), and semicolon for a semicolon (U+003B in Unicode, ; in Python).
    • splunkHECEndpoint: https://{{url}}:{{port}}
      • For Splunk Cloud, this will be https://http-inputs-firehose-{{stackName}}.splunkcloud.com:443, where {{stackName}} is the name of your Splunk Cloud stack.
      • For non-Splunk Cloud deployments, consult with your Splunk Architect or Splunk Admin.
    • splunkHECToken: The value of the HEC token from step 3
    • splunkHost: the host field setting in these events
    • splunkIgnoreFirstLine: Set to true if you don't want the first line sent to Splunk
    • splunkJSONFormat: If the file has events in NDJSON/JSONL format, this value tells the Lambda function how to process the event.
    • splunkIndex: the index events will be sent to
    • splunkSource: the source field setting in these events
    • splunkSourcetype: the sourcetype field setting in these events
    • splunkTimeDelineatedField: If the events are delineated, set this to the numerical position of the timestamp field in the event, with the first field being 0 (because arrays start at 0).
    • splunkTimeFormat: Set this to how the timestamp is formatted. For delineated events, select one of the values starting with delineated-, and for timestamps that have preceding characters or are at the very beginning of the event, select one starting with prefix-. If the timestamp is in ISO8601 format, select one ending in ISO8601, but if the timestamp is in Unix time, select one ending in epoch.
    • splunkTimePrefix: If the event timestamp is preceded by characters (such as eventTime), enter those characters here.
    • stage: If this is going into production, set this to something like prod
  5. If the events are not already going to an S3 bucket, configure the events to go to the S3 bucket created in step 4. This bucket will be named {{accountId}}-{{region}}-{{logType}}, where {{accountId}} is the AWS Account ID where the previous stack was deployed to, where {{region}} is the region the previous stack was deployed to, and {{logType}} is the logType parameter set in the previous stack. A boto3 upload sketch follows this list.
  6. Verify the data is being ingested. The easiest way to do this is to wait a few minutes, then run a search like index={{splunkIndex}} sourcetype={{splunkSourcetype}} | head 100, where {{splunkIndex}} is the destination index selected in step 3, and {{splunkSourcetype}} is the splunkSourcetype set in step 4.
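
As a minimal illustration of step 5 (names are placeholders, and the uploading identity should be the role you passed as roleAccessToS3BucketPut):

import boto3

s3 = boto3.client("s3")

BUCKET = "012345678901-us-west-2-app1"  # placeholder: {{accountId}}-{{region}}-{{logType}}

# Ship one application log file to the bucket; the S3 event notification takes it from there.
s3.upload_file("/var/log/app1/app1.log", BUCKET, "app1/app1.log")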

Example Custom Log Deployment via AWS CLI

  • Deploy eventsInS3ToSplunk.yml for NDJSON Formatted Events:
aws cloudformation create-stack --region us-west-2 --stack-name splunk-ndjson-to-splunk --capabilities CAPABILITY_NAMED_IAM --template-body file://eventsInS3ToSplunk.yml --parameters ParameterKey=logType,ParameterValue=app1 ParameterKey=roleAccessToS3BucketPut,ParameterValue=arn:aws:iam::012345678901:role/app1Servers  ParameterKey=splunkHECEndpoint,ParameterValue=https://http-inputs-firehose-contoso.splunkcloud.com:443 ParameterKey=splunkHECToken,ParameterValue=01234567-89ab-cdef-0123-456789abcdef  ParameterKey=splunkHost,ParameterValue=app1 ParameterKey=splunkJSONFormat,ParameterValue=NDJSON  ParameterKey=splunkIndex,ParameterValue=app1Logs ParameterKey=splunkSource,ParameterValue=app1 ParameterKey=splunkSourcetype,ParameterValue=app1:logs ParameterKey=splunkTimeFormat,ParameterValue=prefix-ISO8601 ParameterKey=splunkTimePrefix,ParameterValue=eventTime  ParameterKey=stage,ParameterValue=prod ParameterKey=cloudWatchAlertEmail,ParameterValue=jsmith@contoso.com

Sample NDJSON Events

{"eventTime":"2022-05-26T13:50:08.790Z","cid":"api","channel":"output:DistWorker","level":"info","message":"attempting to connect","host":"appserver1.contoso.com,"port":9103,"tls":true,"rejectUnauthorized":false}
{"eventTime":"2022-05-26T13:50:08.792Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653573068792}
{"eventTime":"2022-05-26T13:50:08.793Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"connecting","host":"appserver1.contoso.com,"port":9103,"tls":true,"rejectUnauthorized":false}

Deploy eventsInS3ToSplunk.yml for CSV Formatted Events

aws cloudformation create-stack --region us-west-2 --stack-name splunk-csv-to-splunk --capabilities CAPABILITY_NAMED_IAM --template-body file://eventsInS3ToSplunk.yml --parameters ParameterKey=logType,ParameterValue=app2 ParameterKey=roleAccessToS3BucketPut,ParameterValue=arn:aws:iam::012345678901:role/app2Servers ParameterKey=splunkEventDelimiter,ParameterValue=comma ParameterKey=splunkHECEndpoint,ParameterValue=https://http-inputs-firehose-contoso.splunkcloud.com:443 ParameterKey=splunkHECToken,ParameterValue=01234567-89ab-cdef-0123-456789abcdef ParameterKey=splunkHost,ParameterValue=app2 ParameterKey=splunkIgnoreFirstLine,ParameterValue=true ParameterKey=splunkIndex,ParameterValue=app2Logs ParameterKey=splunkSource,ParameterValue=app2 ParameterKey=splunkSourcetype,ParameterValue=app2:logs ParameterKey=splunkTimeFormat,ParameterValue=delineated-epoch ParameterKey=stage,ParameterValue=prod ParameterKey=cloudWatchAlertEmail,ParameterValue=jsmith@contoso.com

Sample CSV Events

1653573068792,api,output:DistWorker,info,attempting to connect,appserver1.contoso.com,9103,true,false
1653573068793,api,output:DistWorker,debug,will retry to connect
1653573068798,api,output:DistWorker,debug,connecting,appserver1.contoso.com,9103,true,false

Custom Log Deployment Instructions for Parquet Files

These instructions are for configuring events from parquet files in S3 to be sent to Splunk. Ingesting parquet files can be useful, specifically for those generated by Amazon Security Lake.
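
For context on how a parquet file becomes individual events, here is a rough sketch (not the toolkit's exact code) using pandas, which the AWS SDK for pandas Lambda layer referenced in the CLI example below provides; the file path and timestamp field name are placeholders:

import json

import pandas as pd  # provided by the AWSSDKPandas Lambda layer referenced in the CLI example below

# In the Lambda function the object is first downloaded from S3; this path is a placeholder.
df = pd.read_parquet("/tmp/example-security-lake.parquet")

# Emit one JSON event per row; the column named by splunkTimePrefix supplies the timestamp.
for row in df.to_dict(orient="records"):
    print(json.dumps(row, default=str))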

  1. Configure any settings associated with the indexing phase of ingestion; input and parsing are taken care of in the Lambda function.
    • Information on the different ingestion phases can be found in the Splunk Wiki
    • If there is a TA for these logs, install the TA from Splunkbase.
    • If you use Splunk Cloud, install the add-on on the ad-hoc search head or ad-hoc search head cluster.
    • If you do not use Splunk Cloud, install this add-on on the HEC endpoint (probably either the indexer(s) or heavy forwarder), your indexer(s), and your search head(s).
  2. Configure any firewall rules in front of Splunk to receive data from Amazon Data Firehose.
    • Reference the AWS documentation for the IP ranges required. Make sure to add the IP ranges from the region you'll be deploying the CloudFormation to.
    • If you use Splunk Cloud, you'll want to add the relevant IP range(s) to the HEC feature in the IP allowlist.
    • If you do not use Splunk Cloud, you'll need to consult with your Splunk Architect, Splunk Admin, and/or network team to determine which firewall rules to change and where.
  3. Create a HEC token in Splunk, with indexer acknowledgment turned on, to ingest the events. Follow these specific instructions:
    • Make sure to enable indexer acknowledgment.
    • Leave the sourcetype set to Automatic. The Lambda function will set the sourcetype before sending events to Firehose.
    • Select the index(es) you want the data to be sent to.
    • Amazon Data Firehose does check the format of the tokens, so we recommend letting Splunk generate this rather than setting it manually through inputs.conf.
    • If you use Splunk Cloud, follow these instructions
    • If you do not use Splunk Cloud, the HEC token will need to be created on the Splunk instance that will be receiving this data (probably either the indexer(s) or a heavy forwarder). Instructions for this can be found here.
  4. Deploy the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml CloudFormation Template. This will create the necessary resources to pick up the logs in an S3 bucket and send them to Splunk. Specifically for parquet-based log types to be sent to Splunk, these parameters need to be changed from the default values:
    • existingS3BucketName: If you have the logs already being sent to an existing S3 bucket, set this parameter to the name of that bucket. If you set this parameter, you'll need to create the S3 event notification so that when a new object (file) is put (sent) to the S3 bucket, an SQS message is generated and sent to the queue. An example of this in CloudFormation is in the "NotificationConfiguration" properties of the s3Bucket resource in the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml file.
    • logType: set this to something descriptive for the log type. It must match the normal S3 bucket naming rules but also be less than 23 characters.
    • roleAccessToS3BucketPut: If you are creating a new bucket, enter the ARN of the role that will be putting (copying) files to the new bucket.
    • splunkHECEndpoint: https://{{url}}:{{port}}
      • For Splunk Cloud, this will be https://http-inputs-firehose-{{stackName}}.splunkcloud.com:443, where {{stackName}} is the name of your Splunk Cloud stack.
      • For non-Splunk Cloud deployments, consult with your Splunk Architect or Splunk Admin.
    • splunkHECToken: The value of the HEC token from step 3
    • splunkHost: the host field setting on these events
    • splunkIndex: the index these events will be sent to
    • splunkSource: the source field setting on these events
    • splunkSourcetype: the sourcetype field setting on these events
    • splunkTimeFormat: prefix-epoch
    • splunkTimePrefix: the field in the parquet file that specifies the timestamp
    • stage: If this is going into production, set this to something like prod
  5. If the events are not already going to an S3 bucket, configure the events to go to the S3 bucket created in step 4. This bucket will be named {{accountId}}-{{region}}-{{logType}}, where {{accountId}} is the AWS Account ID where the previous stack was deployed to, where {{region}} is the region the previous stack was deployed to, and {{logType}} is the logType parameter set in the previous stack.
  6. Verify the data is being ingested. The easiest way to do this is to wait a few minutes, then run a search like index={{splunkIndex}} sourcetype={{splunkSourcetype}} | head 100, where {{splunkIndex}} is the destination index selected in step 3, and {{splunkSourcetype}} is the splunkSourcetype set in step 3.
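
If you point the stack at an existing bucket (the existingS3BucketName parameter in step 4), the S3 event notification can also be created from the AWS CLI instead of CloudFormation. The command below is a hedged example: the bucket name, notification ID, and queue ARN are placeholders you would replace with your bucket and the SQS queue created by the stack. Note that this call replaces the bucket's entire notification configuration, so include any existing configurations in the JSON, and depending on your setup you may also need to confirm that the SQS queue's access policy allows s3.amazonaws.com to send messages on behalf of your existing bucket.

aws s3api put-bucket-notification-configuration --bucket my-existing-log-bucket --notification-configuration '{"QueueConfigurations": [{"Id": "sendObjectCreatedToSplunkQueue", "QueueArn": "arn:aws:sqs:us-west-2:012345678901:my-logtype-queue", "Events": ["s3:ObjectCreated:*"]}]}'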

Example Custom Log Deployment via AWS CLI

  • Deploy eventsInS3ToSplunk.yml for Parquet Formatted Events:
aws cloudformation create-stack --region us-west-2 --stack-name splunk-parquet-to-splunk --capabilities CAPABILITY_NAMED_IAM --template-body file://eventsInS3ToSplunk.yml --parameters ParameterKey=logType,ParameterValue=parquet1 ParameterKey=lambdaLayerArn,ParameterValue=arn:aws:lambda:us-west-2:336392948345:layer:AWSSDKPandas-Python39-Arm64:1 ParameterKey=roleAccessToS3BucketPut,ParameterValue=arn:aws:iam::012345678901:role/app1Servers  ParameterKey=splunkHECEndpoint,ParameterValue=https://http-inputs-firehose-contoso.splunkcloud.com:443 ParameterKey=splunkHECToken,ParameterValue=01234567-89ab-cdef-0123-456789abcdef  ParameterKey=splunkHost,ParameterValue=app1 ParameterKey=splunkIndex,ParameterValue=app1Logs ParameterKey=splunkSource,ParameterValue=app1 ParameterKey=splunkSourcetype,ParameterValue=app1:logs ParameterKey=splunkTimeFormat,ParameterValue=prefix-epoch ParameterKey=splunkTimePrefix,ParameterValue=eventTime  ParameterKey=stage,ParameterValue=prod ParameterKey=cloudWatchAlertEmail,ParameterValue=jsmith@contoso.com

Uninstallation Instructions

  1. On each log source (CloudTrail trail, ELB, VPC Flow Log, or custom log source), disable or remove the logging configuration.
    • If you deployed the cloudTrailToS3.yml CloudFormation template, delete the stack that was deployed.
    • If you deployed the vpcFlowLogToS3.yml CloudFormation template, delete the stack that was deployed.
  2. Empty the bucket named {{accountId}}-{{region}}-{{logType}}-bucket of all files (an example AWS CLI command is shown after this list).
    • The logType is cloudtrail if you deployed using the CloudTrail instructions.
    • The logType is vpcflowlog if you deployed using the VPC Flow Log instructions.
    • The logType is elblog if you deployed using the ELB Access Logs instructions.
    • The logType is route53 if you deployed using the Route 53 instructions.
  3. Delete the CloudFormation stack that deployed the resources to configure ingestion to S3.
  4. Delete the HEC token that was deployed to Splunk to ingest the data.
  5. Remove any firewall exceptions that were created for Firehose to send to Splunk over HEC.
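
For step 2, the bucket can be emptied from the AWS CLI. The command below is an example only and assumes the default bucket naming; substitute your own account ID, region, and logType.

aws s3 rm s3://012345678901-us-west-2-cloudtrail-bucket --recursive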

FAQ

  • When would I want to ingest CloudTrail logs through this method instead of using the SCDM? You would want the method described here to ingest AWS CloudTrail logs if any of the following are true:
    • You are currently using AWS Landing Zones
    • You are currently using AWS Organizations
    • You want to ingest CloudTrail events from multiple AWS accounts
    • You want to minimize costs
  • How big can the files in S3 be? By default, each uncompressed file can be up to 400MB. This can be tuned (see the next question).
  • How can I tune the Lambda function? There are a few different parameters on the CloudFormation template (an example update-stack command to change them is shown after this FAQ):
    • lambdaProcessorBatchSize: How many objects (files) from S3 the Lambda function processes in a single invocation. Set this to a lower value if the function is timing out, and to a higher value to reduce costs.
    • lambdaProcessorBatchingWindowInSeconds: How long the Lambda function waits to gather objects (files) before processing them. Set this to a higher value to reduce costs, and to a lower value to reduce event latency.
    • lambdaProcessorMemorySize: How much memory to allocate to the Lambda function. The amount of allocated memory is directly related to how much processing power is allocated to the Lambda function. This means that the more memory that is allocated (usually) the faster the Lambda function runs. Raise this value to lower event latency, and/or if the function is timing out.
    • lambdaProcessorTimeout: How long to let the Lambda function run before terminating it. Raise this if the Lambda function is timing out.
    • sqsQueueVisibilityTimeoutInSecond: How long SQS messages taken by the Lambda function stay hidden before they become available to be processed again. Must be greater than lambdaProcessorTimeout. Changing this setting has no effect on costs, event latency, or performance.
  • The Lambda function is executing too often, or my expenses are too high. What can I do? See the "How can I tune the Lambda function?" question in the FAQ.
  • It's taking too long for events to get to Splunk once they're placed in S3. What can I do? See the "How can I tune the Lambda function?" question in the FAQ.
  • I have an existing S3 bucket I want to pull data in from. Will the eventsInS3ToSplunk.yml CloudFormation stack change any settings on the existing bucket? No. If you have an existing bucket no change is made to that existing bucket.
  • I'm using an existing S3 bucket. Are there any additional settings I need to set? Yes, you'll need to create the S3 event notification so that when a new object (file) is put (sent) to the S3 bucket, an SQS message is generated and sent to the queue. An example of this in CloudFormation is in the "NotificationConfiguration" properties of the s3Bucket resource in the S3-SQS-Lambda-Firehose-Resources/eventsInS3ToSplunk.yml file.
  • Why is the Lambda function pulled from an S3 bucket? CloudFormation only allows a Lambda function to be defined inline if its code is 4096 characters or less; anything larger has to be pulled from an S3 bucket.
  • How can I see what resources are being deployed? You can see the resources that will be deployed in the CloudFormation template.
  • I need to look at or review the code that the Lambda function is running. How can I do that? The code can be pulled from the GitHub repo.
  • I have a timestamp in a different format than what is currently supported. What should I do? Message me on the community Slack so we can look at getting it implemented.
  • I have an event format in a different format than what is currently supported. What should I do? Message me on the community Slack so we can look at getting it implemented.
  • I have a custom VPC Flow Log format. What changes should I make when following the instructions to get VPC Flow Logs sent to Splunk? The only change you should need to make is possibly adjusting the splunkTimeDelineatedField parameter in the CloudFormation stack to point to the correctly-delineated time field.
  • Why is there no support for ELB Classic Access Logs? There is currently no supported Splunk sourcetype for these logs in the Splunk Add-on for Amazon Web Services (AWS). It is possible to get these events into Splunk, however field extraction won't be correct. You could create your own custom sourcetype for these though.
  • Do I have to create 1 HEC token per sourcetype + index combination, or can I use the same HEC token for multiple sourcetype + index combinations? You can re-use an existing HEC token as long as the HEC token is configured to send to all of the indexes you want to send data to using that HEC token.
  • How long should it take the eventsInS3ToSplunk.yml stack to deploy? It may take up to 10 minutes to deploy. Unless it errors out, everything is OK!
  • Will this work across multiple AWS accounts? Yes! By design, the CloudTrail, VPC Flow Log, and ELB instructions will work across multiple accounts. For a custom format, you'll need to make sure that the bucket permissions are set up to accept events from multiple accounts.
  • Why is the CAPABILITY_NAMED_IAM capability required to deploy eventsInS3ToSplunk.yml? There are IAM roles and permissions with custom names defined. The IAM roles/permissions are required to grant resources access to other resources (eg Lambda to be able to pull the log files from the S3 bucket), and custom names were set for uniform naming across all the resources deployed in the stack.
  • I found a typo. What should I do? Feel free to message me on the community Slack or submit a PR. I'm really bad at typos. Thank you!
  • I want to set a custom time objects (files) are kept in the S3 bucket. How can I do that? Modify the s3ObjectExpirationInDays setting.
  • Why would I ever set splunkIgnoreFirstLine to true? In some files, like CSV, this first line can provide reference to the field names and doesn't actually contain any event data (like with VPC Flow Logs), so you want to exclude this line from being ingested into Splunk.
  • Why is there no encryption for the SQS queue? Because CloudFormation doesn't support SQS-SSE yet, and KMS encryption hasn't been built out in the CloudFormation templates yet.
  • Why SQS then Lambda instead of just S3-->Lambda? For the batching features that the SQS-Lambda integration provides, to reduce Lambda invocations and therefore reduce costs.
  • Why not use CloudWatch Logs, then send to Firehose from CloudWatch Logs? CloudWatch Logs can get expensive and I wanted to keep costs for this solution down.
  • How can I see statistics (event latency, license usage, event count, etc.) about data coming in through this method of sending data to Splunk? There's code for a dashboard in the GitHub repo that you can use to report on this type of information.
  • Where did you get the AWS account IDs from in the eventsInS3ToSplunk.yml CloudFormation template? https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#access-logging-bucket-permissions
  • What are limitations around ingesting ELB logs?
    • For classic load balancers, there is no existing sourcetype for classic ELB logs, so some field extractions will not work.
    • For network load balancers, log events will only be generated on load balancers that have a TLS listener, and only when TLS requests are sent to the load balancer. This is a documented limitation on AWS's side.
  • Why does the billingCURToS3.yml CloudFormation Template need to be deployed to us-east-1? This is a limitation on AWS's side. Since billing is a global service, cost and usage reports can only be created in us-east-1.
  • What if I want to send data to Edge Processor? The recommended way to get the data to Edge Processor is to set up the architecture defined in the EP SVA, with Firehose acting as the HEC source. You will need to configure a signed certificate on the load balancer, since Amazon Data Firehose requires its destination to have a signed certificate and an appropriate DNS entry. The DNS Round Robin architecture could also be used.
  • What if I want to send data to Ingest Processor? To get data to Ingest Processor, first send it to the Splunk Cloud environment like normal, then when creating the pipeline to interact with the data specify a partition that will apply the data being sent from the Splunk AWS GDI Toolkit. For more information on how to do this, refer to the Ingest Processor documentation.
  • I have another question. What should I do? Feel free to message me on the community Slack! I'd love to help.
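
For the Lambda tuning questions above, the parameters can be changed on an already-deployed stack with an update. The command below is an abbreviated sketch, not a copy-paste recipe: {{stackName}} is the name of your deployed stack, the new values are examples, and every parameter you are not changing should be passed with UsePreviousValue=true (only a few are shown here).

aws cloudformation update-stack --region us-west-2 --stack-name {{stackName}} --use-previous-template --capabilities CAPABILITY_NAMED_IAM --parameters ParameterKey=lambdaProcessorMemorySize,ParameterValue=1024 ParameterKey=lambdaProcessorTimeout,ParameterValue=300 ParameterKey=logType,UsePreviousValue=true ParameterKey=splunkHECEndpoint,UsePreviousValue=true ParameterKey=splunkHECToken,UsePreviousValue=true ParameterKey=splunkIndex,UsePreviousValue=true ParameterKey=splunkSourcetype,UsePreviousValue=true ParameterKey=stage,UsePreviousValue=true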

Design Decisions

  • Log files need to be compressed in gzip format or uncompressed.
  • Log files need to end in:
    • .gz
    • .gzip
    • .json
    • .csv
    • .log
    • .parquet
    • .txt
    • .ndjson
    • .jsonl
  • The only currently-supported log formats are:
    • New line, space, tab, comma, and semi-colon delineated
    • New-line delineated JSON (NDJSON/JSONL)
    • JSON where events are in a JSON list named Records
  • The only currently-supported timestamp formats are listed below (a parsing sketch is shown after this list):
    • prefix-ISO8601: Where a prefix precedes a timestamp in ISO8601 format (eg eventTime: 2022-03-01T23:01:01.0Z, for March 1st, 2022 at 11:01:01pm UTC, where eventTime is the prefix).
    • prefix-epoch: Where a prefix precedes a timestamp in Unix epoch format (this is the format used in the parquet instructions, with the splunkTimePrefix parameter naming the timestamp field).
    • delineated-ISO8601: Where a delineated file (eg a comma-delineated file) has a timestamp in ISO8601 format in a specific field.
    • delineated-epoch: Where a delineated file (eg a comma-delineated file) has a timestamp in Unix epoch format in a specific field.
  • UTC is the only supported time zone as it is the one true timezone for machines.
  • CloudTrail-Digest files are explicitly blocked from being ingested.
  • Amazon Data Firehose was used to take advantage of the automatic retries when sending to Splunk, send failed events to an S3 bucket so they aren't lost, and indexer acknowledgement to make sure the event is successfully sent to Splunk.
  • Each event must be smaller than 1000KiB. This is a limit in Amazon Data Firehose.
  • Firehose is sending to the Splunk event endpoint so that the indexers do less work parsing the data, and the same HEC token can be re-used for multiple sourcetype + index combinations.
  • The AWS Account ID and region are specified everywhere to have a uniform naming structure for all of the resources created via CloudFormation, and because S3 buckets need to be globally (across every single AWS account) unique. Using the AWS account ID was an easy way to guarantee unique bucket names.
  • Event-driven architecture to only use resources when needed.
  • Chose loosely-coupled, highly-cohesive microservices.
  • Decouple as much as possible to allow for better scaling and troubleshooting. This was done by having clear demarcation points inside of the microservices such as CloudTrail-->S3, S3-->SQS, Lambda-->Firehose.
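
To make the supported timestamp formats concrete, here is a minimal Python sketch of how each format could be converted to the epoch-seconds value that HEC expects. This is an illustration only, not the toolkit's lambda.py: the exact field layouts, the prefix extraction, and the millisecond heuristic are simplifying assumptions.

from datetime import datetime, timezone

def iso8601_to_epoch(ts):
    # Assumes timestamps shaped like 2022-03-01T23:01:01.0Z (UTC, fractional seconds optional).
    ts = ts.rstrip("Z")
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in ts else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc).timestamp()

def epoch_to_seconds(value):
    # Heuristic: treat 13-digit values (like the CSV samples earlier on this page) as milliseconds.
    value = float(value)
    return value / 1000.0 if value > 1e12 else value

def event_time(event, time_format, delimiter=",", field=0, prefix="eventTime"):
    if time_format == "prefix-ISO8601":
        # eg an event containing "eventTime: 2022-03-01T23:01:01.0Z"
        value = event.split(prefix + ":", 1)[1].strip().split()[0].rstrip(",")
        return iso8601_to_epoch(value)
    if time_format == "prefix-epoch":
        value = event.split(prefix + ":", 1)[1].strip().split()[0].rstrip(",")
        return epoch_to_seconds(value)
    if time_format == "delineated-ISO8601":
        return iso8601_to_epoch(event.split(delimiter)[field])
    if time_format == "delineated-epoch":
        return epoch_to_seconds(event.split(delimiter)[field])
    raise ValueError("unsupported splunkTimeFormat: " + time_format)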

Troubleshooting

  • The Lambda function is timing out. See the "How can I tune the Lambda function?" question in the FAQ.
  • The CloudFormation template fails to deploy when I try to deploy it.
    • Verify that the role you are using to deploy the CloudFormation template has the appropriate permissions to deploy the resources you're trying to deploy.
    • Verify that the parameters are set correctly on the stack.
    • Also, in the CloudFormation console, check the events on the failed stack for hints to where it failed.
    • Also see the official Troubleshooting CloudFormation documentation.
  • Events aren't getting to Splunk. What should I check? In this order, check the following:
    1. That the CloudFormation template deployed without error.
    2. That the parameters (especially splunkHECEndpoint and splunkHECToken) are correct.
    3. That the log files are being put into the S3 bucket. The easiest way to do this is just to check the bucket through the AWS Console to see if objects (files) are being put (copied) to it.
    4. That SQS messages are being delivered from the S3 bucket to the SQS queue by going to the SQS queue in the AWS Console, clicking the "Monitoring" tab, then looking at the "Number of Messages Received" pane.
    5. That the Lambda function is executing by going to the Lambda function in the AWS Console, clicking the "Monitor" tab, then the "Metrics" sub-tab and looking at the "Invocations" pane.
    6. That the Lambda function is executing successfully by going to the Lambda function in the AWS Console, clicking the "Monitor" tab, then the "Metrics" sub-tab and looking at the "Error count and success rate (%)" pane.
    7. That the Lambda function isn't producing errors by going to the Lambda function in the AWS Console, clicking the "Monitor" tab, clicking "View logs in CloudWatch", then checking the events in the Log streams.
    8. That the Amazon Data Firehose is receiving records by going to the Firehose delivery stream in the AWS Console, clicking the "Monitoring" tab if it's not selected, and viewing the "Incoming records" pane.
    9. That the Amazon Data Firehose is sending records to Splunk by going to the Firehose delivery stream in the AWS Console, clicking the "Monitoring" tab if it's not selected, and viewing the "Delivery to Splunk success" pane. You can also view the "Destination error logs" pane on that same page.
    10. That there are no errors related to ingestion in Splunk.
    11. That any firewall ports are open from Firehose to Splunk.
    12. If you're deploying the billingCURToS3.yml file, make sure you're deploying it to us-east-1.
  • Not all of the events are getting to Splunk or Amazon Data Firehose is being throttled.
    • Amazon Data Firehose has a number of quotas associated with each Firehose. You can check whether you're being throttled by navigating to the "Monitoring" tab for the Firehose in the AWS Console and checking whether the "Throttled records (Count)" value is greater than zero (an example CLI check is shown below). If the Firehose is being throttled, you can use the Kinesis Firehose service quota increase form to request a quota increase.
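
As a quick CLI check for the throttling case above, the Firehose ThrottledRecords metric can be queried from CloudWatch. The command below is an example; replace the region, delivery stream name, and time window with your own values.

aws cloudwatch get-metric-statistics --region us-west-2 --namespace AWS/Firehose --metric-name ThrottledRecords --dimensions Name=DeliveryStreamName,Value={{firehoseName}} --start-time {{startTime}} --end-time {{endTime}} --period 300 --statistics Sum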