Caution
Sysdig released a new onboarding experience for AWS in September 2024. We recommend connecting your cloud accounts by following these instructions.
This repository should be used solely in cases where Agentless Threat Detection cannot be used.
There are several ways to deploy Agent-based Cloud Detection and Response (CDR) in your AWS infrastructure:
- Single Account on ECS
- Single Account on AppRunner
- Single Account with a pre-existing Kubernetes Cluster
- Organizational
If you're unsure about how to use this module, please contact your Sysdig representative. Our experts will guide you through the process and assist you in setting up your account securely and correctly.
The Terraform provider credentials/token require Administrative permissions in order to create the resources specified in the per-example diagram.
Some components may vary, or may be deployed on different accounts (depending on the example). You can check the full list of resources in the "Resources" section of each module's README. You can also check our source code and suggest changes.
The following is an overall schema of the resources created for the default setup.
- Cloudtrail / SNS / S3 / SQS / KMS
- SSM Parameter for Sysdig API Token Storage
- Sysdig Workload: ECS / AppRunner creation (K8s cluster is pre-required, not created)
- each compute solution requires a role to assume for execution
- CodeBuild for on-demand image scanning
- Sysdig role for Compliance
Note: the extra permissions required for service wiring are not stated here (ex.: the ECS service requires a runtime and an execution role)
Compliance
IAM Role and IAM Policies (arn:aws:iam::aws:policy/SecurityAudit) to allow Sysdig to run Compliance tasks. More details in its module, cloud-bench
sts:AssumeRole
Threat-Detection specific
ssm: GetParameters
sqs: ReceiveMessage
sqs: DeleteMessage
s3: ListBucket
s3: GetObject
Image-Scanning specific
# all type scanning
codebuild: StartBuild
# deploy_image_scanning_ecs
ecs:DescribeTaskDefinition
# deploy_image_scanning_ecr
ecr: GetAuthorizationToken
ecr: BatchCheckLayerAvailability
ecr: GetDownloadUrlForLayer
ecr: GetRepositoryPolicy
ecr: DescribeRepositories
ecr: ListImages
ecr: DescribeImages
ecr: BatchGetImage
ecr: GetLifecyclePolicy
ecr: GetLifecyclePolicyPreview
ecr: ListTagsForResource
ecr: DescribeImageScanFindings
- Other Notes:
- Runtime AWS IAM permissions on JSON Statement format
- only Sysdig workload related permissions are specified above; infrastructure internal resource permissions (such as Cloudtrail permissions to publish on SNS, or SNS-SQS Subscription) are not detailed.
- For better security, permissions are pinned to specific resources instead of using *
- Check Organizational Use Case - Role Summary for more details
Check official documentation on Secure for cloud - AWS, Confirm the Services are working
Generally speaking, a triggered situation (threat or image scanning) should be checked in the following places (from the more functional side to the more technical):
- Secure UI > Events / Insights / ...
- Cloud-Connector logs. To access the logs in AWS, go to Cloudwatch > LogGroup > sysdig or cloudconnector
- Cloudtrail > Event History
Choose one of the rules contained in an activated Runtime Policy for AWS, such as the Sysdig AWS Activity Logs policy, and trigger it in your AWS account.
ex.: 'Delete Bucket Public Access Block' can be easily tested by going to an S3 bucket > Permissions > Block public access (bucket settings) > edit > uncheck 'Block all public access'
Remember that if you add new rules to the policy, you need to give the changes time to propagate.
In the cloud-connector logs you should see logs similar to these (within the console-notifier component log):
A public access block for a bucket has been deleted (requesting user=OrganizationAccountAccessRole, requesting IP=x.x.x.x, AWS region=eu-central-1, bucket=***
If that is not working as expected, check the following:
- Are events being consumed from the SQS queue, or are they pending?
- Are events being sent to the SNS topic?
In Secure > Events you should see the event coming through, but be aware that you may need to activate specific levels, such as Info, depending on the rule you are firing.
Alternatively, use the Terraform example module found in examples/trigger-events to trigger a "Create IAM Policy that Allows All" event.
When scanning is activated, you should see the following lines in the cloud-connector compute component logs:
{"component":"ecs-action","message":"starting Cloud Scanning ECS action"}
{"component":"ecr-action","message":"starting Cloud Scanning ECR action"}
- For ECR image scanning, upload any image to an AWS ECR repository. You can find CLI instructions in the AWS UI.
It may take some time, but you should see logs detecting the new image in the ECR repository:
{"component":"ecr-action","message":"processing detection {\"account\":\"***\",\"image\":\"***.dkr.ecr.us-east-1.amazonaws.com/myimage:tag\",\"region\":\"us-east-1\"}. source=aws_cloudtrail"}
{"component":"ecr-action","message":"starting ECR scanning for ***.dkr.ecr.us-east-1.amazonaws.com/myimage:tag at account ‘***’ region ‘us-east-1’"}
and a CodeBuild project being launched successfully
- For ECS running image scanning, deploy any task in your own cluster, or in the one that we create to deploy our workload (ex.: the amazon/amazon-ecs-sample image).
It may take some time, but you should see logs detecting the new image in the ECS cloud-connector task:
{"component":"ecs-action","message":"processing detection {\"account\":\"***\",\"region\":\"eu-west-3\",\"taskDefinition\":\"apache:1\"}. source=aws_cloudtrail"}
{"component":"ecs-action","message":"analyzing task 'apache:1' in region 'eu-west-3'"}
{"component":"ecs-action","message":"starting ECS scanning for container index 0 in task 'apache:1'"}
and a CodeBuild project being launched successfully
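The scanning log lines shown above are JSON objects with `component` and `message` fields, so they can be filtered mechanically when checking whether a scan was started. The following is only a minimal sketch: it assumes the logs have been exported as text, one JSON object per line, and the helper name `messages_for_component` is hypothetical (it is not part of cloud-connector).

```python
import json


def messages_for_component(lines, component):
    """Return the "message" of every JSON log line emitted by `component`.

    Lines that are empty or not valid JSON objects are skipped, since
    exported logs may contain other output mixed in.
    """
    messages = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and entry.get("component") == component:
            messages.append(entry.get("message", ""))
    return messages


# Sample lines copied from the expected output above.
sample_logs = [
    '{"component":"ecs-action","message":"starting Cloud Scanning ECS action"}',
    '{"component":"ecr-action","message":"starting Cloud Scanning ECR action"}',
    "this line is not JSON and is ignored",
]

print(messages_for_component(sample_logs, "ecr-action"))
```

An empty result for `ecr-action` or `ecs-action` would suggest that the corresponding scanning action was never started.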
A: Seems a bug with some providers
S: Upgrade to Terraform 1.3.1
Q-Debug: Need to modify cloud-connector config (to troubleshoot with debug loglevel, modify ingestors for testing, ...)
A: In both the ECS and AppRunner workload types, the cloud-connector configuration is passed as a base64-encoded string through the env var CONFIG
S: Get the current value, decode it, edit the desired value (ex.: logging: debug), encode it again, and redeploy the workload with this new definition.
For information on all the modifiable configuration, see the Cloud-Connector Chart reference
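The decode, edit, and re-encode steps for the CONFIG value can be sketched as follows. This is an illustration under assumptions: it treats the decoded configuration as text with a top-level `logging:` key, the sample configuration is hypothetical rather than a real cloud-connector config, and the function name `set_logging_level` is made up for this example. Fetching the current value and redeploying the workload are not covered.

```python
import base64
import re


def set_logging_level(encoded_config, level="debug"):
    """Decode a base64 CONFIG value, set its top-level `logging:` key, re-encode."""
    text = base64.b64decode(encoded_config).decode("utf-8")
    pattern = re.compile(r"^logging:.*$", flags=re.MULTILINE)
    if pattern.search(text):
        # Replace the existing top-level logging entry.
        text = pattern.sub(f"logging: {level}", text, count=1)
    else:
        # No logging entry yet: add one at the top.
        text = f"logging: {level}\n{text}"
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Hypothetical sample configuration, for illustration only.
sample_config = "logging: info\ningestors: []\n"
current_value = base64.b64encode(sample_config.encode("utf-8")).decode("ascii")

new_value = set_logging_level(current_value, "debug")
print(base64.b64decode(new_value).decode("utf-8"))
```

The resulting string is what would be set as the new CONFIG env var value before spinning up the workload again.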
A: Solution is based on Cloudtrail delivery times
S: Wait at least 15 minutes as specified in the official AWS documentation
For Identity and Access Management, when connected it will be in the learning mode
A: Make sure you installed both cloud-bench and cloud-connector modules
A: Need to check several steps
S: First, image scanning is not activated by default. Ensure you have the required scanning enablers in place.
Currently, images are scanned on registry/repository push events, and on the supported compute services on deployment. Make sure these events are triggered.
Dig into secure for cloud compute log (cloud-connector) and check for errors.
If previous logs are ok, check spawned scanning service logs
A: We don't scan images from the management account ECR because it is not a best practice to have an ECR in this account.
S: The following Role has to be created in the management account
- Role Name: OrganizationAccountAccessRole
- Permissions Policies:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CustomPolicy",
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    }
  ]
}
- Trust Relationships:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<ORG_MANAGEMENT_ACCOUNT_ID>:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Q-General: Getting error "Error: cannot verify credentials" on "sysdig_secure_trusted_cloud_identity" data
A: This happens when Sysdig credentials are not working correctly.
S: Check that the sysdig provider block is correctly configured, with the sysdig_secure_url and sysdig_secure_api_token variables set to the correct values. Check Sysdig SaaS per-region URLs if required
A: Refer to the Sysdig SaaS Region and IP Ranges Documentation to get the Sysdig SaaS endpoint, and allow both outbound traffic (for compute vulnerability reports) and inbound traffic (for scheduled compliance checkups)
The ECS-type deployment will create the following security-group setup
A: This happens when a previous installation of secure-for-cloud exists. On each account where Sysdig has to create resources, it creates a resource-group named after the name variable (defaulted to sfc in the main examples).
S: Remove the previous installation or, if multiple setups are required, use the name variable to change the resource-group name.
Q-AWS: In the ECS compute flavor of secure for cloud, I don't see any logs in the cloud-connector component
A: This may be due to the task not being able to start, normally because it does not have enough permissions to even fetch the secure apiToken stored in the AWS SSM service.
S: Access the task and see if there is any value in the "Stopped Reason" field.
Q-AWS: Getting error "Error: failed creating ECS Task Definition: ClientException: No Fargate configuration exists for given values."
A: Your ECS task_size values aren't valid for Fargate. Specifically, your mem_limit value is too big for the cpu_limit you specified
S: Check supported task cpu and memory values
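As a quick pre-check before running Terraform, the CPU and memory pair can be compared against the Fargate task-size table. The sketch below is an assumption-laden illustration: the table is a snapshot of the combinations published in the AWS documentation and may change, the values are assumed to be CPU units and MiB, and `is_valid_fargate_size` is a hypothetical helper that is not part of this module.

```python
GIB = 1024  # MiB per GiB

# Snapshot of Fargate task-level CPU (units) to allowed memory (MiB).
# Assumption: check the AWS documentation for the current table.
FARGATE_MEMORY_BY_CPU = {
    256: [512, 1024, 2048],
    512: list(range(1 * GIB, 4 * GIB + 1, GIB)),
    1024: list(range(2 * GIB, 8 * GIB + 1, GIB)),
    2048: list(range(4 * GIB, 16 * GIB + 1, GIB)),
    4096: list(range(8 * GIB, 30 * GIB + 1, GIB)),
    8192: list(range(16 * GIB, 60 * GIB + 1, 4 * GIB)),
    16384: list(range(32 * GIB, 120 * GIB + 1, 8 * GIB)),
}


def is_valid_fargate_size(cpu_limit, mem_limit):
    """Return True if the cpu/memory pair appears in the table above."""
    return mem_limit in FARGATE_MEMORY_BY_CPU.get(cpu_limit, [])


print(is_valid_fargate_size(256, 512))   # smallest combination in the table
print(is_valid_fargate_size(256, 4096))  # memory too big for this cpu value
```

A pair such as 256 CPU units with 4096 MiB is rejected because the memory value is too big for that CPU value, which matches the situation described in the answer above.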
Q-AWS: Getting error "400 Invalid parameter: TopicArn" when trying to reuse an existing cloudtrail-sns
│ Error: error creating SNS Topic Subscription: InvalidParameter: Invalid parameter: TopicArn
│ status code: 400, request id: 1fe94ceb-9f58-5d39-a4df-169f55d25eba
│
│ with module.cloudvision_aws_single_account.module.cloud_connector.module.cloud_connector_sqs.aws_sns_topic_subscription.this,
│ on ../../../modules/infrastructure/sqs-sns-subscription/main.tf line 6, in resource "aws_sns_topic_subscription" "this":
│ 6: resource "aws_sns_topic_subscription" "this" {
A: In order to subscribe to an SNS Topic, the SQS queue must be in the same region
S: Change the aws provider region variable so that all resources are in the same region
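Because the region is the fourth colon-separated field of an ARN, a region mismatch between the SNS topic and the SQS queue can be spotted before applying. This is a minimal sketch; the helper names and the sample ARNs (account ID and resource names) are hypothetical.

```python
def arn_region(arn):
    """Extract the region field from an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return parts[3]


def same_region(sns_topic_arn, sqs_queue_arn):
    """Return True if both ARNs point to the same region."""
    return arn_region(sns_topic_arn) == arn_region(sqs_queue_arn)


# Hypothetical ARNs, for illustration only.
topic = "arn:aws:sns:us-east-1:123456789012:cloudtrail-topic"
queue_ok = "arn:aws:sqs:us-east-1:123456789012:sfc-queue"
queue_other = "arn:aws:sqs:eu-west-1:123456789012:sfc-queue"

print(same_region(topic, queue_ok))
print(same_region(topic, queue_other))
```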
│ Error: error creating subnet: InvalidParameterValue: Value (apne1-az3) for parameter availabilityZoneId is invalid. Subnets can currently only be created in the following availability zones: apne1-az1, apne1-az2, apne1-az4.
│ status code: 400, request id: 6e32d757-2e61-4220-8106-22ccf814e1fe
│
│ with module.vpc.aws_subnet.public[1],
│ on .terraform/modules/vpc/main.tf line 376, in resource "aws_subnet" "public":
│ 376: resource "aws_subnet" "public" {
A: For the ECS workload deployment, a VPC is created under the hood. Some AWS availability zones, such as 'apne1-az3' in the 'ap-northeast' region, do not support NAT gateways, which are activated by default.
S: Specify the desired availability zones for the vpc module explicitly, using the ecs_vpc_region_azs variable, to work around the error until AWS adds support for your region.
error while receiving the messages: error retrieving from S3 bucket=crit-start-trail: operation error S3: GetObject,
https response error StatusCode: 400, RequestID: ***, HostID: ***,
api error AuthorizationHeaderMalformed: The authorization header is malformed; a non-empty Access Key (AKID) must be provided in the credential."}
A: When the S3 bucket where the Cloudtrail events are stored is not in the same account as the one where the Cloud Connector workload is deployed, the assumeRole configuration is required.
This error happens when the ECS TaskRole has no permission to assume this role
S: Grant the sts:AssumeRole permission to the role used.
A: Probably you or someone in the same environment you're using, already deployed a resource with the sysdig terraform module and a naming collision is happening.
S: If you want to maintain several versions, make use of the name input var of the examples
A: There are several possible causes for this.
Check that your AWS account has an alias set up. It is not the same as the account name.
$ aws iam list-account-aliases
If that is fine, check that the deploy_benchmark flag is enabled on your account, so that the trust relationship is enabled between Sysdig and your cloud infrastructure.
To validate the trust relationship, expect no errors from the following API call.
$ curl -v https://<SYSDIG_SECURE_ENDPOINT>/api/cloud/v2/accounts/<AWS_ACCOUNT_ID>/validateRole \
--header 'Authorization: Bearer <SYSDIG_SECURE_API_TOKEN>'
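The same check can be scripted. The sketch below mirrors the curl call above using only the Python standard library; the endpoint path and Bearer header come from that command, while the function names and the placeholder values in the example are hypothetical. `validate_role` performs a real HTTP call, so it is defined but not invoked here.

```python
import urllib.request


def build_validate_role_request(endpoint, account_id, api_token):
    """Build the GET request equivalent to the curl command above."""
    url = f"{endpoint.rstrip('/')}/api/cloud/v2/accounts/{account_id}/validateRole"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_token}"},
        method="GET",
    )


def validate_role(endpoint, account_id, api_token, timeout=10.0):
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so a
    normal return means the API reported no error.
    """
    request = build_validate_role_request(endpoint, account_id, api_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder values, for illustration only.
example = build_validate_role_request(
    "https://secure.example.com", "123456789012", "my-api-token"
)
print(example.get_method(), example.full_url)
```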
Q-RuntimeThreat Detection: Getting error 403 "could not load rule set from Sysdig Secure: ruleprovider#newPartialRuleSet | error loading default-rules: error from Sysdig Secure API: 403
A: The Sysdig User that deployed the components is a standard user within the Sysdig Platform. Only administrator users are given permissions to read falco rule sets. Once this permission is changed, you should no longer get this error and CSPM Cloud events should start populating.
- Uninstall previous deployment resources before upgrading
$ terraform destroy
- Upgrade the full terraform example with
$ terraform init -upgrade
$ terraform plan
$ terraform apply
- If the event source is created through SFC, some events may get lost while upgrading with this approach. However, if the Cloudtrail is re-used (the normal production setup), events will be recovered once ingestion resumes.
- If required, you can upgrade the cloud-connector component by restarting the task (stop task). Because it is not pinned to a specific version, it will download the latest one.
Module is maintained and supported by Sysdig.
Apache 2 Licensed. See LICENSE for full details.