Running the AWS main organizations stackset on an account with a pre-existing AWS integration will delete it #124

Open
j4mcs opened this issue Jun 24, 2024 · 2 comments

Comments


j4mcs commented Jun 24, 2024

Expected Behavior

When installing the AWS integration into an account which has already been registered in Datadog, the integration should either fail and leave the existing registration unchanged, or succeed and update the existing registration with the configuration passed to the integration.

Actual Behavior

When installing the AWS integration into an account which has already been registered in Datadog, the integration fails and then deletes the pre-existing registration.

We are encountering this because we have the V1 Datadog integration configured for our older accounts but would like to use the V2 integration for all new accounts. That requires deploying the stackset to all OUs instead of manually to each new account. To do this, however, we need stack instances that run in existing accounts not to fail (and not to roll back resources they didn't create).

Steps to Reproduce the Problem

This issue is related to #85 but highlights a broader problem with how the integration Lambda function is written.

  1. Add an AWS account to Datadog (manually or otherwise).
  2. Run the main organizations stackset (MainDatadogStackV2).
  3. Observe that the stackset fails with a 409 (see #85, "Multiregion stackset deployment fails").
  4. After the failure, CloudFormation rolls back and deletes the AWS integration created in step 1.

Specifications

Stacktrace

Here are the relevant CloudWatch logs:

[INFO]	2024-06-17T10:08:44.819Z	3b59cfd9-7102-4308-9130-b219a7876f06	Received Create request.
[INFO]	2024-06-17T10:08:45.349Z	3b59cfd9-7102-4308-9130-b219a7876f06	Failed - exception thrown during processing.
[INFO]	2024-06-17T10:08:45.350Z	3b59cfd9-7102-4308-9130-b219a7876f06	ResponseBody: 
{
  "Status": "FAILED",
  "Reason": "See the details in CloudWatch Log Stream: 2024/06/17/[$LATEST]34b69cf01182440f9ef41373c5a7f766",
  "PhysicalResourceId": "2024/06/17/[$LATEST]34b69cf01182440f9ef41373c5a7f766",
  "StackId": "arn:aws:cloudformation:us-east-2:XXXXXXXXXXX:stack/StackSet-MainDatadogStackV2-0528f68a-e780-48bd-bde1-1108f98fea9a/83fcaf30-2c91-11ef-adc8-06cd71ccd097",
  "RequestId": "2ea31534-a1d3-4107-ba51-9ca64f57ce2d",
  "LogicalResourceId": "DatadogAPICall",
  "Data": {
      "Message": "Exception during processing: HTTP Error 409: Conflict"
  }
}

...

[INFO]	2024-06-17T10:08:49.548Z	a75026eb-cdf8-4481-86e3-8320fa38325c	Received Delete request.
[INFO]	2024-06-17T10:08:50.667Z	a75026eb-cdf8-4481-86e3-8320fa38325c	ResponseBody: 
{
  "Status": "SUCCESS",
  "Reason": "See the details in CloudWatch Log Stream: 2024/06/17/[$LATEST]34b69cf01182440f9ef41373c5a7f766",
  "PhysicalResourceId": "2024/06/17/[$LATEST]34b69cf01182440f9ef41373c5a7f766",
  "StackId": "arn:aws:cloudformation:us-east-2:XXXXXXXXXXX:stack/StackSet-MainDatadogStackV2-0528f68a-e780-48bd-bde1-1108f98fea9a/83fcaf30-2c91-11ef-adc8-06cd71ccd097",
  "RequestId": "8b494c1f-7cb8-423c-ab04-3413300bca10",
  "LogicalResourceId": "DatadogAPICall",
  "Data": {
      "Message": "Datadog AWS Integration deleted successfully."
  }
}
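
The logs above show the failed Create and the later Delete carrying the same PhysicalResourceId, so the handler has no way to tell whether it owns the integration it is asked to delete. One possible guard, sketched below, is to return a sentinel PhysicalResourceId when Create fails and skip the Datadog delete when that sentinel comes back on rollback. This is only a sketch under assumptions: `handle_event`, `CREATE_FAILED_ID`, the `AccountId` property, and the two injected callables are hypothetical names and are not taken from the actual Lambda code.

```python
# Hypothetical sentinel: marks a Create that failed before this stack
# created anything in Datadog.
CREATE_FAILED_ID = "datadog-integration-create-failed"


def handle_event(event, create_integration, delete_integration):
    """Build a CloudFormation custom-resource response for one event.

    create_integration and delete_integration are hypothetical callables
    standing in for the Datadog API calls made by the real Lambda.
    "AccountId" is an assumed property name.
    """
    request_type = event["RequestType"]
    account_id = event["ResourceProperties"]["AccountId"]

    if request_type == "Create":
        try:
            create_integration(account_id)
        except Exception as exc:
            # Nothing was created, so hand back the sentinel ID.
            return {
                "Status": "FAILED",
                "PhysicalResourceId": CREATE_FAILED_ID,
                "Reason": str(exc),
            }
        return {
            "Status": "SUCCESS",
            "PhysicalResourceId": "datadog-integration-" + account_id,
        }

    if request_type == "Delete":
        physical_id = event.get("PhysicalResourceId")
        if physical_id != CREATE_FAILED_ID:
            # Only delete integrations this stack actually created.
            delete_integration(account_id)
        return {"Status": "SUCCESS", "PhysicalResourceId": physical_id}

    # Update and anything else: leave the integration untouched in this sketch.
    return {
        "Status": "SUCCESS",
        "PhysicalResourceId": event.get("PhysicalResourceId"),
    }
```

With a guard like this, the rollback Delete that follows a 409 on Create would still report SUCCESS to CloudFormation, but the pre-existing registration would be left alone.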

j4mcs commented Jun 24, 2024

A bit more context on what is happening in your Lambda code: when a POST call to https://api.datadoghq.com/api/v1/integration/aws results in a 409, the response is an error object. The Lambda handler catches this as an exception and returns a FAILED response. Given that a GET won't return the external ID, I think CREATE requests should list the existing AWS integrations for the given account_id and, if there are any, request a new ExternalID with the supplied configuration and return SUCCESS.
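
The flow suggested here could look roughly like the sketch below. The Datadog calls are injected as plain callables so the control flow can be read and tested without network access. `ConflictError`, `create_or_adopt`, and the three callable parameters are hypothetical names, and the response shapes (`external_id`, `account_id` keys) are assumptions rather than a statement of what the API returns.

```python
class ConflictError(Exception):
    """Stands in for the HTTP 409 raised by the POST call."""


def create_or_adopt(account_id, config, post_integration,
                    list_integrations, new_external_id):
    """Handle a CREATE request without failing on an existing registration.

    post_integration, list_integrations and new_external_id are
    hypothetical wrappers around the Datadog AWS integration API.
    """
    try:
        created = post_integration(account_id, config)
        return {
            "Status": "SUCCESS",
            "ExternalId": created["external_id"],
            "Adopted": False,
        }
    except ConflictError:
        # 409: check whether this account is already registered.
        existing = [
            item for item in list_integrations()
            if item.get("account_id") == account_id
        ]
        if not existing:
            # The conflict has some other cause; let the caller fail.
            raise
        # Reuse the registration and request a fresh external ID.
        external_id = new_external_id(account_id, config)
        return {
            "Status": "SUCCESS",
            "ExternalId": external_id,
            "Adopted": True,
        }
```

The `Adopted` flag is there so that a later Delete could tell an adopted registration apart from one the stack created itself.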

@dheitman-prom

Big ol' plus one on this. I've been fighting this for a couple of weeks. The stackset has major issues that I'm still trying to sort out.
