[Bug]: adf-account-bootstrapping SFN fails because of throttling #752

Open
1 of 2 tasks
gottkanzler-rgb opened this issue Aug 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@gottkanzler-rgb

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

When bootstrapping multiple (e.g. 3-5) AWS accounts at once through the ADF account creation mechanism, the adf-account-bootstrapping Step Functions state machine (SFN) fails to bootstrap the accounts. The adf-bootstrapping-jump-role-manager Lambda performs too many read operations against the Organizations service, which results in a TooManyRequestsException.
As a consequence, bootstrapping fails and must be re-triggered manually several times until it eventually succeeds.

Expected Behavior

Bootstrapping accounts via the adf-account-bootstrapping SFN should work without errors or manual re-triggers.

Current Behavior

Bootstrapping multiple accounts at once results in the following error in the adf-account-bootstrapping SFN:

{
"error": "Task failed. Granting the ADF Account-Bootstrapping Jump Role privileged cross-account access failed due to an error: An error occurred (TooManyRequestsException) when calling the ListParents operation (reached max retries: 4): AWS Organizations can't complete your request because another request is already in progress. Try again later.."
}

Steps To Reproduce

  1. Have a relatively large AWS organization (in our case ~500 accounts)
  2. Add 3-5 accounts to the definition file for ADF account provisioning in the aws-deployment-framework-bootstrap repository
  3. Wait until aws-deployment-framework-bootstrap-pipeline triggers the adf-account-bootstrapping SFN
  4. The SFN fails with the error shown under Current Behavior

Possible Solution

  • Implement error handling and a retry mechanism for TooManyRequestsException
  • Reduce the number of read operations against the Organizations service during execution of the adf-bootstrapping-jump-role-manager Lambda (as far as I can see from the logs, the whole organization is traversed in each SFN execution, even though each execution relates to only a single account)
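
To illustrate both ideas, here is a minimal sketch, not ADF's actual code. It assumes the Lambda uses boto3, whose client errors carry the AWS error code in `exc.response["Error"]["Code"]`. The names `call_with_backoff`, `ParentLookup`, and `THROTTLING_CODES` are hypothetical, and the retry counts and delays are placeholder values. The first helper retries throttled calls with exponential backoff and jitter. The second memoizes ListParents results, so each child ID is looked up at most once per invocation.

```python
import random
import time

# Error codes treated as throttling (assumption: extend as needed).
THROTTLING_CODES = {"TooManyRequestsException", "ThrottlingException"}


def error_code(exc):
    """Return the AWS error code of a botocore-style exception, or None."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def call_with_backoff(func, *args, max_attempts=8, base_delay=0.5,
                      max_delay=20.0, sleep=time.sleep, **kwargs):
    """Call func, retrying throttling errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Re-raise anything that is not throttling, or the final failure.
            if error_code(exc) not in THROTTLING_CODES or attempt == max_attempts:
                raise
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


class ParentLookup:
    """Memoize parent lookups so each child ID is queried at most once."""

    def __init__(self, list_parents):
        # list_parents is expected to behave like the boto3 Organizations
        # client's list_parents(ChildId=...) method.
        self._list_parents = list_parents
        self._cache = {}

    def parent_of(self, child_id):
        if child_id not in self._cache:
            response = call_with_backoff(self._list_parents, ChildId=child_id)
            self._cache[child_id] = response["Parents"][0]["Id"]
        return self._cache[child_id]
```

With a real client this could be wired up as `ParentLookup(boto3.client("organizations").list_parents)`. Alternatively, botocore's own retry behaviour can be tuned when the client is created, via `botocore.config.Config(retries={"max_attempts": ..., "mode": "standard"})`. I have not verified whether that alone is enough for an organization of this size.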

Additional Information/Context

No response

ADF Version

4.0.0

Contributing a fix?

  • Yes, I am working on a fix to resolve this issue
@gottkanzler-rgb added the bug label on Aug 9, 2024
@ethanBaird

I saw a similar issue today. When an account is removed from adf-accounts/*.yml, an SFN execution is triggered for every account in that file. If there are 5+ accounts, this fails due to a TooManyRequestsException on the ListRegions call in the ConfigureAccountRegions step.

There is an account-level quota on this, capped at 5 calls, and it cannot be adjusted.

You can find it in Service Quotas under AWS Account Management.
