
ADF-Bootstrap CodeBuild fails directly after creating a new account using adf-accounts logic #518

Closed · 4 tasks done
sbkok opened this issue Sep 1, 2022 · 1 comment
Labels: bug (Something isn't working)

sbkok commented Sep 1, 2022

Problem

CodeBuild fails directly after creating a new account.

Errors

Error 1

Failed to create the change set for adf-global-base-development | (cloudformation.py:282)

Error 2

Failed to update its base stack due to missing parameters (deployment_account_id or kms_arn), ensure this account has been bootstrapped correctly by being moved from the root into an Organizational Unit within AWS Organizations

Error 3

2022-09-01 15:28:21,617 | DEBUG | s3 | Nothing could be found for regional.yml when traversing the bucket | (s3.py:201)

Traceback (most recent call last):
File "/codebuild/output/src013183384/src/adf-build/shared/python/cloudformation.py", line 277, in _create_change_set
self.client.create_change_set(**change_set_params)
File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 508, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 915, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ValidationError) when calling the CreateChangeSet operation: Unable to fetch parameters [deployment_account_id] from parameter store for this account.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/codebuild/output/src013183384/src/adf-build/main.py", line 235, in worker_thread
cloudformation.create_stack()
File "/codebuild/output/src013183384/src/adf-build/shared/python/cloudformation.py", line 390, in create_stack
create_change_set = self._create_change_set()
File "/codebuild/output/src013183384/src/adf-build/shared/python/cloudformation.py", line 290, in _create_change_set
raise GenericAccountConfigureError(error) from error
errors.GenericAccountConfigureError: An error occurred (ValidationError) when calling the CreateChangeSet operation: Unable to fetch parameters [deployment_account_id] from parameter store for this account.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/codebuild/output/src013183384/src/adf-build/main.py", line 381, in
main()
File "/codebuild/output/src013183384/src/adf-build/main.py", line 357, in main
thread.join()
File "/codebuild/output/src013183384/src/adf-build/shared/python/thread.py", line 30, in join
raise self.exc
File "/codebuild/output/src013183384/src/adf-build/shared/python/thread.py", line 20, in run
self.ret = self._target(
File "/codebuild/output/src013183384/src/adf-build/main.py", line 248, in worker_thread
raise Exception from error
Exception

[Container] 2022/09/01 15:28:21 Command did not exit successfully python adf-build/main.py exit status 1
[Container] 2022/09/01 15:28:21 Phase complete: BUILD State: FAILED
[Container] 2022/09/01 15:28:21 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: python adf-build/main.py. Reason: exit status 1

Workaround

A possible workaround is to wait a while after the ADF-Bootstrap pipeline fails in the management account, and then release a new change to retry.

Plan to fix

  • Insert the ADF-Bootstrap CodeBuild execution id into the metadata of the ADF account files uploaded to S3.
  • In the AccountFileProcessorFunction function, retrieve that metadata and embed it in the Step Function execution ids, like: `${full-account-name}-${exec-id}`. At the moment the execution id is a GUID.
  • In the CodeBuild step, inside the main.py file, wait while any of these Step Function executions are still in progress (see the sketch below).
  • In the CodeBuild step, inside the main.py file, fail if one of the Step Function executions failed, and link to the failed Step Function execution.

Please note: I am working on the above plan to fix this.
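
For illustration, a minimal sketch of the wait step from the plan above (third point), assuming boto3. The function name, ARN handling, and 30-second polling interval are hypothetical, not ADF's actual implementation:

```python
# Minimal sketch of the proposed wait step; names here are hypothetical.
import time

import boto3

SFN_CLIENT = boto3.client("stepfunctions")


def wait_for_running_executions(state_machine_arn, poll_seconds=30):
    """Block while any execution of the state machine is in progress."""
    while True:
        running = SFN_CLIENT.list_executions(
            stateMachineArn=state_machine_arn,
            statusFilter="RUNNING",
        )["executions"]
        if not running:
            return
        # Execution ids would look like "${full-account-name}-${exec-id}",
        # so a stricter check could match only this pipeline's executions.
        time.sleep(poll_seconds)
```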

@sbkok sbkok added the bug Something isn't working label Sep 1, 2022
@sbkok sbkok added this to the v3.2.0 milestone Sep 1, 2022
@sbkok sbkok self-assigned this Sep 1, 2022
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 14, 2022
**Why?**

When using `aws s3 sync/cp`, files are copied when they appear changed.
However, as the file metadata is also taken into account, it would also
upload files whose content did not change.

Additionally, as described in awslabs#518, we would like to insert metadata
when a file is changed. If we were to rely on the `aws s3 sync/cp` logic, it
would also re-upload a file when only its metadata changed. Therefore, we
cannot add the necessary execution id to the files while still uploading only
on content changes.

**What?**

The `sync_to_s3.py` script is added to support syncing the files to S3.
This script will:
1. Upload a single file, or walk through a directory recursively.
2. Determine the SHA-256 hash of each file it finds.
3. List the S3 bucket, with an optional prefix, to determine which objects
   exist.
4. If a file is missing from the bucket, upload it.
5. If a file already exists as an object, check whether the SHA-256 hashes
   match. If they do not, upload the new version.
6. If an object exists but the corresponding file does not, optionally delete
   the object from the S3 bucket.

When it uploads a file to S3, it adds the metadata requested through the
`--upload-with-metadata` argument. Additionally, it adds the `sha256_hash`
metadata to determine whether the content changed (see the sketch below).

The deployment maps and account configuration processes rely on AWS Step
Functions. Their sync process is updated to rely on the `sync_to_s3.py`
script, so we can retrieve the `execution_id` and insert it into the
invocation id of the Step Function State Machine.
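
For illustration, a hedged sketch of the compare-and-upload step described above, assuming boto3. The `sync_file` helper and its parameters are hypothetical names, not the actual `sync_to_s3.py` interface:

```python
# Hedged sketch of the compare-and-upload logic; names are illustrative.
import hashlib

import boto3
from botocore.exceptions import ClientError

S3_CLIENT = boto3.client("s3")


def sync_file(bucket, key, file_path, metadata):
    """Upload file_path to s3://bucket/key only when its content changed."""
    with open(file_path, "rb") as file_handle:
        local_hash = hashlib.sha256(file_handle.read()).hexdigest()

    try:
        head = S3_CLIENT.head_object(Bucket=bucket, Key=key)
        remote_hash = head["Metadata"].get("sha256_hash")
    except ClientError as error:
        # head_object reports a missing key as a 404 ClientError rather
        # than raising the higher-level NoSuchKey exception.
        if error.response["Error"]["Code"] != "404":
            raise
        remote_hash = None

    if local_hash != remote_hash:
        S3_CLIENT.upload_file(
            file_path, bucket, key,
            ExtraArgs={"Metadata": {**metadata, "sha256_hash": local_hash}},
        )
```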
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 14, 2022
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 14, 2022
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 19, 2022
sbkok added a commit that referenced this issue Sep 20, 2022
Sync SFn input files when content changed only with exec id metadata (#530)

* Sync SFn input files when content changed only with exec id metadata

* Move sync_to_s3 to shared helpers

* Fix linting issues

* Add helper requirements

* Code review changes

* Add support for syncing multiple file extensions with sync_to_s3.py

**Why?**

To support matching both .yml and .yaml file extensions.

**What?**

Added support for passing the `-e` or `--extension` argument multiple times.

* Add CodePipeline Execution Id to accounts & pipeline gen

* Add ADF Version to S3 file sync, such that an ADF update triggers updates

**Why?**

When files are synced to S3, they only trigger an update of the account
management or pipeline generator when the file content changed.

If ADF itself made changes to the pipeline structure, the pipelines and
account management should be retriggered to apply them.

**What?**

By adding the `adf_version` metadata to the files that are synced, an upload
is also triggered when the ADF version changed, not only when the file
content changed.
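
A hedged sketch of that upload decision, with `needs_upload` as an illustrative helper rather than the actual `sync_to_s3.py` code:

```python
def needs_upload(local_hash, current_adf_version, remote_metadata):
    """Upload when the object is missing, its content hash differs, or it
    was generated by a different ADF version."""
    if remote_metadata is None:
        # The object does not exist in the bucket yet.
        return True
    return (
        remote_metadata.get("sha256_hash") != local_hash
        or remote_metadata.get("adf_version") != current_adf_version
    )
```

This way an ADF upgrade re-uploads every synced file once, retriggering the account-management and pipeline-generation state machines even though the file content is unchanged.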
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 26, 2022
Step 2 of fixing awslabs#518.

**Why?**

As explained in awslabs#518, we need to forward the execution id of the
CodePipeline that triggered the Account Management state machine, so we can
wait for it to complete.

**What?**

Adding the CodePipeline execution identifier to the Step Functions
State Machine invocation to enable tracing state machines in progress at
CodeBuild execution time.
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 26, 2022
**Why?**

As described in issue awslabs#518, the bootstrap pipeline fails while running
the main.py code when the account creation or bootstrapping process is still
in progress.

**What?**

The code changes ensure the script will wait for any Step Function executions
that are triggered by the sync_to_s3.py process. It polls in a 30-second loop
until they have succeeded.
StewartW pushed a commit that referenced this issue Oct 3, 2022
* Forward CodePipeline Execution Id to Account Mgmt Creation SFN

* Only process account and deployment maps generated by same ADF version

**Why?**

When an account file or deployment map is updated, it should only be
processed when its version equals the current ADF version. Otherwise, the
file should be skipped.

This ensures we don't run into compatibility issues, where a file structure
update would make processing of the file fail.

**What?**

The version number attached to the S3 object metadata is checked against the
current ADF version number. If they mismatch, the file is skipped.

* Fix Doc to use arguments

* Limit SFN Execution id to 80 chars

**Why?**

Step Functions execution names are limited to 80 characters.

* Await SFN executions before bootstrapping continues

* Abort bootstrap pipeline when SFN error occurred

**Why?**

As the account management and bootstrapping steps are performed in Step
Function State Machines, the errors might not be noticed until a follow-up error
occurs when trying to interact with one of the failing accounts.

**What?**

Modified the bootstrap pipeline to check that these state machines have no
aborted, timed-out, or failed executions. If they do, it logs the error and
instructs the user to look into the fault first (a sketch of this check
follows after this commit list).

* Use CodePipeline execution id instead of CodeBuild for sync_to_s3 ops

* Install helper requirements as part of the bootstrap pipeline

* Catch key not found in sync_to_s3 helper

**Why?**

The `head_object` implementation throws a low-level
`botocore.exceptions.ClientError` instead of
the higher-level `s3_client.exceptions.NoSuchKey`.

**What?**

Properly caught the error using the low-level approach.
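
For illustration, a hedged sketch of the failure check described in the "Abort bootstrap pipeline when SFN error occurred" commit above, assuming boto3; the function name and the way state machine ARNs are passed in are illustrative, not ADF's actual code:

```python
# Hedged sketch of the failure check; names are illustrative.
import boto3

FAILURE_STATUSES = ("ABORTED", "TIMED_OUT", "FAILED")


def assert_no_failed_executions(state_machine_arns):
    """Raise when any account-management or bootstrap execution failed,
    so the bootstrap pipeline aborts instead of hitting follow-up errors."""
    sfn_client = boto3.client("stepfunctions")
    for state_machine_arn in state_machine_arns:
        for status in FAILURE_STATUSES:
            executions = sfn_client.list_executions(
                stateMachineArn=state_machine_arn,
                statusFilter=status,
            )["executions"]
            if executions:
                raise RuntimeError(
                    f"Execution {executions[0]['executionArn']} ended in "
                    f"{status}; investigate this fault before releasing a "
                    "new change to the bootstrap pipeline."
                )
```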
sbkok (Collaborator, Author) commented Jan 24, 2023

Thank you for your patience. I am happy to inform you that this issue has been resolved in our latest release, v3.2.0, published just now.
I'm hereby closing this issue. Please open a new issue if you experience any problems with the latest release.
