
ADF-Bootstrap CodeBuild fails directly after creating a new account using adf-accounts logic #518

Closed · 4 tasks done
sbkok opened this issue Sep 1, 2022 · 1 comment
Labels: bug (Something isn't working)

sbkok commented Sep 1, 2022

Problem

CodeBuild fails directly after creating a new account.

Errors

Error 1

Failed to create the change set for adf-global-base-development | (cloudformation.py:282)

Error 2

Failed to update its base stack due to missing parameters (deployment_account_id or kms_arn), ensure this account has been bootstrapped correctly by being moved from the root into an Organizational Unit within AWS Organizations

Error 3

2022-09-01 15:28:21,617 | DEBUG | s3 | Nothing could be found for regional.yml when traversing the bucket | (s3.py:201)

Traceback (most recent call last):
File "/codebuild/output/src013183384/src/adf-build/shared/python/cloudformation.py", line 277, in _create_change_set
self.client.create_change_set(**change_set_params)
File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 508, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/botocore/client.py", line 915, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ValidationError) when calling the CreateChangeSet operation: Unable to fetch parameters [deployment_account_id] from parameter store for this account.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/codebuild/output/src013183384/src/adf-build/main.py", line 235, in worker_thread
cloudformation.create_stack()
File "/codebuild/output/src013183384/src/adf-build/shared/python/cloudformation.py", line 390, in create_stack
create_change_set = self._create_change_set()
File "/codebuild/output/src013183384/src/adf-build/shared/python/cloudformation.py", line 290, in _create_change_set
raise GenericAccountConfigureError(error) from error
errors.GenericAccountConfigureError: An error occurred (ValidationError) when calling the CreateChangeSet operation: Unable to fetch parameters [deployment_account_id] from parameter store for this account.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/codebuild/output/src013183384/src/adf-build/main.py", line 381, in
main()
File "/codebuild/output/src013183384/src/adf-build/main.py", line 357, in main
thread.join()
File "/codebuild/output/src013183384/src/adf-build/shared/python/thread.py", line 30, in join
raise self.exc
File "/codebuild/output/src013183384/src/adf-build/shared/python/thread.py", line 20, in run
self.ret = self._target(
File "/codebuild/output/src013183384/src/adf-build/main.py", line 248, in worker_thread
raise Exception from error
Exception

[Container] 2022/09/01 15:28:21 Command did not exit successfully python adf-build/main.py exit status 1
[Container] 2022/09/01 15:28:21 Phase complete: BUILD State: FAILED
[Container] 2022/09/01 15:28:21 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: python adf-build/main.py. Reason: exit status 1

Workaround

A possible workaround is to wait a while after the ADF-Bootstrap pipeline fails in the management account, and then release a new change to retry.

Plan to fix

  • Insert the ADF-Bootstrap CodeBuild execution id into the metadata of the ADF account files uploaded to S3.
  • In the AccountFileProcessorFunction function, retrieve that metadata and embed it in the Step Function execution ids, like: `${full-account-name}-${exec-id}`. At the moment the execution id is a GUID.
  • In the CodeBuild step, inside the main.py file, wait while any of these Step Function executions are still in progress (see the sketch below).
  • In the CodeBuild step, inside the main.py file, fail if one of the Step Function executions failed, and link to the failed Step Function execution.

Please note: I am working on the above plan to fix this.
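
For illustration, a minimal sketch of the wait step from the plan above (third point), assuming boto3. The function name, ARN handling, and 30-second polling interval are hypothetical, not ADF's actual implementation:

```python
# Minimal sketch of the proposed wait step; names here are hypothetical.
import time

import boto3

SFN_CLIENT = boto3.client("stepfunctions")


def wait_for_running_executions(state_machine_arn, poll_seconds=30):
    """Block while any execution of the state machine is in progress."""
    while True:
        running = SFN_CLIENT.list_executions(
            stateMachineArn=state_machine_arn,
            statusFilter="RUNNING",
        )["executions"]
        if not running:
            return
        # Execution ids would look like "${full-account-name}-${exec-id}",
        # so a stricter check could match only this pipeline's executions.
        time.sleep(poll_seconds)
```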

@sbkok sbkok added the bug Something isn't working label Sep 1, 2022
@sbkok sbkok added this to the v3.2.0 milestone Sep 1, 2022
@sbkok sbkok self-assigned this Sep 1, 2022
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 14, 2022
**Why?**

When using `aws s3 sync/cp`, files are copied when they appear changed.
However, as the file metadata is also taken into account, it would also
upload files whose content did not change.

Additionally, as described in awslabs#518, we would like to insert metadata
when a file is changed. If we were to rely on the `aws s3 sync/cp` logic, it
would also re-upload a file when only its metadata changed. Therefore, we
cannot add the necessary execution id to the files while still uploading only
on content changes.

**What?**

The `sync_to_s3.py` script is added to support syncing the files to S3.
This script will:
1. Upload a single file, or walk through a directory recursively.
2. Determine the SHA-256 hash of each file it finds.
3. List the S3 bucket, with an optional prefix, to determine which objects
   exist.
4. If a file is missing from the bucket, upload it.
5. If a file already exists as an object, check whether the SHA-256 hashes
   match. If they do not, upload the new version.
6. If an object exists but the corresponding file does not, optionally delete
   the object from the S3 bucket.

When it uploads a file to S3, it adds the metadata requested through the
`--upload-with-metadata` argument. Additionally, it adds the `sha256_hash`
metadata to determine whether the content changed (see the sketch below).

The deployment maps and account configuration processes rely on AWS Step
Functions. Their sync process is updated to rely on the `sync_to_s3.py`
script, so we can retrieve the `execution_id` and insert it into the
invocation id of the Step Function State Machine.
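
For illustration, a hedged sketch of the compare-and-upload step described above, assuming boto3. The `sync_file` helper and its parameters are hypothetical names, not the actual `sync_to_s3.py` interface:

```python
# Hedged sketch of the compare-and-upload logic; names are illustrative.
import hashlib

import boto3
from botocore.exceptions import ClientError

S3_CLIENT = boto3.client("s3")


def sync_file(bucket, key, file_path, metadata):
    """Upload file_path to s3://bucket/key only when its content changed."""
    with open(file_path, "rb") as file_handle:
        local_hash = hashlib.sha256(file_handle.read()).hexdigest()

    try:
        head = S3_CLIENT.head_object(Bucket=bucket, Key=key)
        remote_hash = head["Metadata"].get("sha256_hash")
    except ClientError as error:
        # head_object reports a missing key as a 404 ClientError rather
        # than raising the higher-level NoSuchKey exception.
        if error.response["Error"]["Code"] != "404":
            raise
        remote_hash = None

    if local_hash != remote_hash:
        S3_CLIENT.upload_file(
            file_path, bucket, key,
            ExtraArgs={"Metadata": {**metadata, "sha256_hash": local_hash}},
        )
```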
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 14, 2022
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 14, 2022
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 19, 2022
sbkok added a commit that referenced this issue Sep 20, 2022
Sync SFn input files when content changed only with exec id metadata (#530)

* Sync SFn input files when content changed only with exec id metadata

* Move sync_to_s3 to shared helpers

* Fix linting issues

* Add helper requirements

* Code review changes

* Add support for syncing multiple file extensions with sync_to_s3.py

**Why?**

To support matching both .yml and .yaml file extensions.

**What?**

Added support for passing the `-e` or `--extension` argument multiple times.

* Add CodePipeline Execution Id to accounts & pipeline gen

* Add ADF Version to S3 file sync, such that an ADF update triggers updates

**Why?**

When files are synced to S3, they only trigger an update of the account
management or pipeline generator when the file content changed.

If ADF itself made changes to the pipeline structure, the pipelines and
account management should be retriggered to apply them.

**What?**

By adding the `adf_version` metadata to the files that are synced, an upload
is also triggered when the ADF version changed, not only when the file
content changed.
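
A hedged sketch of that upload decision, with `needs_upload` as an illustrative helper rather than the actual `sync_to_s3.py` code:

```python
def needs_upload(local_hash, current_adf_version, remote_metadata):
    """Upload when the object is missing, its content hash differs, or it
    was generated by a different ADF version."""
    if remote_metadata is None:
        # The object does not exist in the bucket yet.
        return True
    return (
        remote_metadata.get("sha256_hash") != local_hash
        or remote_metadata.get("adf_version") != current_adf_version
    )
```

This way an ADF upgrade re-uploads every synced file once, retriggering the account-management and pipeline-generation state machines even though the file content is unchanged.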
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 26, 2022
Step 2 of fixing awslabs#518.

**Why?**

As explained in awslabs#518, we need to forward the execution id of the
CodePipeline that triggered the Account Management state machine, so we can
wait for it to complete.

**What?**

Adding the CodePipeline execution identifier to the Step Functions
State Machine invocation to enable tracing state machines in progress at
CodeBuild execution time.
sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Sep 26, 2022
**Why?**

As described in issue awslabs#518, the bootstrap pipeline fails while running
the main.py code when the account creation or bootstrapping process is still
in progress.

**What?**

The code changes ensure the script will wait for any Step Function executions
that are triggered by the sync_to_s3.py process. It polls in a 30-second loop
until they have succeeded.
StewartW pushed a commit that referenced this issue Oct 3, 2022
* Forward CodePipeline Execution Id to Account Mgmt Creation SFN

* Only process account and deployment maps generated by same ADF version

**Why?**

When an account file or deployment map is updated, it should only be
processed when its version equals the current ADF version. Otherwise, the
file should be skipped.

This ensures we don't run into compatibility issues, where a file structure
update would make processing of the file fail.

**What?**

The version number attached to the S3 object metadata is checked against the
current ADF version number. If they mismatch, the file is skipped.

* Fix Doc to use arguments

* Limit SFN Execution id to 80 chars

**Why?**

Step Functions execution names are limited to 80 characters.

* Await SFN executions before bootstrapping continues

* Abort bootstrap pipeline when SFN error occurred

**Why?**

As the account management and bootstrapping steps are performed in Step
Function State Machines, the errors might not be noticed until a follow-up error
occurs when trying to interact with one of the failing accounts.

**What?**

Modified the bootstrap pipeline to check that these state machines have no
aborted, timed-out, or failed executions. If they do, it logs the error and
instructs the user to look into the fault first (a sketch of this check
follows after this commit list).

* Use CodePipeline execution id instead of CodeBuild for sync_to_s3 ops

* Install helper requirements as part of the bootstrap pipeline

* Catch key not found in sync_to_s3 helper

**Why?**

The `head_object` implementation throws a low-level
`botocore.exceptions.ClientError` instead of
the higher-level `s3_client.exceptions.NoSuchKey`.

**What?**

Properly caught the error using the low-level approach.
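
For illustration, a hedged sketch of the failure check described in the "Abort bootstrap pipeline when SFN error occurred" commit above, assuming boto3; the function name and the way state machine ARNs are passed in are illustrative, not ADF's actual code:

```python
# Hedged sketch of the failure check; names are illustrative.
import boto3

FAILURE_STATUSES = ("ABORTED", "TIMED_OUT", "FAILED")


def assert_no_failed_executions(state_machine_arns):
    """Raise when any account-management or bootstrap execution failed,
    so the bootstrap pipeline aborts instead of hitting follow-up errors."""
    sfn_client = boto3.client("stepfunctions")
    for state_machine_arn in state_machine_arns:
        for status in FAILURE_STATUSES:
            executions = sfn_client.list_executions(
                stateMachineArn=state_machine_arn,
                statusFilter=status,
            )["executions"]
            if executions:
                raise RuntimeError(
                    f"Execution {executions[0]['executionArn']} ended in "
                    f"{status}; investigate this fault before releasing a "
                    "new change to the bootstrap pipeline."
                )
```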
sbkok (Collaborator, Author) commented Jan 24, 2023

Thank you for your patience. I am happy to inform you that this issue has been resolved in our latest release, v3.2.0, published just now.
I'm hereby closing this issue. Please open a new issue if you experience any problems with the latest release.
