From 5f552d967f337d7da6c9fd1108a8eac05105dd94 Mon Sep 17 00:00:00 2001
From: Chanwit Kaewkasi
Date: Wed, 20 Sep 2023 20:58:20 +0700
Subject: [PATCH 1/5] propose adr 0003 for workspace blob caching

Signed-off-by: Chanwit Kaewkasi
---
 docs/adr/0003-workspace-blob-caching.md | 54 +++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 docs/adr/0003-workspace-blob-caching.md

diff --git a/docs/adr/0003-workspace-blob-caching.md b/docs/adr/0003-workspace-blob-caching.md
new file mode 100644
index 00000000..241560ca
--- /dev/null
+++ b/docs/adr/0003-workspace-blob-caching.md
@@ -0,0 +1,54 @@
+# 3. Workspace BLOB Caching
+
+* Status: [ **proposed** | rejected | accepted | deprecated ]
+* Date: 2023-09-20
+* Authors: @chanwit
+* Deciders: TBD
+
+## Context
+
+The TF-Controller is being enhanced to address the resource deletion problem
+more efficiently using the contents of generated Workspace BLOBs.
+This ensures that Terraform finalization procedures are streamlined and efficient.
+Currently, the TF-Controller downloads a Source BLOB and pushes it to a tf-runner.
+The tf-runner then processes this BLOB to create a Workspace file system.
+The tf-runner generates a backend configuration file, variable files, and other necessary files
+for the Workspace file system. This newly created Workspace file system is then compressed,
+sent back to the TF-Controller, and stored as a Workspace BLOB in the controller's storage.
+A clear caching mechanism for these BLOBs is essential to ensure efficiency, security,
+and ease of access.
+
+## Decision
+
+1. **BLOB Creation and Storage**
+    * A gRPC function named `CreateWorkspaceBlob` will be invoked by the TF-Controller
+      to compress the Workspace file system into a tar.gz format, which is then retrieved
+      as a byte array.
+    * The caching mechanism will be executed right before the Terraform Initialization step, ensuring that the latest and most relevant data is used.
+    * Each Workspace Blob will be cached on the TF-Controller's local disk, following the naming convention `$namespace-$name.tar.gz`.
+2. **Persistence**
+    * The persistence mechanism used by the Source Controller will be adopted for the TF-Controller's persistence volume.
+3. **BLOB Encryption**
+    * The encryption and decryption of the BLOBs will be tasked to the runner, with the controller solely responsible for storing encrypted BLOBs.
+    * Each namespace will require a service account, preferably named "tf-runner".
+    * The token of this service account, which is natively supported by Kubernetes, will serve as the most appropriate encryption key.
+4. **Security Measures (Based on STRIDE Analysis)**
+    * **Spoofing:** Implement Kubernetes RBAC for access restrictions and use mutual authentication for gRPC communications.
+    * **Tampering:** Use checksums for integrity verification and 0600 permissions to write-protect local disk storage.
+    * **Repudiation:** Ensure strong logging and auditing mechanisms for tracking activities.
+    * **Information Disclosure:** Utilize robust encryption algorithms, rotate encryption keys periodically, and secure service account tokens.
+    * **Denial of Service:** Monitor storage space and automate cleanup processes.
+    * **Elevation of Privilege:** Minimize permissions associated with service account tokens.
+5. **First MVP & Future Planning**
+    * For the initial MVP, the default pod local volume will be used.
+    * Since a controller restart will erase the BLOB cache, it's essential to maintain data integrity and availability.
+      Consideration for using persistent volumes should be made for subsequent versions.
+
+## Consequence
+
+1. With the implementation of this architecture:
+    * BLOB management in TF-Controller will be optimized, leading to a more efficient and streamlined Terraform finalization process.
+    * Security measures will ensure the safety of the stored BLOBs, minimizing potential threats.
+2. Using the default pod local volume might limit storage capabilities and risk data loss upon controller restart. This warrants the need for considering persistent volumes in future versions.
+3. Encryption and security measures will demand regular maintenance and monitoring, especially concerning key rotations and integrity checks.
+4. Given the complexity of this setup, the importance of robust documentation, including troubleshooting and recovery processes, becomes apparent.

From ae259caca7ea87d4bbe689503766f9ec8568b856 Mon Sep 17 00:00:00 2001
From: Chanwit Kaewkasi
Date: Wed, 20 Sep 2023 21:30:41 +0700
Subject: [PATCH 2/5] addressed the cache deletion after the finalization process is complete

Signed-off-by: Chanwit Kaewkasi
---
 docs/adr/0003-workspace-blob-caching.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/adr/0003-workspace-blob-caching.md b/docs/adr/0003-workspace-blob-caching.md
index 241560ca..f760f2f8 100644
--- a/docs/adr/0003-workspace-blob-caching.md
+++ b/docs/adr/0003-workspace-blob-caching.md
@@ -25,7 +25,8 @@ and ease of access.
       to compress the Workspace file system into a tar.gz format, which is then retrieved
       as a byte array.
     * The caching mechanism will be executed right before the Terraform Initialization step, ensuring that the latest and most relevant data is used.
-    * Each Workspace Blob will be cached on the TF-Controller's local disk, following the naming convention `$namespace-$name.tar.gz`.
+    * Each Workspace Blob will be cached on the TF-Controller's local disk, using the UUID of the Terraform object as the filename,`${uuid}.tar.gz`.
+    * To prevent unauthorized access to the cache entries, and cache collisions, the cache file will be deleted after the finalization process is complete.
 2. **Persistence**
     * The persistence mechanism used by the Source Controller will be adopted for the TF-Controller's persistence volume.
 3. **BLOB Encryption**

From 359aefd353f35a3359b7968901a975e75fc92d28 Mon Sep 17 00:00:00 2001
From: Chanwit Kaewkasi
Date: Fri, 22 Sep 2023 00:10:20 +0700
Subject: [PATCH 3/5] clarify context

---
 docs/adr/0003-workspace-blob-caching.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/docs/adr/0003-workspace-blob-caching.md b/docs/adr/0003-workspace-blob-caching.md
index f760f2f8..519176e4 100644
--- a/docs/adr/0003-workspace-blob-caching.md
+++ b/docs/adr/0003-workspace-blob-caching.md
@@ -9,7 +9,16 @@
 
 The TF-Controller is being enhanced to address the resource deletion problem
 more efficiently using the contents of generated Workspace BLOBs.
-This ensures that Terraform finalization procedures are streamlined and efficient.
+
+The TF-Controller currently faces challenges related to the deletion of Terraform resources.
+These problems span across three categories:
+1. single resource deletion,
+2. resources with dependencies deletion, and
+3. namespace deletion.
+
+Implementing a robust Workspace BLOB caching mechanism is essential
+for improving the reliability of the Terraform resource deletion process, for the single resource deletion scenarios.
+
 Currently, the TF-Controller downloads a Source BLOB and pushes it to a tf-runner.
 The tf-runner then processes this BLOB to create a Workspace file system.
 The tf-runner generates a backend configuration file, variable files, and other necessary files
@@ -48,7 +57,7 @@ and ease of access.
 ## Consequence
 
 1. With the implementation of this architecture:
-    * BLOB management in TF-Controller will be optimized, leading to a more efficient and streamlined Terraform finalization process.
+    * The reliability of the Terraform resource deletion process will improved for the single resource deletion scenario.
     * Security measures will ensure the safety of the stored BLOBs, minimizing potential threats.
 2. Using the default pod local volume might limit storage capabilities and risk data loss upon controller restart. This warrants the need for considering persistent volumes in future versions.
 3. Encryption and security measures will demand regular maintenance and monitoring, especially concerning key rotations and integrity checks.

From 648c2582a8ca9733a4142f50b90b3e290f126161 Mon Sep 17 00:00:00 2001
From: Chanwit Kaewkasi
Date: Mon, 9 Oct 2023 19:43:02 +0700
Subject: [PATCH 4/5] explain problem and user impact

Signed-off-by: Chanwit Kaewkasi
---
 docs/adr/0003-workspace-blob-caching.md | 29 ++++++++++---------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/docs/adr/0003-workspace-blob-caching.md b/docs/adr/0003-workspace-blob-caching.md
index 519176e4..da2d33df 100644
--- a/docs/adr/0003-workspace-blob-caching.md
+++ b/docs/adr/0003-workspace-blob-caching.md
@@ -7,25 +7,20 @@
 
 ## Context
 
-The TF-Controller is being enhanced to address the resource deletion problem
-more efficiently using the contents of generated Workspace BLOBs.
-
-The TF-Controller currently faces challenges related to the deletion of Terraform resources.
+The TF-Controller currently faces challenges related to the deletion of Terraform resources. 
 These problems span across three categories:
-1. single resource deletion,
-2. resources with dependencies deletion, and
-3. namespace deletion.
 
-Implementing a robust Workspace BLOB caching mechanism is essential
-for improving the reliability of the Terraform resource deletion process, for the single resource deletion scenarios.
+1. Single object deletion,
+2. Resources with dependencies deletion, and
+3. Namespace deletion.
+
+These problems must be fixed in the above order as (2) and (3) require single object deletion to be resolved first.
+
+Deleting a single TF object can sometimes be obstructed because it's tied to other resources like Source objects, Secrets, and ConfigMaps. If we try to remove it without deleting these resources, the TF object gets stuck in an inconsistent state, making it harder for users to manage their infrastructure smoothly.
+Therefore, the TF-Controller is being enhanced to address this problem more efficiently, using the contents of generated Workspace BLOBs. Each BLOB contains all necessary information from the associated Source, Secrets, and ConfigMaps to ensure that TF-Controller finalization procedures can delete objects correctly.
 
-Currently, the TF-Controller downloads a Source BLOB and pushes it to a tf-runner.
-The tf-runner then processes this BLOB to create a Workspace file system.
-The tf-runner generates a backend configuration file, variable files, and other necessary files
-for the Workspace file system. This newly created Workspace file system is then compressed,
-sent back to the TF-Controller, and stored as a Workspace BLOB in the controller's storage.
-A clear caching mechanism for these BLOBs is essential to ensure efficiency, security,
-and ease of access.
+Currently, the TF-Controller downloads a Source BLOB and pushes it to a tf-runner. The tf-runner processes this BLOB to create a Workspace file system. It generates a backend configuration file, variable files, and other necessary files for the Workspace file system, using data from associated Secrets and ConfigMaps. This newly created Workspace file system is then compressed, sent back to the TF-Controller, and stored as a Workspace BLOB in the controller's storage.
+A caching mechanism for these BLOBs is essential to fixing the single TF object deletion process.
 
 ## Decision
 
@@ -57,7 +52,7 @@ and ease of access.
 ## Consequence
 
 1. With the implementation of this architecture:
-    * The reliability of the Terraform resource deletion process will improved for the single resource deletion scenario.
+    * The reliability of the Terraform resource deletion process will improved for the single object deletion scenario.
     * Security measures will ensure the safety of the stored BLOBs, minimizing potential threats.
 2. Using the default pod local volume might limit storage capabilities and risk data loss upon controller restart. This warrants the need for considering persistent volumes in future versions.
 3. Encryption and security measures will demand regular maintenance and monitoring, especially concerning key rotations and integrity checks.

From fac76f96a0e9f050821c637d95acbdf6ef4e6d4d Mon Sep 17 00:00:00 2001
From: Chanwit Kaewkasi
Date: Mon, 9 Oct 2023 20:00:52 +0700
Subject: [PATCH 5/5] fix wordings and add link per comments

Signed-off-by: Chanwit Kaewkasi
---
 docs/adr/0003-workspace-blob-caching.md | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/docs/adr/0003-workspace-blob-caching.md b/docs/adr/0003-workspace-blob-caching.md
index da2d33df..8259c125 100644
--- a/docs/adr/0003-workspace-blob-caching.md
+++ b/docs/adr/0003-workspace-blob-caching.md
@@ -26,17 +26,16 @@ A caching mechanism for these BLOBs is essential to fixing the single TF object
 
 1. **BLOB Creation and Storage**
     * A gRPC function named `CreateWorkspaceBlob` will be invoked by the TF-Controller
-      to compress the Workspace file system into a tar.gz format, which is then retrieved
-      as a byte array.
+      to tell tf-runner to compress the Workspace file system into a tar.gz BLOB, which is then retrieved back to the controller.
     * The caching mechanism will be executed right before the Terraform Initialization step, ensuring that the latest and most relevant data is used.
     * Each Workspace Blob will be cached on the TF-Controller's local disk, using the UUID of the Terraform object as the filename,`${uuid}.tar.gz`.
-    * To prevent unauthorized access to the cache entries, and cache collisions, the cache file will be deleted after the finalization process is complete.
+    * To reduce the risk of unauthorized access to the cache entries, and cache collisions, the cache file will be deleted after the finalization process is complete.
 2. **Persistence**
-    * The persistence mechanism used by the Source Controller will be adopted for the TF-Controller's persistence volume.
+    * [The persistence mechanism used by the Source Controller](https://fluxcd.io/flux/installation/configuration/vertical-scaling/#persistent-storage-for-flux-internal-artifacts) will be adopted for the TF-Controller's persistence volume.
 3. **BLOB Encryption**
     * The encryption and decryption of the BLOBs will be tasked to the runner, with the controller solely responsible for storing encrypted BLOBs.
     * Each namespace will require a service account, preferably named "tf-runner".
-    * The token of this service account, which is natively supported by Kubernetes, will serve as the most appropriate encryption key.
+    * The token of this service account, which is natively supported by Kubernetes, will serve as the most appropriate encryption key because it's stored in a Secret, access to which can be controlled by RBAC. Storing it in a Secret also allows the key to be rotated.
 4. **Security Measures (Based on STRIDE Analysis)**
     * **Spoofing:** Implement Kubernetes RBAC for access restrictions and use mutual authentication for gRPC communications.
     * **Tampering:** Use checksums for integrity verification and 0600 permissions to write-protect local disk storage.
@@ -46,14 +45,12 @@ A caching mechanism for these BLOBs is essential to fixing the single TF object
     * **Elevation of Privilege:** Minimize permissions associated with service account tokens.
 5. **First MVP & Future Planning**
     * For the initial MVP, the default pod local volume will be used.
-    * Since a controller restart will erase the BLOB cache, it's essential to maintain data integrity and availability.
-      Consideration for using persistent volumes should be made for subsequent versions.
+    * Since a controller restart will erase the BLOB cache, consideration for using persistent volumes should be made for subsequent versions.
 
 ## Consequence
 
 1. With the implementation of this architecture:
-    * The reliability of the Terraform resource deletion process will improved for the single object deletion scenario.
-    * Security measures will ensure the safety of the stored BLOBs, minimizing potential threats.
+    * Single object deletions will succeed in circumstances in which they previously got stuck.
+    * Security measures will ensure the safety of the new Workspace BLOB storage mechanics, minimizing potential risks.
 2. Using the default pod local volume might limit storage capabilities and risk data loss upon controller restart. This warrants the need for considering persistent volumes in future versions.
 3. Encryption and security measures will demand regular maintenance and monitoring, especially concerning key rotations and integrity checks.
-4. Given the complexity of this setup, the importance of robust documentation, including troubleshooting and recovery processes, becomes apparent.
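
Note on the storage rules in the Decision section of the ADR: the four concrete rules (a `${uuid}.tar.gz` filename, 0600 permissions, checksum verification, and deletion after finalization) can be sketched in a few lines. The sketch below is illustrative only and is not TF-Controller code. The class `WorkspaceBlobCache`, its method names, and the choice of SHA-256 as the checksum are all assumptions made for the example; the ADR does not name a checksum algorithm. The blob is treated as opaque bytes because the ADR assigns encryption and decryption to the runner, and the permission bits assume a POSIX file system.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class WorkspaceBlobCache:
    """Hypothetical local-disk cache; names are illustrative, not TF-Controller APIs."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _path(self, uid: str) -> Path:
        # The ADR names each cache entry after the UUID of the Terraform object.
        return self.root / f"{uid}.tar.gz"

    def put(self, uid: str, blob: bytes) -> str:
        """Store an already-encrypted blob with 0600 permissions and return its SHA-256."""
        path = self._path(uid)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "wb") as f:
            f.write(blob)
        # Re-apply the mode in case the file already existed with wider permissions.
        os.chmod(path, 0o600)
        return hashlib.sha256(blob).hexdigest()

    def get(self, uid: str, expected_sha256: str) -> bytes:
        """Read a blob back and reject it if the checksum no longer matches."""
        blob = self._path(uid).read_bytes()
        if hashlib.sha256(blob).hexdigest() != expected_sha256:
            raise ValueError("workspace blob checksum mismatch")
        return blob

    def delete(self, uid: str) -> None:
        """Remove the entry once finalization is complete; a missing entry is not an error."""
        self._path(uid).unlink(missing_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        cache = WorkspaceBlobCache(root)
        uid = "0b9d6c1e-5a52-4f0e-9d3b-2f1c7a8e4b10"  # made-up UUID for the demo
        digest = cache.put(uid, b"encrypted workspace bytes")
        assert cache.get(uid, digest) == b"encrypted workspace bytes"
        cache.delete(uid)  # what the ADR prescribes after finalization
```

Where the checksum itself is kept (for example on the Terraform object's status) is left open here, as the ADR does not specify it.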