aws-cdk: Ability to add metadata to assets and retain original source path of assets #27415

Open
1 of 2 tasks
sagar794 opened this issue Oct 5, 2023 · 3 comments
Labels
@aws-cdk/assets Related to the @aws-cdk/assets package effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p3

Comments

@sagar794

sagar794 commented Oct 5, 2023

Describe the feature

  1. The ability to add metadata to assets and have that metadata available for consumption post cdk synth
  2. For CDK to automatically add the original source file path regardless of whether the asset is staged.

Use Case

I want my CI system to validate the origin of the assets that were added to the cdk.out directory during cdk synth. Today, when assets are staged, they are copied into cdk.out as part of cdk synth. Within <stackId>.assets.json we can see the path to the staged asset, but not the original source path.

For example, when viewing this <stackId>.assets.json file, I know the staged asset is named asset.97c324c84f5d023be4edee540cb2cb401a49f115d01ed403b288f6cb412771df.zip, but not which local file was used to create it.

{
  "version": "33.0.0",
  "files": {
    "97c324c84f5d023be4edee540cb2cb401a49f115d01ed403b288f6cb412771df": {
      "source": {
        "path": "asset.97c324c84f5d023be4edee540cb2cb401a49f115d01ed403b288f6cb412771df.zip",   
        "packaging": "file"
      },
      "destinations": {
        "<aws-account-number>-<aws-region>": {
          "bucketName": "<bucket-name>",
          "objectKey": "97c324c84f5d023be4edee540cb2cb401a49f115d01ed403b288f6cb412771df.zip",
          "region": "<aws-region>"
        }
      }
    }
  },
  "dockerImages": {}
}

The motivation for this feature is to be able to determine the origin of the asset using the <stackId>.assets.json file.
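To make the gap concrete, here is a minimal sketch of what a CI step can extract from the manifest today. The type and function names (AssetManifest, listStagedAssets) are illustrative and not part of any CDK package; only the fields visible in the example above are modelled, and the hash is abbreviated.

```typescript
// Minimal shape of the file-asset entries shown in the example above.
// Illustrative types only: they model just the fields visible in this issue.
interface FileAssetSource {
  path: string;
  packaging: string;
}

interface AssetManifest {
  version: string;
  files: Record<string, { source: FileAssetSource }>;
}

// Returns one row per file asset: the asset hash and the staged path.
// Nothing in the current format says where the asset originally came from.
function listStagedAssets(
  manifest: AssetManifest,
): Array<{ hash: string; stagedPath: string }> {
  return Object.entries(manifest.files).map(([hash, entry]) => ({
    hash,
    stagedPath: entry.source.path,
  }));
}

// Hypothetical manifest with an abbreviated hash.
const exampleManifest: AssetManifest = {
  version: "33.0.0",
  files: {
    "97c3": { source: { path: "asset.97c3.zip", packaging: "file" } },
  },
};

const stagedAssets = listStagedAssets(exampleManifest);
```

The staged path is derived from the asset hash, so it identifies the content but not the file or URL the content was built from.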

We have also created a function that lets us download assets from a trusted external source, and we would like to be able to audit later which external source was used to create each asset.

Proposed Solution

If it is possible to safely add new keys to the <stackId>.assets.json files, I propose adding a new metadata key where both metadata written by CDK and custom metadata added by users could reside after cdk synth.

{
  "version": "33.0.0",
  "files": {
    "97c324c84f5d023be4edee540cb2cb401a49f115d01ed403b288f6cb412771df": {
      "source": {
        "path": "asset.97c324c84f5d023be4edee540cb2cb401a49f115d01ed403b288f6cb412771df.zip",   
        "packaging": "file",
        "metadata": {
          "@aws-cdk/originalSourcePath": "relative/path/to/file",
          "someUserKey": "someUserValue"
        }
      },
      "destinations": {
        "<aws-account-number>-<aws-region>": {
          "bucketName": "<bucket-name>",
          "objectKey": "97c324c84f5d023be4edee540cb2cb401a49f115d01ed403b288f6cb412771df.zip",
          "region": "<aws-region>"
        }
      }
    }
  },
  "dockerImages": {}
}

As for the original source path, I would expect that data to be added by CDK.
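If the manifest carried this key, the CI check described in the use case could look roughly like the sketch below. Everything here is hypothetical: the metadata field and the @aws-cdk/originalSourcePath key are the proposal in this issue and are not written by CDK today, and findUnverifiedAssets and the prefix allow-list are names invented for illustration.

```typescript
// HYPOTHETICAL: the "metadata" field and this key are the proposal in this
// issue, not part of the current manifest format.
const ORIGINAL_SOURCE_PATH_KEY = "@aws-cdk/originalSourcePath";

interface ProposedFileSource {
  path: string;
  packaging: string;
  metadata?: Record<string, string>;
}

interface ProposedManifest {
  version: string;
  files: Record<string, { source: ProposedFileSource }>;
}

// Returns the hashes of assets whose original source path is missing or
// falls outside every allowed prefix. The prefix match is deliberately naive:
// a real check would normalize paths before comparing them.
function findUnverifiedAssets(
  manifest: ProposedManifest,
  allowedPrefixes: string[],
): string[] {
  const unverified: string[] = [];
  for (const [hash, entry] of Object.entries(manifest.files)) {
    const original = entry.source.metadata?.[ORIGINAL_SOURCE_PATH_KEY];
    const allowed =
      original !== undefined &&
      allowedPrefixes.some((prefix) => original.startsWith(prefix));
    if (!allowed) {
      unverified.push(hash);
    }
  }
  return unverified;
}

// Hypothetical manifest: one asset from the repo, one from /tmp, one untagged.
const proposedManifest: ProposedManifest = {
  version: "33.0.0",
  files: {
    aaa: {
      source: {
        path: "asset.aaa.zip",
        packaging: "file",
        metadata: { [ORIGINAL_SOURCE_PATH_KEY]: "lambda/handler" },
      },
    },
    bbb: {
      source: {
        path: "asset.bbb.zip",
        packaging: "file",
        metadata: { [ORIGINAL_SOURCE_PATH_KEY]: "/tmp/downloaded.zip" },
      },
    },
    ccc: { source: { path: "asset.ccc.zip", packaging: "file" } },
  },
};

const unverifiedHashes = findUnverifiedAssets(proposedManifest, ["lambda/"]);
```

In this sketch, assets without any recorded origin are treated as unverified rather than silently accepted, which seems like the safer default for an audit step.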

For lambda.Code.fromAsset('path/to/file'), I would expect "@aws-cdk/originalSourcePath": "path/to/file" to be in the metadata section. For constructs like NodejsFunction it would be a bit trickier, since entry is not a required prop, but I would still like to know which value CDK resolved it to. The same logic would apply to every other way assets can be added with CDK.

Users should be able to add their own metadata to assets as well. This could possibly be done by adding a new metadata parameter to AssetProps for the Asset construct. All functions that create assets could then accept the metadata as an input. For my use case, the metadata would carry the trusted external source URL, which could then be audited later.
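A rough sketch of how user metadata and CDK-provided metadata might be combined is shown below. This is not existing aws-cdk-lib code: AssetPropsSketch, buildAssetMetadata, and the sourceUrl key are hypothetical names, and reserving the @aws-cdk/ prefix for CDK-written keys is an assumption of this sketch rather than something decided in this issue.

```typescript
// HYPOTHETICAL: neither this props shape nor this helper exists in aws-cdk-lib.
const RESERVED_PREFIX = "@aws-cdk/";

interface AssetPropsSketch {
  // Mirrors the existing `path` prop on AssetProps.
  path: string;
  // Proposed new prop: free-form, user-supplied metadata.
  metadata?: Record<string, string>;
}

// Builds the metadata object that synth would write next to the asset.
// User keys may not use the reserved prefix, so CDK-provided entries
// cannot be overwritten from application code (an assumption of this sketch).
function buildAssetMetadata(props: AssetPropsSketch): Record<string, string> {
  const userMetadata = props.metadata ?? {};
  for (const key of Object.keys(userMetadata)) {
    if (key.startsWith(RESERVED_PREFIX)) {
      throw new Error(
        `Metadata key "${key}" uses the reserved prefix ${RESERVED_PREFIX}`,
      );
    }
  }
  return {
    ...userMetadata,
    [`${RESERVED_PREFIX}originalSourcePath`]: props.path,
  };
}

// Hypothetical usage: an asset downloaded by a custom function, tagged with
// the URL it was fetched from.
const layerMetadata = buildAssetMetadata({
  path: "vendor/layer.zip",
  metadata: { sourceUrl: "https://trusted.example.com/layer.zip" },
});
```

Keeping CDK-written keys under a reserved prefix would let an auditor trust the original source path even when the application adds its own keys.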

Alternatively, the original source path could be its own key (i.e. originalSourcePath), placed alongside path and packaging rather than inside the metadata section.

{
  "version": "33.0.0",
  "files": {
    "97c324c84f5d023be4edee540cb2cb401a49f115d01ed403b288f6cb412771df": {
      "source": {
        "path": "asset.97c324c84f5d023be4edee540cb2cb401a49f115d01ed403b288f6cb412771df.zip",
        "originalSourcePath": "relative/path/to/file",   
        "packaging": "file",
        "metadata": {
          "someUserKey": "someUserValue"
        }
      },
      "destinations": {
        "<aws-account-number>-<aws-region>": {
          "bucketName": "<bucket-name>",
          "objectKey": "97c324c84f5d023be4edee540cb2cb401a49f115d01ed403b288f6cb412771df.zip",
          "region": "<aws-region>"
        }
      }
    }
  },
  "dockerImages": {}
}
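From a consumer's point of view the two layouts differ only in where the value is read from. A small sketch of a reader that tolerates either layout follows; both fields are proposals from this issue, and SourceWithOrigin and readOriginalSourcePath are illustrative names.

```typescript
// HYPOTHETICAL: both `originalSourcePath` and `metadata` are proposed fields.
interface SourceWithOrigin {
  path: string;
  packaging: string;
  originalSourcePath?: string;        // alternative layout: top-level key
  metadata?: Record<string, string>;  // first layout: key inside metadata
}

// Prefers the top-level key and falls back to the metadata entry.
function readOriginalSourcePath(source: SourceWithOrigin): string | undefined {
  return (
    source.originalSourcePath ??
    source.metadata?.["@aws-cdk/originalSourcePath"]
  );
}

const fromTopLevel = readOriginalSourcePath({
  path: "asset.abc.zip",
  packaging: "file",
  originalSourcePath: "relative/path/to/file",
});

const fromMetadata = readOriginalSourcePath({
  path: "asset.abc.zip",
  packaging: "file",
  metadata: { "@aws-cdk/originalSourcePath": "relative/path/to/file" },
});
```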

Other Information

This issue is similar, but not what I am looking for here since this is adding metadata to the CloudFormation template.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.93.0

Environment details (OS name and version, etc.)

macOS Ventura (13.6)

@sagar794 sagar794 added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Oct 5, 2023
@github-actions github-actions bot added the @aws-cdk/assets Related to the @aws-cdk/assets package label Oct 5, 2023
@sagar794 sagar794 changed the title from "(module name): (short issue description)" to "aws-cdk: Ability to add metadata to assets and retain original source path of assets" Oct 5, 2023
@evgenyka
Contributor

evgenyka commented Oct 5, 2023

@sagar794 Could you please provide more details about what you mean by "validate the origin of the assets"? Are you looking to ensure that assets originate from a specific directory? Is this related to verifying artifact provenance? Perhaps it would be more sensible to generate a Software Bill of Materials (SBOM) for the CDK cloud assembly rather than relying on CDK metadata for this purpose?

@sagar794
Author

sagar794 commented Oct 5, 2023

@evgenyka Sure thing! Thank you for taking the time to read through the issue.

> Could you please provide more details about what you mean by "validate the origin of the assets"?

What I meant by "validate the origin of the assets" is that I would like to know whether the asset that was staged came from a trusted source or if it came from somewhere else. In order to do that, I would need to know what was the original file that was referenced. CDK will save the original path of the file if --no-staging is used during cdk synth, but not when staging assets. I would like to continue staging assets since the cdk.out directory becomes a single artifact needed for deployment later which is very convenient.

For example, during the CI stage of my CI/CD pipeline, if I wanted to ensure that staged assets were sourced from local files only, I could not do so, because the CDK application can download external files and add them as assets. Since CDK does not record which files were referenced when an asset is created, there is no way to audit this later without maintaining a custom solution.

> Are you looking to ensure that assets originate from a specific directory? Is this related to verifying artifact provenance?

Not a specific directory at the moment, but possibly could be. For now I am looking to verify that assets originate either from local files or from a trusted external source. When sourcing from the trusted external source I also need to retain the URL that was used to fetch the external asset (through a custom function). So downloading a file from the trusted external source in a way that bypasses the storage of the URL and using that as an asset would not be okay. This is where having the ability to associate additional metadata about assets would be nice to have.
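The rule I have in mind could be sketched as follows, assuming the proposed metadata were available. The key names, the AssetOrigin type, and classifyAssetOrigin are hypothetical, and treating absolute paths or paths containing ".." as "not local" is a simplifying assumption of this sketch (it presumes downloads land outside the project directory).

```typescript
// HYPOTHETICAL keys: neither is written by CDK today.
const ORIGINAL_PATH_KEY = "@aws-cdk/originalSourcePath"; // proposed in this issue
const SOURCE_URL_KEY = "sourceUrl"; // example of a user-defined key

type AssetOrigin =
  | { kind: "local"; path: string }
  | { kind: "trusted-external"; url: string }
  | { kind: "rejected"; reason: string };

// Classifies one asset from its metadata:
// - a recorded URL must start with the trusted prefix;
// - otherwise the original path must stay inside the project directory.
function classifyAssetOrigin(
  metadata: Record<string, string> | undefined,
  trustedUrlPrefix: string,
): AssetOrigin {
  const url = metadata?.[SOURCE_URL_KEY];
  if (url !== undefined) {
    return url.startsWith(trustedUrlPrefix)
      ? { kind: "trusted-external", url }
      : { kind: "rejected", reason: `untrusted URL: ${url}` };
  }
  const path = metadata?.[ORIGINAL_PATH_KEY];
  if (path === undefined) {
    return { kind: "rejected", reason: "no origin recorded" };
  }
  // Assumption: anything absolute or climbing out via ".." is not a local file.
  const escapesProject =
    path.startsWith("/") || path.split("/").includes("..");
  return escapesProject
    ? { kind: "rejected", reason: `path outside project: ${path}` }
    : { kind: "local", path };
}

const trustedPrefix = "https://trusted.example.com/";
const localOrigin = classifyAssetOrigin(
  { [ORIGINAL_PATH_KEY]: "lambda/handler" },
  trustedPrefix,
);
const externalOrigin = classifyAssetOrigin(
  {
    [ORIGINAL_PATH_KEY]: "/tmp/layer.zip",
    [SOURCE_URL_KEY]: "https://trusted.example.com/layer.zip",
  },
  trustedPrefix,
);
const bypassedOrigin = classifyAssetOrigin(
  { [ORIGINAL_PATH_KEY]: "/tmp/layer.zip" },
  trustedPrefix,
);
```

The last case is the bypass described above: a file downloaded without recording its URL has no way to prove where it came from, so it is rejected.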

> Perhaps it would be more sensible to generate a Software Bill of Materials (SBOM) for the CDK cloud assembly rather than relying on CDK metadata for this purpose?

Apologies if I misunderstood you here, but I don't think this would help resolve my problem statement since the original asset paths would not be known today when staging assets.

@peterwoodworth
Contributor

There is another similar request in #27402, but your request seems more general, covering all assets rather than just one. It would make sense to me to add metadata about where assets came from to the assets file, rather than only to the template, which may not cover all assets.

@peterwoodworth peterwoodworth added p1 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Oct 5, 2023
@colifran colifran added p2 and removed p1 labels Feb 15, 2024
@pahud pahud added p3 and removed p2 labels Jun 11, 2024

5 participants