Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws payload tagging #4309

Merged
merged 4 commits into from
Sep 3, 2024
Merged

aws payload tagging #4309

merged 4 commits into from
Sep 3, 2024

Conversation

tlhunter
Copy link
Member

@tlhunter tlhunter commented May 15, 2024

This PR rebuilds #4131. It removes hundreds of files worth of whitespace changes and rebuilds yarn.lock based on current master branch. Ultimately @jbertran will have done 90% of the work in this PR.


What does this PR do?

  • This PR introduces AWS payload reporting as tags. The feature is entirely locked behind environment variables
  • I'd like to release this without documenting the feature / env vars, then have the Serverless team test it
  • Once they're happy with it and any data format changes are made then we'd document the variables

Configuration

We introduce 3 new environment variables:

  • DD_TRACE_CLOUD_REQUEST_PAYLOAD_TAGGING defines the activation of the feature for requests, values being either "all" (no additional redactionor a comma-separated list of JSONPath queries identifying payload paths to be replaced with the value"redacted"`.
  • DD_TRACE_CLOUD_RESPONSE_PAYLOAD_TAGGING
  • DD_TRACE_CLOUD_PAYLOAD_TAGGING_MAX_DEPTH sets the depth after which we stop creating tags from a payload

Behaviour

With the feature activated, aws-sdk calls to the enabled plugins will create additional tags representing the payload, with the following modifications:

  1. Paths known to be PII/sensitive are hard-coded to be redacted (service by service)
  2. Paths known to be user-input data likely to contain JSON are expanded
  3. Paths matching the JSONPath queries passed by the environment variables or corresponding runtime tracer configuration are redacted

This PR only provides the feature for SNS as a first service, but the framework introduced here only requires slight adaptations of a given AWS service plugin to make it available, as well as the addition of the static PII fields configuration.

New dependencies

Adding jsonpath seems safe given the constraints it imposes on its scripts, even if I don't expect scripts to be used. Using rfdc is more questionable - we need a deep clone because JSONPath apply can only do side-effects, and we must not modify the payload, but maybe something simpler works.

Remaining work

In some cases, JSONPath filter expressions are not sufficient to do what we want.

For example, setting attributes for entities (like SNS topics) requires setting an AttributeName and an AttributeValue at top-level of the JSON payload. Ideally, we should be able to redact the AttributeValue only when the AttributeName matches a disallowed value (for example KMSMasterKeyId). JSONPath syntax does not allow such a complex query, so we need to also specify custom logic hooks that do not go through JSONPath to redact data.

Motivation

This come from:

  1. the desire to have real-world data correlated with traces
  2. the fact that AWS upstream API is well-defined and well-documented, helping us avoid PII/sensitive data pitfalls
  3. the existence of such a mechanism in datadog-lambda-js, but only scoped to lambda function input and output. This provides the same level of information, with additional redaction granularity, for AWS plugins.

Plugin Checklist

Additional Notes

Security

Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@tlhunter tlhunter requested review from a team as code owners May 15, 2024 18:57
@tlhunter tlhunter marked this pull request as draft May 15, 2024 18:57
Copy link

github-actions bot commented May 15, 2024

Overall package size

Self size: 7.03 MB
Deduped: 62.39 MB
No deduping: 62.67 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/native-appsec | 8.1.1 | 18.67 MB | 18.68 MB | | @datadog/native-iast-taint-tracking | 3.1.0 | 12.27 MB | 12.28 MB | | @datadog/pprof | 5.3.0 | 9.85 MB | 10.22 MB | | protobufjs | 7.2.5 | 2.77 MB | 5.16 MB | | @datadog/native-iast-rewriter | 2.4.1 | 2.14 MB | 2.23 MB | | @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB | | @datadog/native-metrics | 2.0.0 | 898.77 kB | 1.3 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | jsonpath-plus | 9.0.0 | 580.4 kB | 1.03 MB | | import-in-the-middle | 1.8.1 | 71.67 kB | 785.15 kB | | msgpack-lite | 0.1.26 | 201.16 kB | 281.59 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB | | semver | 7.6.3 | 95.82 kB | 95.82 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | lru-cache | 7.14.0 | 74.95 kB | 74.95 kB | | ignore | 5.3.1 | 51.46 kB | 51.46 kB | | int64-buffer | 0.1.10 | 49.18 kB | 49.18 kB | | shell-quote | 1.8.1 | 44.96 kB | 44.96 kB | | istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB | | rfdc | 1.3.1 | 25.21 kB | 25.21 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | path-to-regexp | 0.1.7 | 6.78 kB | 6.78 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@tlhunter tlhunter force-pushed the tlhunter/aws-payload-tagging branch from 988466f to 2b87d75 Compare May 30, 2024 17:49
Copy link

codecov bot commented May 30, 2024

Codecov Report

Attention: Patch coverage is 96.34146% with 3 lines in your changes missing coverage. Please review.

Project coverage is 86.52%. Comparing base (a34a685) to head (c335ac8).
Report is 7 commits behind head on master.

Files Patch % Lines
packages/dd-trace/src/payload-tagging/index.js 87.50% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4309      +/-   ##
==========================================
- Coverage   95.85%   86.52%   -9.33%     
==========================================
  Files         105      253     +148     
  Lines        3451    10925    +7474     
  Branches       33       33              
==========================================
+ Hits         3308     9453    +6145     
- Misses        143     1472    +1329     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tlhunter tlhunter mentioned this pull request May 31, 2024
8 tasks
@tlhunter
Copy link
Member Author

It looks like the jsonpath module might not be very friendly with ESBuild:


  esbuild
cd /home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild
npm run build
Warning: G] "../include/module.js" should be marked as external for use with "require.resolve" [require-resolve-not-external]

    ../../node_modules/jsonpath/lib/grammar.js:102:58:
      102 │ ...clude = fs.readFileSync(require.resolve("../include/module.js"));
          ╵                                            ~~~~~~~~~~~~~~~~~~~~~~

Warning: G] "../include/action.js" should be marked as external for use with "require.resolve" [require-resolve-not-external]

    ../../node_modules/jsonpath/lib/grammar.js:103:58:
      103 │ ...clude = fs.readFileSync(require.resolve("../include/action.js"));
          ╵                                            ~~~~~~~~~~~~~~~~~~~~~~

Warning: G] "esprima" should be marked as external for use with "require.resolve" [require-resolve-not-external]

    ../../node_modules/jsonpath/lib/aesprim.js:4:27:
      4 │ var file = require.resolve('esprima');
        ╵                            ~~~~~~~~~

npm run built
node:internal/modules/cjs/loader:1189
  throw err;
  ^

Error: Cannot find module '../include/module.js'
Require stack:
- /home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js
    at Module._resolveFilename (node:internal/modules/cjs/loader:1186:15)
    at Function.resolve (node:internal/modules/helpers:133:19)
    at ../../node_modules/jsonpath/lib/grammar.js (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:34936:55)
    at __require (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:12:51)
    at ../../node_modules/jsonpath/lib/parser.js (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:35589:19)
    at __require (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:12:51)
    at ../../node_modules/jsonpath/lib/index.js (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:42469:18)
    at __require (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:12:51)
    at ../../node_modules/jsonpath/index.js (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:42655:23)
    at __require (/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js:12:51) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/home/runner/work/dd-trace-js/dd-trace-js/integration-tests/esbuild/out.js'
  ]
}

@tlhunter
Copy link
Member Author

tlhunter commented May 31, 2024

The forked library jsonpath-plus has about 50% more downloads than jsonpath and fixes the issue so I'll switch to that:
https://www.npmjs.com/package/jsonpath-plus

@pr-commenter
Copy link

pr-commenter bot commented May 31, 2024

Benchmarks

Benchmark execution time: 2024-08-30 22:02:55

Comparing candidate commit ea29c12 in PR branch tlhunter/aws-payload-tagging with baseline commit 34a651e in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 258 metrics, 8 unstable metrics.

@tlhunter tlhunter force-pushed the tlhunter/aws-payload-tagging branch 2 times, most recently from c48d29a to aa6cfc5 Compare June 4, 2024 16:29
"request": [
"$.Attributes.KmsMasterKeyId",
"$.Attributes.PlatformCredential",
"$.Attributes.PlatformPrincipal",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Is this global or actually for sns?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..Attributes.KmsMasterKeyId

TopicArn,
Message: 'message 1',
MessageAttributes: {
unredacted: { DataType: 'String', StringValue: '{"foo": "bar", "baz": "yup"}' },
Copy link

@ygree ygree Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A string that contains a JSON is not supported according to DataDog/datadog-lambda-js#269

AWS support has told us that stuffing JSON into a string is not supported

"$.Subscriptions.*.Endpoint"
],
"expand": [
"$.MessageAttributes.*.StringValue"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$.MessageBody should also be in this list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SNS only:
$.MessageAttributes._datadog.StringValue = base64
$.MessageAttributes.*.StringValue = json

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually just check DataType. SNS can have a DataType=binary, others will just have strings. And _datadog will be JSON.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does expansion rules apply to both request and response?

@tlhunter tlhunter force-pushed the tlhunter/aws-payload-tagging branch 12 times, most recently from 4330979 to bbdd8a0 Compare July 10, 2024 17:42
@tlhunter tlhunter enabled auto-merge (squash) August 29, 2024 20:41
@tlhunter tlhunter force-pushed the tlhunter/aws-payload-tagging branch 2 times, most recently from 4b8e008 to 1919612 Compare August 29, 2024 21:05
@tlhunter tlhunter force-pushed the tlhunter/aws-payload-tagging branch from 1919612 to ea29c12 Compare August 30, 2024 21:52
@tlhunter tlhunter merged commit f229d64 into master Sep 3, 2024
168 checks passed
@tlhunter tlhunter deleted the tlhunter/aws-payload-tagging branch September 3, 2024 20:45
watson added a commit that referenced this pull request Sep 9, 2024
These snug in via PR #4309 last week
watson added a commit that referenced this pull request Sep 9, 2024
These snug in via PR #4309 last week
watson added a commit that referenced this pull request Sep 10, 2024
juan-fernandez pushed a commit that referenced this pull request Sep 30, 2024
Co-authored-by: Jordi Bertran de Balanda <jordi.bertran@datadoghq.com>
juan-fernandez pushed a commit that referenced this pull request Sep 30, 2024
@juan-fernandez juan-fernandez mentioned this pull request Sep 30, 2024
juan-fernandez pushed a commit that referenced this pull request Sep 30, 2024
Co-authored-by: Jordi Bertran de Balanda <jordi.bertran@datadoghq.com>
juan-fernandez pushed a commit that referenced this pull request Sep 30, 2024
@juan-fernandez juan-fernandez mentioned this pull request Sep 30, 2024
juan-fernandez pushed a commit that referenced this pull request Oct 1, 2024
Co-authored-by: Jordi Bertran de Balanda <jordi.bertran@datadoghq.com>
juan-fernandez pushed a commit that referenced this pull request Oct 1, 2024
juan-fernandez pushed a commit that referenced this pull request Oct 1, 2024
Co-authored-by: Jordi Bertran de Balanda <jordi.bertran@datadoghq.com>
juan-fernandez pushed a commit that referenced this pull request Oct 1, 2024
bouwkast added a commit to DataDog/dd-trace-py that referenced this pull request Nov 12, 2024
)

## Overview

This pull request adds the ability to expand AWS request/response
payloads as span tags.
This matches our lambda offerings and provides useful information to
developers when debugging communication between various AWS services.
This is based on the AWS Payload Tagging RFC and this implementation in
[dd-trace-node](DataDog/dd-trace-js#4309) and
this implementation in
[dd-trace-java](DataDog/dd-trace-java#7312).

This feature is _disabled_ by default.

When activated this will produce span tags such as:

```
 "aws.request.body.PublishBatchRequestEntries.0.Id": "1",
 "aws.request.body.PublishBatchRequestEntries.0.Message": "ironmaiden",
 "aws.request.body.PublishBatchRequestEntries.1.Id": "2",
 "aws.request.body.PublishBatchRequestEntries.1.Message": "megadeth"
 "aws.response.body.HTTPStatusCode": "200",
```

## Configuration

There are five new configuration options:

- `DD_TRACE_CLOUD_REQUEST_PAYLOAD_TAGGING`:
- `""` by default to indicate that AWS request payload expansion is
**disabled** for _requests_.
- `"all"` to define that AWS request payload expansion is **enabled**
for _requests_ using the default `JSONPath`s for redaction logic.
- a comma-separated list of user-supplied `JSONPath`s to define that AWS
request payload expansion is **enabled** for _requests_ using the
default `JSONPath`s and the user-supplied `JSONPath`s for redaction
logic.
- `DD_TRACE_CLOUD_RESPONSE_PAYLOAD_TAGGING`:
- `""` by default to indicate that AWS response payload expansion is
**disabled** for _responses_.
- `"all"` to define that AWS response payload expansion is **enabled**
for _responses_ using the default `JSONPath`s for redaction logic.
- a comma-separated list of user-supplied `JSONPath`s to define that AWS
request payload expansion is **enabled** for _responses_ using the
default `JSONPath`s and the user-supplied `JSONPath`s for redaction
logic.
- `DD_TRACE_CLOUD_PAYLOAD_TAGGING_MAX_DEPTH` (not defined in RFC but
done to match NodeJS):
  - sets the depth after which we stop creating tags from a payload
  - defaults to a value of `10`
- `DD_TRACE_CLOUD_PAYLOAD_TAGGING_MAX_TAGS` (to match Java
implementation)
  - sets the maximum number of tags allowed to be expanded
  - defaults to a value of `758`
- `DD_TRACE_CLOUD_PAYLOAD_TAGGING_SERVICES` (to match Java
implementation)
  - a comma-separated list of supported AWS services
  - defaults to ` s3,sns,sqs,kinesis,eventbridge`

## Other

- [`jsonpath-ng` has been
vendored](https://github.com/h2non/jsonpath-ng/blob/master/jsonpath_ng/jsonpath.py)
- [`ply` has been vendored (v3.11) (dependency of
`jsonpath-ng`)](https://github.com/dabeaz/ply/releases/tag/3.11)

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: erikayasuda <153395705+erikayasuda@users.noreply.github.com>
quinna-h pushed a commit to DataDog/dd-trace-py that referenced this pull request Nov 13, 2024
)

## Overview

This pull request adds the ability to expand AWS request/response
payloads as span tags.
This matches our lambda offerings and provides useful information to
developers when debugging communication between various AWS services.
This is based on the AWS Payload Tagging RFC and this implementation in
[dd-trace-node](DataDog/dd-trace-js#4309) and
this implementation in
[dd-trace-java](DataDog/dd-trace-java#7312).

This feature is _disabled_ by default.

When activated this will produce span tags such as:

```
 "aws.request.body.PublishBatchRequestEntries.0.Id": "1",
 "aws.request.body.PublishBatchRequestEntries.0.Message": "ironmaiden",
 "aws.request.body.PublishBatchRequestEntries.1.Id": "2",
 "aws.request.body.PublishBatchRequestEntries.1.Message": "megadeth"
 "aws.response.body.HTTPStatusCode": "200",
```

## Configuration

There are five new configuration options:

- `DD_TRACE_CLOUD_REQUEST_PAYLOAD_TAGGING`:
- `""` by default to indicate that AWS request payload expansion is
**disabled** for _requests_.
- `"all"` to define that AWS request payload expansion is **enabled**
for _requests_ using the default `JSONPath`s for redaction logic.
- a comma-separated list of user-supplied `JSONPath`s to define that AWS
request payload expansion is **enabled** for _requests_ using the
default `JSONPath`s and the user-supplied `JSONPath`s for redaction
logic.
- `DD_TRACE_CLOUD_RESPONSE_PAYLOAD_TAGGING`:
- `""` by default to indicate that AWS response payload expansion is
**disabled** for _responses_.
- `"all"` to define that AWS response payload expansion is **enabled**
for _responses_ using the default `JSONPath`s for redaction logic.
- a comma-separated list of user-supplied `JSONPath`s to define that AWS
request payload expansion is **enabled** for _responses_ using the
default `JSONPath`s and the user-supplied `JSONPath`s for redaction
logic.
- `DD_TRACE_CLOUD_PAYLOAD_TAGGING_MAX_DEPTH` (not defined in RFC but
done to match NodeJS):
  - sets the depth after which we stop creating tags from a payload
  - defaults to a value of `10`
- `DD_TRACE_CLOUD_PAYLOAD_TAGGING_MAX_TAGS` (to match Java
implementation)
  - sets the maximum number of tags allowed to be expanded
  - defaults to a value of `758`
- `DD_TRACE_CLOUD_PAYLOAD_TAGGING_SERVICES` (to match Java
implementation)
  - a comma-separated list of supported AWS services
  - defaults to ` s3,sns,sqs,kinesis,eventbridge`

## Other

- [`jsonpath-ng` has been
vendored](https://github.com/h2non/jsonpath-ng/blob/master/jsonpath_ng/jsonpath.py)
- [`ply` has been vendored (v3.11) (dependency of
`jsonpath-ng`)](https://github.com/dabeaz/ply/releases/tag/3.11)

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: erikayasuda <153395705+erikayasuda@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants