Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarifies Behavior of SANITIZE_FIELD_NAMES #378

Merged
merged 14 commits into from
Oct 13, 2021

Conversation

astorm
Copy link
Contributor

@astorm astorm commented Dec 5, 2020

While we (Node Agent) were working on the SANITIZE_FIELD_NAMES feature we discovered some ambiguities in the spec -- likely due to there not being a Node Agent team when this spec was agreed on :)

This PR attempts to resolve those ambiguities and is intended as the start of a conversation, not the end of one. Our intention is to better understand how all Agents are doing things.

@astorm astorm requested review from a team as code owners December 5, 2020 00:52
@apmmachine
Copy link

apmmachine commented Dec 5, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-10-11T04:29:00.401+0000

  • Duration: 3 min 2 sec

  • Commit: bd32292

Copy link
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying the spec. I don't know how we managed to do anything in your absence!
I hope it is OK if I approve (on behalf of the Java agent) as if it is the end of the discussion 🙂
Only a few minor suggestions/points.

specs/agents/sanitization.md Outdated Show resolved Hide resolved
specs/agents/sanitization.md Outdated Show resolved Hide resolved
specs/agents/sanitization.md Outdated Show resolved Hide resolved
Comment on lines +25 to +33
Agents MUST provide a minimum default configuration of

[ 'password', 'passwd', 'pwd', 'secret', '*key', '*token*', '*session*',
'*credit*','*card*', 'authorization', 'set-cookie']

for the `sanitize_field_names` configuration value. Agent's MAY include the
following extra fields in their default configuration to avoid breaking changes

['pw','pass','connect.sid']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that some agents have different defaults should be reflected in the remote config description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today I learned there's a remote config description :) These extra fields were/are for the Node.js agent specifically -- they're fields that we previously redacted that would be tricky/cumbersome (but not impossible) to change. However, since it sounds like there's downstream dependencies I'm learning towards just removing this and going with the defaults.

@eyalkoren (or anyone) A few follow up questions for you -- is the the remote config description used by the central configuration system?

That is -- with those defaults set in general_settings.ts does that mean the central configuration system will send these values by default? The follow on question that is -- do agents typically need to set these default values themselves, or can/should we rely on them being set by central configuration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the the remote config description used by the central configuration system?

No, it is only a string for description of the default.

does that mean the central configuration system will send these values by default? The follow on question that is -- do agents typically need to set these default values themselves, or can/should we rely on them being set by central configuration?

The central config does not send these values, agents need to set them themselves. After all, the central config is just one more configuration system for agents, which need to have proper defaults even if it is disabled, and you would typically use the same default regardless of the configuration source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astorm So you leaning towards having the apm-agent-nodejs drop ['pw','pass','connect.sid'] by default? I don't disagree. We'll need to defer doing so until a major ver bump, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @eyalkoren -- the description of this issue has been updated with a TO DO that reflects the need to update the remote config description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trentm I suspect we'll talk off of GitHub about this, but my intent was to spec ['pw','pass','connect.sid'] this way in order to give us some flexibility in keeping these extra keys redacted and not force a major version bump on us immediately.

specs/agents/sanitization.md Outdated Show resolved Hide resolved
Copy link
Member

@felixbarny felixbarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying the spec!

specs/agents/sanitization.md Outdated Show resolved Hide resolved
specs/agents/sanitization.md Outdated Show resolved Hide resolved

Fields that MUST be sanitized are the HTTP Request headers, HTTP Response
headers, and form fields in an `application/x-www-form-urlencoded` request
body. No fields (including `set-cookie` headers) are exempt from this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also transaction.context.request.cookies which is a map of cookie names to values. This map must be redacted, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only transaction.context.request.cookies map should be filtered or those Set-Cookie HTTP header(s) that reference excluded cookie names as well (assuming that the configured value is different from the default and Set-Cookie is not included)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set-Cookie is a response header. The corresponding request header is Cookie. As the content of the Cookie header is the same as transaction.context.request.cookies, in the Java agent, we're removing Cookie from transaction.context.request.headers. This both minimizes the amount of duplicate data and eliminates the complication of having to redact parts of a header value.

For response headers, there's no equivalent to transaction.context.request.cookies, so we just sanitize the transaction.context.response.headers map.

See also https://github.com/elastic/apm-agent-java/blob/master/apm-agent-core/src/main/java/co/elastic/apm/agent/impl/context/SanitizingWebProcessor.java

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the Java agent, we're removing Cookie from transaction.context.request.headers.

Should we include it in the spec?

Copy link
Contributor Author

@astorm astorm Dec 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixelastic @SergeyKleyman @basepi Thank you all. If I'm reading the above correctly, it sounds like there's three data structures/values to consider here

  1. The Set-Cookie response header
  2. The Cookie request header
  3. A representation of the key/values pairs of a Set-Cookie header that's stored in transaction.context.request.cookies

If I spec-ed this as something like

MUST redact transaction.context.request.cookies

If agents capture values for Set-Cookie or Cookie request/response headers, it MUST redact the field values here as well.

would that capture what we're after? Or is there more nuance here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SergeyKleyman Re:

can this behavior be achieved by adding cookie to SANITIZE_FIELD_NAMES default value?

Good question! Thinking that through -- I'm not sure this would achieve what we want. Here's my reasoning: If we added cookie to SANITIZE_FIELD_NAMES, we'd be instructing the agent to REDACT the a header named cookie, and not to remove it completely.

Also, I've been operating under the assumption that the default fields were decided on a while ago and would be a pain to change now after some agent teams have already landed this functionality or are midstream on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixbarny Thank you for sharing those data structures -- I learned something. It looks like the Node.js Agent doesn't capture transaction.context.request.cookies at all. Wondering if we need a spec for what part of the request gets captured or if that's just going spec wild.

More pertinent to our discussion though -- What do you think about this as final language for future proofing things?

Fields that MUST be sanitized are the HTTP Request headers, HTTP Response
headers, and form fields in an application/x-www-form-urlencoded request
body. Additionally, if cookie headers are parsed into name/value pairs and reported
to APM Server via the agent (for example, transaction.context.request.cookies), the
cookie names must be sanitized as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't know some agents don't send request.cookies.
That property is spec'd in the intake API but it's not a required field.

I like that wording!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my reasoning: If we added cookie to SANITIZE_FIELD_NAMES, we'd be instructing the agent to REDACT the a header named cookie, and not to remove it completely.

The main reason for my suggestion is that it might look strange to the users that Set-Cookie header is redacted while Cookie header is removed completely.

Copy link
Contributor

@SergeyKleyman SergeyKleyman Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, if cookie headers are parsed into name/value pairs and reported
to APM Server via the agent (for example, transaction.context.request.cookies), the
cookie names must be sanitized as well.

If we decide to go with hardcoding cookie header removal it might be worth adding something like:

In addition parsed cookie headers MUST NOT be included in the set of headers (in the example above in transaction.context.request.headers)

specs/agents/sanitization.md Outdated Show resolved Hide resolved
basepi added a commit to basepi/apm-agent-python that referenced this pull request Dec 8, 2020
specs/agents/sanitization.md Outdated Show resolved Hide resolved
astorm and others added 3 commits December 15, 2020 16:54
Co-authored-by: eyalkoren <41850454+eyalkoren@users.noreply.github.com>
Co-authored-by: eyalkoren <41850454+eyalkoren@users.noreply.github.com>
basepi added a commit to elastic/apm-agent-python that referenced this pull request Jan 28, 2021
* Align sanitize_field_names with cross-agent spec

* Add CHANGELOG

* Remove value sanitization

The value of the credit card check was dubious, and this change will
improve performance.

* Move to MASK recommended by elastic/apm#378

* Update changelog

* Remove querystring sanitization

* Update pre-commit to match upstream black behavior

* Dynamically generate BASE_SANITIZE_FIELD_NAMES

* Fix the merge commit
beniwohli pushed a commit to beniwohli/apm-agent-python that referenced this pull request Sep 14, 2021
)

* Align sanitize_field_names with cross-agent spec

* Add CHANGELOG

* Remove value sanitization

The value of the credit card check was dubious, and this change will
improve performance.

* Move to MASK recommended by elastic/apm#378

* Update changelog

* Remove querystring sanitization

* Update pre-commit to match upstream black behavior

* Dynamically generate BASE_SANITIZE_FIELD_NAMES

* Fix the merge commit
@astorm astorm merged commit 7b4962a into elastic:master Oct 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants