Skip to content

Commit

Permalink
[META 319] Align sanitize_field_names with cross-agent spec (elastic#982
Browse files Browse the repository at this point in the history
)

* Align sanitize_field_names with cross-agent spec

* Add CHANGELOG

* Remove value sanitization

The value of the credit card check was dubious, and this change will
improve performance.

* Move to MASK recommended by elastic/apm#378

* Update changelog

* Remove querystring sanitization

* Update pre-commit to match upstream black behavior

* Dynamically generate BASE_SANITIZE_FIELD_NAMES

* Fix the merge commit
  • Loading branch information
basepi authored and beniwohli committed Sep 14, 2021
1 parent b100e8a commit 3f1317c
Show file tree
Hide file tree
Showing 7 changed files with 136 additions and 169 deletions.
11 changes: 9 additions & 2 deletions CHANGELOG.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ endif::[]
//===== Bug fixes
//
[float]
=== Unreleased
// Unreleased changes go here
Expand All @@ -39,7 +40,14 @@ endif::[]
* Python 2.7 and 3.5 support has been deprecated. The Python agent now requires Python 3.6+ {pull}1021[#1021]
* No longer collecting body for `elasticsearch-py` `update` and `delete_by_query` {pull}1013[#1013]
* Align `sanitize_field_names` config with the
https://github.com/elastic/apm/blob/3fa78e2a1eeea81c73c2e16e96dbf6b2e79f3c64/specs/agents/sanitization.md[cross-agent spec].
If you are using a non-default `sanitize_field_names`, surrounding each of your entries with stars (i.e.
`\*secret\*`) will retain the old behavior. {pull}982[#982]
* Remove credit card sanitization for field values. This improves performance, and the security value of this check was
dubious anyway. {pull}982[#982]
* Remove HTTP querystring sanitization. This improves performance, and is meant to standardize behavior across the
agents, as defined in https://github.com/elastic/apm/pull/334. {pull}982[#982]
[float]
===== Features
Expand All @@ -50,7 +58,6 @@ endif::[]
* Fix for GraphQL span spamming from scalar fields with required flag {pull}1015[#1015]
[[release-notes-5.x]]
=== Python Agent version 5.x
Expand Down
31 changes: 18 additions & 13 deletions docs/configuration.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -694,7 +694,6 @@ to avoid stampedes of instances that start at the same time.
'elasticapm.processors.sanitize_http_request_cookies',
'elasticapm.processors.sanitize_http_headers',
'elasticapm.processors.sanitize_http_wsgi_env',
'elasticapm.processors.sanitize_http_request_querystring',
'elasticapm.processors.sanitize_http_request_body']`
|============

Expand All @@ -707,23 +706,29 @@ WARNING: We recommend always including the default set of validators if you cust
[[config-sanitize-field-names]]
==== `sanitize_field_names`

<<dynamic-configuration, image:./images/dynamic-config.svg[] >>

[options="header"]
|============
| Environment | Django/Flask | Default
| `ELASTIC_APM_SANITIZE_FIELD_NAMES` | `SANITIZE_FIELD_NAMES` | `['authorization',
'password',
'secret',
'passwd',
'token',
'api_key',
'access_token',
'sessionid']`
|============

A list of field names to mask when using processors.
| `ELASTIC_APM_SANITIZE_FIELD_NAMES` | `SANITIZE_FIELD_NAMES` | `["password",
"passwd",
"pwd",
"secret",
"*key",
"*token*",
"*session*",
"*credit*",
"*card*",
"authorization",
"set-cookie"]`
|============

A list of glob-matched field names to match and mask when using processors.
For more information, see <<sanitizing-data, Sanitizing Data>>.

WARNING: We recommend always including the default set of field names if you customize this setting.
WARNING: We recommend always including the default set of field name matches
if you customize this setting.


[float]
Expand Down
1 change: 0 additions & 1 deletion docs/sanitizing-data.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,6 @@ ELASTIC_APM = {
'elasticapm.processors.sanitize_http_request_cookies',
'elasticapm.processors.sanitize_http_headers',
'elasticapm.processors.sanitize_http_wsgi_env',
'elasticapm.processors.sanitize_http_request_querystring',
'elasticapm.processors.sanitize_http_request_body',
),
}
Expand Down
5 changes: 3 additions & 2 deletions elasticapm/conf/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -503,11 +503,12 @@ class Config(_ConfigBase):
"elasticapm.processors.sanitize_http_response_cookies",
"elasticapm.processors.sanitize_http_headers",
"elasticapm.processors.sanitize_http_wsgi_env",
"elasticapm.processors.sanitize_http_request_querystring",
"elasticapm.processors.sanitize_http_request_body",
],
)
sanitize_field_names = _ListConfigValue("SANITIZE_FIELD_NAMES", default=BASE_SANITIZE_FIELD_NAMES)
sanitize_field_names = _ListConfigValue(
"SANITIZE_FIELD_NAMES", type=starmatch_to_regex, default=BASE_SANITIZE_FIELD_NAMES
)
metrics_sets = _ListConfigValue(
"METRICS_SETS",
default=[
Expand Down
46 changes: 38 additions & 8 deletions elasticapm/conf/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,31 @@
import re
from collections import namedtuple


def _starmatch_to_regex(pattern):
"""
This is a duplicate of starmatch_to_regex() in utils/__init__.py
Duplication to avoid circular imports
"""
options = re.DOTALL
# check if we are case sensitive
if pattern.startswith("(?-i)"):
pattern = pattern[5:]
else:
options |= re.IGNORECASE
i, n = 0, len(pattern)
res = []
while i < n:
c = pattern[i]
i = i + 1
if c == "*":
res.append(".*")
else:
res.append(re.escape(c))
return re.compile(r"(?:%s)\Z" % "".join(res), options)


EVENTS_API_PATH = "intake/v2/events"
AGENT_CONFIG_PATH = "config/v1/agents"

Expand All @@ -46,7 +71,7 @@

HTTP_WITH_BODY = {"POST", "PUT", "PATCH", "DELETE"}

MASK = 8 * "*"
MASK = "[REDACTED]"

EXCEPTION_CHAIN_MAX_DEPTH = 50

Expand All @@ -59,17 +84,22 @@

HARDCODED_PROCESSORS = ["elasticapm.processors.add_context_lines_to_frames"]

BASE_SANITIZE_FIELD_NAMES = [
"authorization",
BASE_SANITIZE_FIELD_NAMES_UNPROCESSED = [
"password",
"secret",
"passwd",
"token",
"api_key",
"access_token",
"sessionid",
"pwd",
"secret",
"*key",
"*token*",
"*session*",
"*credit*",
"*card*",
"authorization",
"set-cookie",
]

BASE_SANITIZE_FIELD_NAMES = [_starmatch_to_regex(x) for x in BASE_SANITIZE_FIELD_NAMES_UNPROCESSED]

OUTCOME = namedtuple("OUTCOME", ["SUCCESS", "FAILURE", "UNKNOWN"])(
SUCCESS="success", FAILURE="failure", UNKNOWN="unknown"
)
Expand Down
44 changes: 6 additions & 38 deletions elasticapm/processors.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,17 +29,14 @@
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE


import re
import warnings
from collections import defaultdict

from elasticapm.conf.constants import BASE_SANITIZE_FIELD_NAMES, ERROR, MASK, SPAN, TRANSACTION
from elasticapm.utils import compat, varmap
from elasticapm.utils.encoding import force_text, keyword_field
from elasticapm.utils.encoding import force_text
from elasticapm.utils.stacks import get_lines_from_file

SANITIZE_VALUE_PATTERNS = [re.compile(r"^[- \d]{16,19}$")] # credit card numbers, with or without spacers


def for_events(*events):
"""
Expand Down Expand Up @@ -116,7 +113,7 @@ def sanitize_http_request_cookies(client, event):

# sanitize request.header.cookie string
try:
cookie_string = event["context"]["request"]["headers"]["cookie"]
cookie_string = force_text(event["context"]["request"]["headers"]["cookie"], errors="replace")
event["context"]["request"]["headers"]["cookie"] = _sanitize_string(
cookie_string, "; ", "=", sanitize_field_names=client.config.sanitize_field_names
)
Expand All @@ -134,7 +131,7 @@ def sanitize_http_response_cookies(client, event):
:return: The modified event
"""
try:
cookie_string = event["context"]["response"]["headers"]["set-cookie"]
cookie_string = force_text(event["context"]["response"]["headers"]["set-cookie"], errors="replace")
event["context"]["response"]["headers"]["set-cookie"] = _sanitize_string(
cookie_string, ";", "=", sanitize_field_names=client.config.sanitize_field_names
)
Expand Down Expand Up @@ -190,32 +187,6 @@ def sanitize_http_wsgi_env(client, event):
return event


@for_events(ERROR, TRANSACTION)
def sanitize_http_request_querystring(client, event):
"""
Sanitizes http request query string
:param client: an ElasticAPM client
:param event: a transaction or error event
:return: The modified event
"""
try:
query_string = force_text(event["context"]["request"]["url"]["search"], errors="replace")
except (KeyError, TypeError):
return event
if "=" in query_string:
sanitized_query_string = _sanitize_string(
query_string, "&", "=", sanitize_field_names=client.config.sanitize_field_names
)
full_url = event["context"]["request"]["url"]["full"]
# we need to pipe the sanitized string through encoding.keyword_field to ensure that the maximum
# length of keyword fields is still ensured.
event["context"]["request"]["url"]["search"] = keyword_field(sanitized_query_string)
event["context"]["request"]["url"]["full"] = keyword_field(
full_url.replace(query_string, sanitized_query_string)
)
return event


@for_events(ERROR, TRANSACTION)
def sanitize_http_request_body(client, event):
"""
Expand Down Expand Up @@ -276,16 +247,13 @@ def mark_in_app_frames(client, event):

def _sanitize(key, value, **kwargs):
if "sanitize_field_names" in kwargs:
sanitize_field_names = frozenset(kwargs["sanitize_field_names"])
sanitize_field_names = kwargs["sanitize_field_names"]
else:
sanitize_field_names = frozenset(BASE_SANITIZE_FIELD_NAMES)
sanitize_field_names = BASE_SANITIZE_FIELD_NAMES

if value is None:
return

if isinstance(value, compat.string_types) and any(pattern.match(value) for pattern in SANITIZE_VALUE_PATTERNS):
return MASK

if isinstance(value, dict):
# varmap will call _sanitize on each k:v pair of the dict, so we don't
# have to do anything with dicts here
Expand All @@ -296,7 +264,7 @@ def _sanitize(key, value, **kwargs):

key = key.lower()
for field in sanitize_field_names:
if field in key:
if field.match(key.strip()):
# store mask as a fixed length for security
return MASK
return value
Expand Down
Loading

0 comments on commit 3f1317c

Please sign in to comment.