Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[META 319] Align sanitize_field_names with cross-agent spec #982

Merged
merged 10 commits into from
Jan 28, 2021
11 changes: 9 additions & 2 deletions CHANGELOG.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ endif::[]
//===== Bug fixes
//

[float]
=== Unreleased

// Unreleased changes go here
Expand All @@ -39,7 +40,14 @@ endif::[]

* Python 2.7 and 3.5 support has been deprecated. The Python agent now requires Python 3.6+ {pull}1021[#1021]
* No longer collecting body for `elasticsearch-py` `update` and `delete_by_query` {pull}1013[#1013]

* Align `sanitize_field_names` config with the
https://github.com/elastic/apm/blob/3fa78e2a1eeea81c73c2e16e96dbf6b2e79f3c64/specs/agents/sanitization.md[cross-agent spec].
If you are using a non-default `sanitize_field_names`, surrounding each of your entries with stars (i.e.
`\*secret\*`) will retain the old behavior. {pull}982[#982]
* Remove credit card sanitization for field values. This improves performance, and the security value of this check was
dubious anyway. {pull}982[#982]
* Remove HTTP querystring sanitization. This improves performance, and is meant to standardize behavior across the
agents, as defined in https://github.com/elastic/apm/pull/334. {pull}982[#982]

[float]
===== Features
Expand All @@ -50,7 +58,6 @@ endif::[]

* Fix for GraphQL span spamming from scalar fields with required flag {pull}1015[#1015]


[[release-notes-5.x]]
=== Python Agent version 5.x

Expand Down
31 changes: 18 additions & 13 deletions docs/configuration.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -694,7 +694,6 @@ to avoid stampedes of instances that start at the same time.
'elasticapm.processors.sanitize_http_request_cookies',
'elasticapm.processors.sanitize_http_headers',
'elasticapm.processors.sanitize_http_wsgi_env',
'elasticapm.processors.sanitize_http_request_querystring',
'elasticapm.processors.sanitize_http_request_body']`
|============

Expand All @@ -707,23 +706,29 @@ WARNING: We recommend always including the default set of validators if you cust
[[config-sanitize-field-names]]
==== `sanitize_field_names`

<<dynamic-configuration, image:./images/dynamic-config.svg[] >>

[options="header"]
|============
| Environment | Django/Flask | Default
| `ELASTIC_APM_SANITIZE_FIELD_NAMES` | `SANITIZE_FIELD_NAMES` | `['authorization',
'password',
'secret',
'passwd',
'token',
'api_key',
'access_token',
'sessionid']`
|============

A list of field names to mask when using processors.
| `ELASTIC_APM_SANITIZE_FIELD_NAMES` | `SANITIZE_FIELD_NAMES` | `["password",
"passwd",
"pwd",
"secret",
"*key",
"*token*",
"*session*",
"*credit*",
"*card*",
"authorization",
"set-cookie"]`
|============

A list of glob-matched field names to match and mask when using processors.
For more information, see <<sanitizing-data, Sanitizing Data>>.

WARNING: We recommend always including the default set of field names if you customize this setting.
WARNING: We recommend always including the default set of field name matches
if you customize this setting.


[float]
Expand Down
1 change: 0 additions & 1 deletion docs/sanitizing-data.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,6 @@ ELASTIC_APM = {
'elasticapm.processors.sanitize_http_request_cookies',
'elasticapm.processors.sanitize_http_headers',
'elasticapm.processors.sanitize_http_wsgi_env',
'elasticapm.processors.sanitize_http_request_querystring',
'elasticapm.processors.sanitize_http_request_body',
),
}
Expand Down
5 changes: 3 additions & 2 deletions elasticapm/conf/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -503,11 +503,12 @@ class Config(_ConfigBase):
"elasticapm.processors.sanitize_http_response_cookies",
"elasticapm.processors.sanitize_http_headers",
"elasticapm.processors.sanitize_http_wsgi_env",
"elasticapm.processors.sanitize_http_request_querystring",
"elasticapm.processors.sanitize_http_request_body",
],
)
sanitize_field_names = _ListConfigValue("SANITIZE_FIELD_NAMES", default=BASE_SANITIZE_FIELD_NAMES)
sanitize_field_names = _ListConfigValue(
"SANITIZE_FIELD_NAMES", type=starmatch_to_regex, default=BASE_SANITIZE_FIELD_NAMES
)
metrics_sets = _ListConfigValue(
"METRICS_SETS",
default=[
Expand Down
46 changes: 38 additions & 8 deletions elasticapm/conf/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,31 @@
import re
from collections import namedtuple


def _starmatch_to_regex(pattern):
"""
This is a duplicate of starmatch_to_regex() in utils/__init__.py

Duplication to avoid circular imports
"""
options = re.DOTALL
# check if we are case sensitive
if pattern.startswith("(?-i)"):
pattern = pattern[5:]
else:
options |= re.IGNORECASE
i, n = 0, len(pattern)
res = []
while i < n:
c = pattern[i]
i = i + 1
if c == "*":
res.append(".*")
else:
res.append(re.escape(c))
return re.compile(r"(?:%s)\Z" % "".join(res), options)


EVENTS_API_PATH = "intake/v2/events"
AGENT_CONFIG_PATH = "config/v1/agents"

Expand All @@ -46,7 +71,7 @@

HTTP_WITH_BODY = {"POST", "PUT", "PATCH", "DELETE"}

MASK = 8 * "*"
MASK = "[REDACTED]"

EXCEPTION_CHAIN_MAX_DEPTH = 50

Expand All @@ -59,17 +84,22 @@

HARDCODED_PROCESSORS = ["elasticapm.processors.add_context_lines_to_frames"]

BASE_SANITIZE_FIELD_NAMES = [
"authorization",
basepi marked this conversation as resolved.
Show resolved Hide resolved
BASE_SANITIZE_FIELD_NAMES_UNPROCESSED = [
"password",
"secret",
"passwd",
"token",
"api_key",
"access_token",
"sessionid",
"pwd",
"secret",
"*key",
"*token*",
"*session*",
"*credit*",
"*card*",
"authorization",
"set-cookie",
]

BASE_SANITIZE_FIELD_NAMES = [_starmatch_to_regex(x) for x in BASE_SANITIZE_FIELD_NAMES_UNPROCESSED]

OUTCOME = namedtuple("OUTCOME", ["SUCCESS", "FAILURE", "UNKNOWN"])(
SUCCESS="success", FAILURE="failure", UNKNOWN="unknown"
)
Expand Down
44 changes: 6 additions & 38 deletions elasticapm/processors.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,17 +29,14 @@
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE


import re
import warnings
from collections import defaultdict

from elasticapm.conf.constants import BASE_SANITIZE_FIELD_NAMES, ERROR, MASK, SPAN, TRANSACTION
from elasticapm.utils import compat, varmap
from elasticapm.utils.encoding import force_text, keyword_field
from elasticapm.utils.encoding import force_text
from elasticapm.utils.stacks import get_lines_from_file

SANITIZE_VALUE_PATTERNS = [re.compile(r"^[- \d]{16,19}$")] # credit card numbers, with or without spacers


def for_events(*events):
"""
Expand Down Expand Up @@ -116,7 +113,7 @@ def sanitize_http_request_cookies(client, event):

# sanitize request.header.cookie string
try:
cookie_string = event["context"]["request"]["headers"]["cookie"]
cookie_string = force_text(event["context"]["request"]["headers"]["cookie"], errors="replace")
event["context"]["request"]["headers"]["cookie"] = _sanitize_string(
cookie_string, "; ", "=", sanitize_field_names=client.config.sanitize_field_names
)
Expand All @@ -134,7 +131,7 @@ def sanitize_http_response_cookies(client, event):
:return: The modified event
"""
try:
cookie_string = event["context"]["response"]["headers"]["set-cookie"]
cookie_string = force_text(event["context"]["response"]["headers"]["set-cookie"], errors="replace")
event["context"]["response"]["headers"]["set-cookie"] = _sanitize_string(
cookie_string, ";", "=", sanitize_field_names=client.config.sanitize_field_names
)
Expand Down Expand Up @@ -190,32 +187,6 @@ def sanitize_http_wsgi_env(client, event):
return event


@for_events(ERROR, TRANSACTION)
def sanitize_http_request_querystring(client, event):
"""
Sanitizes http request query string
:param client: an ElasticAPM client
:param event: a transaction or error event
:return: The modified event
"""
try:
query_string = force_text(event["context"]["request"]["url"]["search"], errors="replace")
except (KeyError, TypeError):
return event
if "=" in query_string:
sanitized_query_string = _sanitize_string(
query_string, "&", "=", sanitize_field_names=client.config.sanitize_field_names
)
full_url = event["context"]["request"]["url"]["full"]
# we need to pipe the sanitized string through encoding.keyword_field to ensure that the maximum
# length of keyword fields is still ensured.
event["context"]["request"]["url"]["search"] = keyword_field(sanitized_query_string)
event["context"]["request"]["url"]["full"] = keyword_field(
full_url.replace(query_string, sanitized_query_string)
)
return event


@for_events(ERROR, TRANSACTION)
def sanitize_http_request_body(client, event):
"""
Expand Down Expand Up @@ -276,16 +247,13 @@ def mark_in_app_frames(client, event):

def _sanitize(key, value, **kwargs):
if "sanitize_field_names" in kwargs:
sanitize_field_names = frozenset(kwargs["sanitize_field_names"])
sanitize_field_names = kwargs["sanitize_field_names"]
else:
sanitize_field_names = frozenset(BASE_SANITIZE_FIELD_NAMES)
sanitize_field_names = BASE_SANITIZE_FIELD_NAMES

if value is None:
return

if isinstance(value, compat.string_types) and any(pattern.match(value) for pattern in SANITIZE_VALUE_PATTERNS):
return MASK

if isinstance(value, dict):
# varmap will call _sanitize on each k:v pair of the dict, so we don't
# have to do anything with dicts here
Expand All @@ -296,7 +264,7 @@ def _sanitize(key, value, **kwargs):

key = key.lower()
for field in sanitize_field_names:
if field in key:
if field.match(key.strip()):
# store mask as a fixed length for security
return MASK
return value
Expand Down
Loading