Collect cloud metadata #826

Merged on Jun 26, 2020 (40 commits)
Commits
7b42a02  Add cloud metadata skeleton (basepi, May 14, 2020)
056bf90  Add config value (basepi, May 14, 2020)
a62b97f  Add aws metadata collection (basepi, May 14, 2020)
2a18569  Use resp instead of r (basepi, Jun 9, 2020)
f8f32e1  Add GCP metadata (basepi, Jun 9, 2020)
ca820d4  Add Azure metadata (basepi, Jun 9, 2020)
97bf332  Refactor headers (basepi, Jun 9, 2020)
e101451  Use older metadata service version to guarantee availability (basepi, Jun 9, 2020)
dba4ad9  Add configuration docs (basepi, Jun 9, 2020)
f06500f  Use a single GCP metadata call with recursive=true (basepi, Jun 10, 2020)
4519b8f  Use urllib3 properly (basepi, Jun 10, 2020)
4c0a2bd  Fix some copy pasta (basepi, Jun 10, 2020)
4d4f0ea  Normalize variable names (basepi, Jun 10, 2020)
6960498  Pop zone if empty in azure (basepi, Jun 11, 2020)
5dab403  Remove provider guessing (basepi, Jun 12, 2020)
9331d45  Token step doesn't appear necessary for AWS (basepi, Jun 12, 2020)
7f0044c  Just kidding, token needed for IMDSv2 (basepi, Jun 12, 2020)
dd40738  Actually import cloud (basepi, Jun 16, 2020)
a961180  Add a socket call to avoid urllib3 logs (basepi, Jun 16, 2020)
34d9ba2  Add tests (basepi, Jun 16, 2020)
c68b8d5  Merge remote-tracking branch 'upstream/master' into cloudmetadata (basepi, Jun 16, 2020)
dc96df8  Add changelog (basepi, Jun 16, 2020)
073562c  Don't bother with tcp handshake for google metadata (basepi, Jun 17, 2020)
a49b149  Add logs for missing metadata when provider is defined (basepi, Jun 17, 2020)
60b6f96  Move metadata generation to post-thread-start (basepi, Jun 17, 2020)
22ba112  Merge remote-tracking branch 'upstream/master' into cloudmetadata (basepi, Jun 17, 2020)
1557c91  Update tests to allow metadata to be generated in transport (basepi, Jun 17, 2020)
e126f44  Merge remote-tracking branch 'upstream/master' into cloudmetadata (basepi, Jun 17, 2020)
4116213  For GCP IDs to string (basepi, Jun 17, 2020)
489c5ec  Consolidate ret (basepi, Jun 17, 2020)
7d2765f  We don't need warning counts, just check for the warning we care about (basepi, Jun 18, 2020)
ef87c56  Merge branch 'master' into cloudmetadata (basepi, Jun 24, 2020)
13d4680  Merge branch 'master' into cloudmetadata (basepi, Jun 24, 2020)
0f9a048  Fix header encodings test timing (basepi, Jun 24, 2020)
cff2225  Merge branch 'master' into cloudmetadata (basepi, Jun 24, 2020)
d4e8d19  Turn off cloud metadata for tests (basepi, Jun 26, 2020)
3f0ec61  Merge remote-tracking branch 'upstream/master' into cloudmetadata (basepi, Jun 26, 2020)
1bbb274  Typo (basepi, Jun 26, 2020)
8d3739b  We do need a short sleep here (basepi, Jun 26, 2020)
f131101  Remove unused import (basepi, Jun 26, 2020)
1 change: 1 addition & 0 deletions CHANGELOG.asciidoc
@@ -38,6 +38,7 @@ endif::[]
===== Features

* Added graphql (graphene) support {pull}850[#850]
* Collect cloud provider metadata {pull}826[#826]

[float]
===== Bug fixes
21 changes: 19 additions & 2 deletions docs/configuration.asciidoc
@@ -118,7 +118,7 @@ The URL must be fully qualified, including protocol (`http` or `https`) and port

[options="header"]
|============
| Environment | Django/Flask | Default
| `ELASTIC_APM_ENABLED` | `ENABLED` | `true`
|============

@@ -134,7 +134,7 @@ When set to false, the agent will not collect any data, nor start any background

[options="header"]
|============
| Environment | Django/Flask | Default
| `ELASTIC_APM_RECORDING` | `RECORDING` | `false`
|============

@@ -199,6 +199,23 @@ See {apm-app-ref}/filters.html#environment-selector[environment selector] in the
NOTE: This feature is fully supported in the APM app in Kibana versions >= 7.2.
You must use the query bar to filter for a specific environment in versions prior to 7.2.

[float]
[[config-cloud-provider]]
==== `cloud_provider`

[options="header"]
|============
| Environment | Django/Flask | Default | Example
| `ELASTIC_APM_CLOUD_PROVIDER` | `CLOUD_PROVIDER` | `None` | `"aws"`
|============

This config value allows you to specify which cloud provider should be assumed
for metadata collection. By default, the agent will attempt to detect the cloud
provider or, if that fails, will use trial and error to collect the metadata.

Valid options are `"aws"`, `"gcp"`, and `"azure"`. If this config value is set
to `False`, then no cloud metadata will be collected.

[float]
[[config-secret-token]]
==== `secret_token`
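For reference, a minimal usage sketch of the new option (hedged: `elasticapm.Client` accepts config values as keyword arguments, and the service name below is a placeholder):

    import elasticapm

    # Skip trial-and-error detection and query only the AWS metadata service.
    # Setting ELASTIC_APM_CLOUD_PROVIDER=aws in the environment has the same effect.
    client = elasticapm.Client(service_name="my-service", cloud_provider="aws")

    # Passing cloud_provider=False disables cloud metadata collection entirely,
    # which is what this PR's test fixtures do.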
41 changes: 40 additions & 1 deletion elasticapm/base.py
@@ -47,7 +47,7 @@
from elasticapm.conf.constants import ERROR
from elasticapm.metrics.base_metrics import MetricsRegistry
from elasticapm.traces import Tracer, execution_context
from elasticapm.utils import cgroup, compat, is_master_process, stacks, varmap
from elasticapm.utils import cgroup, cloud, compat, is_master_process, stacks, varmap
from elasticapm.utils.encoding import enforce_label_format, keyword_field, shorten, transform
from elasticapm.utils.logging import get_logger
from elasticapm.utils.module_import import import_string
@@ -360,12 +360,51 @@ def get_system_info(self):
system_data["kubernetes"] = k8s
return system_data

def get_cloud_info(self):
"""
Detects if the app is running in a cloud provider and fetches relevant
metadata from the cloud provider's metadata endpoint.
"""
provider = self.config.cloud_provider

if provider is False:
return {}
if provider == "aws":
data = cloud.aws_metadata()
if not data:
self.logger.warning("Cloud provider {0} defined, but no metadata was found.".format(provider))
return data
elif provider == "gcp":
data = cloud.gcp_metadata()
if not data:
self.logger.warning("Cloud provider {0} defined, but no metadata was found.".format(provider))
return data
elif provider == "azure":
data = cloud.azure_metadata()
if not data:
self.logger.warning("Cloud provider {0} defined, but no metadata was found.".format(provider))
return data
else:
# Trial and error
data = {}
data = cloud.aws_metadata()
if data:
return data
data = cloud.gcp_metadata()
if data:
return data
data = cloud.azure_metadata()
return data

def build_metadata(self):
data = {
"service": self.get_service_info(),
"process": self.get_process_info(),
"system": self.get_system_info(),
"cloud": self.get_cloud_info(),
}
if not data["cloud"]:
data.pop("cloud")
if self.config.global_labels:
data["labels"] = enforce_label_format(self.config.global_labels)
return data
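The trial-and-error branch of `get_cloud_info` above boils down to trying each provider in a fixed order and keeping the first non-empty result. A minimal equivalent sketch (not the merged code; the helper name is hypothetical):

    from elasticapm.utils import cloud

    def guess_cloud_metadata():
        # First helper to return a non-empty dict wins; all three return {}
        # when their metadata server can't be reached.
        for fetch in (cloud.aws_metadata, cloud.gcp_metadata, cloud.azure_metadata):
            data = fetch()
            if data:
                return data
        return {}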
1 change: 1 addition & 0 deletions elasticapm/conf/__init__.py
@@ -345,6 +345,7 @@ class Config(_ConfigBase):
django_transaction_name_from_route = _BoolConfigValue("DJANGO_TRANSACTION_NAME_FROM_ROUTE", default=False)
disable_log_record_factory = _BoolConfigValue("DISABLE_LOG_RECORD_FACTORY", default=False)
use_elastic_traceparent_header = _BoolConfigValue("USE_ELASTIC_TRACEPARENT_HEADER", default=True)
cloud_provider = _ConfigValue("CLOUD_PROVIDER", default=None)

@property
def is_recording(self):
7 changes: 4 additions & 3 deletions elasticapm/transport/base.py
@@ -105,6 +105,10 @@ def queue(self, event_type, data, flush=False):
logger.debug("Event of type %s dropped due to full event queue", event_type)

def _process_queue(self):
# Rebuild the metadata to capture new process information
if self.client:
self._metadata = self.client.build_metadata()

buffer = self._init_buffer()
buffer_written = False
# add some randomness to timeout to avoid stampedes of several workers that are booted at the same time
@@ -229,9 +233,6 @@ def _flush(self, buffer):
def start_thread(self, pid=None):
super(Transport, self).start_thread(pid=pid)
if (not self._thread or self.pid != self._thread.pid) and not self._closed:
# Rebuild the metadata to capture new process information
if self.client:
self._metadata = self.client.build_metadata()
try:
self._thread = threading.Thread(target=self._process_queue, name="eapm event processor thread")
self._thread.daemon = True
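The net effect of this diff is that metadata is now built by the event processor thread itself rather than by whichever code path calls `start_thread` (for example, right after a fork). A minimal sketch of the pattern, with hypothetical names:

    import threading

    class Worker(object):
        def __init__(self, build_metadata):
            self._build_metadata = build_metadata
            self._metadata = None

        def start_thread(self):
            # Nothing slow happens here, so a just-forked child can restart
            # its worker without blocking on cloud metadata HTTP calls.
            thread = threading.Thread(target=self._process_queue)
            thread.daemon = True
            thread.start()

        def _process_queue(self):
            # First task in the worker: rebuild metadata so it reflects the
            # current process (the PID may have changed across a fork).
            self._metadata = self._build_metadata()

This is also why the test change at the bottom of this PR adds a short sleep: the assertion now races the worker thread that populates `_metadata`.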
159 changes: 159 additions & 0 deletions elasticapm/utils/cloud.py
@@ -0,0 +1,159 @@
# BSD 3-Clause License
#
# Copyright (c) 2019, Elasticsearch BV
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import json
import os
import socket

import urllib3


def aws_metadata():
"""
Fetch AWS metadata from the local metadata server. If metadata server is
not found, return an empty dictionary
"""
http = urllib3.PoolManager()

try:
# This will throw an error if the metadata server isn't available,
# and will be quiet in the logs, unlike urllib3
socket.create_connection(("169.254.169.254", 80), 0.1)

ttl_header = {"X-aws-ec2-metadata-token-ttl-seconds": "300"}
token_url = "http://169.254.169.254/latest/api/token"
token_request = http.request("PUT", token_url, headers=ttl_header, timeout=3.0)
token = token_request.data.decode("utf-8")
aws_token_header = {"X-aws-ec2-metadata-token": token} if token else {}
metadata = json.loads(
http.request(
"GET",
"http://169.254.169.254/latest/dynamic/instance-identity/document",
headers=aws_token_header,
timeout=3.0,
retries=False,
).data.decode("utf-8")
)

return {
"account": {"id": metadata["accountId"]},
"instance": {"id": metadata["instanceId"]},
"availability_zone": metadata["availabilityZone"],
"machine": {"type": metadata["instanceType"]},
"provider": "aws",
"region": metadata["region"],
}

except Exception:
# Not on an AWS box
return {}


def gcp_metadata():
"""
Fetch GCP metadata from the local metadata server. If metadata server is
not found, return an empty dictionary
"""
headers = {"Metadata-Flavor": "Google"}
http = urllib3.PoolManager()

try:
# This will throw an error if the metadata server isn't available,
# and will be quiet in the logs, unlike urllib3
socket.getaddrinfo("metadata.google.internal", 80, 0, socket.SOCK_STREAM)

metadata = json.loads(
http.request(
"GET",
"http://metadata.google.internal/computeMetadata/v1/?recursive=true",
headers=headers,
timeout=3.0,
retries=False,
).data.decode("utf-8")
)

availability_zone = os.path.split(metadata["instance"]["zone"])[1]

return {
"provider": "gcp",
"instance": {"id": str(metadata["instance"]["id"]), "name": metadata["instance"]["name"]},
"project": {"id": str(metadata["project"]["numericProjectId"]), "name": metadata["project"]["projectId"]},
"availability_zone": availability_zone,
"region": availability_zone.rsplit("-", 1)[0],
"machine": {"type": metadata["instance"]["machineType"]},
}

except Exception:
# Not on a gcp box
return {}


def azure_metadata():
"""
Fetch Azure metadata from the local metadata server. If metadata server is
not found, return an empty dictionary
"""
headers = {"Metadata": "true"}
http = urllib3.PoolManager()

try:
# This will throw an error if the metadata server isn't available,
# and will be quiet in the logs, unlike urllib3
socket.create_connection(("169.254.169.254", 80), 0.1)

# Can't use newest metadata service version, as it's not guaranteed
# to be available in all regions
metadata = json.loads(
http.request(
"GET",
"http://169.254.169.254/metadata/instance/compute?api-version=2019-08-15",
headers=headers,
timeout=3.0,
retries=False,
).data.decode("utf-8")
)

ret = {
"account": {"id": metadata["subscriptionId"]},
"instance": {"id": metadata["vmId"], "name": metadata["name"]},
"project": {"name": metadata["resourceGroupName"]},
"availability_zone": metadata["zone"],
"machine": {"type": metadata["vmSize"]},
"provider": "azure",
"region": metadata["location"],
}

if not ret["availability_zone"]:
ret.pop("availability_zone")
return ret

except Exception:
# Not on an Azure box
return {}
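The three helpers share the quiet-probe trick called out in their comments: before issuing any HTTP request, the AWS and Azure helpers attempt a raw TCP connection to 169.254.169.254, and the GCP helper merely resolves `metadata.google.internal`, so a missing metadata server fails fast without urllib3's noisy logging. A standalone sketch of the TCP variant (hypothetical helper; same address and timeout as above):

    import socket

    def metadata_server_reachable(host="169.254.169.254", port=80, timeout=0.1):
        # A bare socket connect raises quickly and quietly when nothing is
        # listening, unlike an HTTP request routed through urllib3.
        try:
            socket.create_connection((host, port), timeout).close()
            return True
        except socket.error:
            return False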
3 changes: 0 additions & 3 deletions tests/client/client_tests.py
@@ -790,15 +790,12 @@ def test_python_version_deprecation(mock_python_version_tuple, version, raises,
if e:
e.close()
if raises:
assert len(recwarn) == 1
if pending:
w = recwarn.pop(PendingDeprecationWarning)
assert "will stop supporting" in w.message.args[0]
else:
w = recwarn.pop(DeprecationWarning)
assert "agent only supports" in w.message.args[0]
else:
assert len(recwarn) == 0


def test_recording(elasticapm_client):
1 change: 1 addition & 0 deletions tests/fixtures.py
@@ -155,6 +155,7 @@ def elasticapm_client(request):
client_config.setdefault("include_paths", ("*/tests/*",))
client_config.setdefault("span_frames_min_duration", -1)
client_config.setdefault("metrics_interval", "0ms")
client_config.setdefault("cloud_provider", False)
client = TempStoreClient(**client_config)
yield client
client.close()
1 change: 1 addition & 0 deletions tests/transports/test_base.py
@@ -248,5 +248,6 @@ def test_transport_metadata_pid_change(mock_send, elasticapm_client):
transport = Transport(client=elasticapm_client)
assert not transport._metadata
transport.start_thread()
time.sleep(0.2)
assert transport._metadata
transport.close()