Metrics Monitoring: A minimal viable product #348

Merged
merged 36 commits into from
Jan 11, 2018
Changes from 7 commits
6d5f9e2
[Metrics] A working node exporter done.
simon-mo Dec 17, 2017
439d9f8
[Metric] Model Exporter Done.
simon-mo Dec 19, 2017
3166bc0
[Metric] Add FrontendExporter Docker image
simon-mo Dec 19, 2017
8fbba2f
[Metric] Add docstrings
simon-mo Dec 19, 2017
7947a5f
[Metric] Add integration test for clipper metric
simon-mo Dec 20, 2017
79f3145
[Metric] Format Code
simon-mo Dec 20, 2017
e1c7227
[Metrics] Bug Fix, update the name accoridingly
simon-mo Jan 3, 2018
fd3b078
[Metric] Small Fixes
simon-mo Jan 3, 2018
bc55071
[Metric] Format Code
simon-mo Jan 3, 2018
5efc602
[Metric] Skipped Metrics to Let Jenkins Build Images
simon-mo Jan 5, 2018
d86d79a
[Metric] Revert the files; docker imgs built before test
simon-mo Jan 5, 2018
3ffd51b
Merge branch 'metrics' of https://github.com/simon-mo/clipper into me…
simon-mo Jan 5, 2018
21b3926
Move the comments; restart the tests
simon-mo Jan 5, 2018
ceb6065
[Metric] Add version tag to the frontend-exporter
simon-mo Jan 5, 2018
0922c3d
[Metric] Add clipper_metric to run_unittests.sh
simon-mo Jan 5, 2018
ebf2df6
[Metric] Trigger another CI check
simon-mo Jan 5, 2018
ccff8ae
[Metrics] A working node exporter done.
simon-mo Dec 17, 2017
6756b18
[Metric] Model Exporter Done.
simon-mo Dec 19, 2017
92d2c01
[Metric] Add FrontendExporter Docker image
simon-mo Dec 19, 2017
4fc5c10
[Metric] Add docstrings
simon-mo Dec 19, 2017
d84be8d
[Metric] Add integration test for clipper metric
simon-mo Dec 20, 2017
9e955b3
[Metric] Format Code
simon-mo Dec 20, 2017
457f792
[Metrics] Bug Fix, update the name accoridingly
simon-mo Jan 3, 2018
808935b
[Metric] Small Fixes
simon-mo Jan 3, 2018
5816ca7
[Metric] Format Code
simon-mo Jan 3, 2018
26e0024
[Metric] Skipped Metrics to Let Jenkins Build Images
simon-mo Jan 5, 2018
a93606d
[Metric] Revert the files; docker imgs built before test
simon-mo Jan 5, 2018
3dd8298
Move the comments; restart the tests
simon-mo Jan 5, 2018
56425ae
[Metric] Add version tag to the frontend-exporter
simon-mo Jan 5, 2018
e2d8f46
[Metric] Add clipper_metric to run_unittests.sh
simon-mo Jan 5, 2018
086e305
[Metric] Trigger another CI check
simon-mo Jan 5, 2018
87a67e2
Merge branch 'develop' into metrics
dcrankshaw Jan 5, 2018
b89cbae
Address comments, fix typo
simon-mo Jan 8, 2018
aa4b7c8
Merge branch 'metrics' of https://github.com/simon-mo/clipper into me…
simon-mo Jan 8, 2018
dd9d2e5
[Metric] change the redis port for integration-test
simon-mo Jan 8, 2018
b218000
Merge branch 'develop' into metrics
simon-mo Jan 10, 2018
5 changes: 3 additions & 2 deletions bin/build_docker_images.sh
@@ -258,9 +258,10 @@ build_images () {
create_image pyspark-container PySparkContainerDockerfile $public
create_image tf_cifar_container TensorFlowCifarDockerfile $public
create_image tf-container TensorFlowDockerfile $public
}


# Build Metric Monitor image - no dependency
create_image frontend-exporter FrontendExporterDockerfile $public
}


usage () {
1 change: 1 addition & 0 deletions clipper_admin/clipper_admin/container_manager.py
@@ -5,6 +5,7 @@
CLIPPER_INTERNAL_QUERY_PORT = 1337
CLIPPER_INTERNAL_MANAGEMENT_PORT = 1338
CLIPPER_INTERNAL_RPC_PORT = 7000
CLIPPER_INTERNAL_METRIC_PORT = 1390

CLIPPER_DOCKER_LABEL = "ai.clipper.container.label"
CLIPPER_MODEL_CONTAINER_LABEL = "ai.clipper.model_container.label"
36 changes: 30 additions & 6 deletions clipper_admin/clipper_admin/docker/docker_container_manager.py
@@ -8,9 +8,11 @@
ContainerManager, CLIPPER_DOCKER_LABEL, CLIPPER_MODEL_CONTAINER_LABEL,
CLIPPER_QUERY_FRONTEND_CONTAINER_LABEL,
CLIPPER_MGMT_FRONTEND_CONTAINER_LABEL, CLIPPER_INTERNAL_RPC_PORT,
CLIPPER_INTERNAL_QUERY_PORT, CLIPPER_INTERNAL_MANAGEMENT_PORT)
CLIPPER_INTERNAL_QUERY_PORT, CLIPPER_INTERNAL_MANAGEMENT_PORT,
CLIPPER_INTERNAL_METRIC_PORT)
from ..exceptions import ClipperException
from requests.exceptions import ConnectionError
from .docker_metric_utils import *

logger = logging.getLogger(__name__)

@@ -93,7 +95,7 @@ def start_clipper(self, query_frontend_image, mgmt_frontend_image,
logger.debug(
"{nw} network already exists".format(nw=self.docker_network))
except ConnectionError:
msg = "Unable to connect to Docker. Please Check if Docker is running."
msg = "Unable to Connect to Docker. Please Check if Docker is running."
raise ClipperException(msg)

if not self.external_redis:
@@ -123,24 +125,38 @@ def start_clipper(self, query_frontend_image, mgmt_frontend_image,
},
labels=mgmt_labels,
**self.extra_container_kwargs)

query_cmd = "--redis_ip={redis_ip} --redis_port={redis_port} --prediction_cache_size={cache_size}".format(
redis_ip=self.redis_ip,
redis_port=self.redis_port,
cache_size=cache_size)
query_labels = self.common_labels.copy()
query_labels[CLIPPER_QUERY_FRONTEND_CONTAINER_LABEL] = ""
query_name = "query_frontend-{}".format(random.randint(
0, 100000)) # generate a random name
self.docker_client.containers.run(
query_frontend_image,
query_cmd,
name="query_frontend-{}".format(
random.randint(0, 100000)), # generate a random name
name=query_name,
ports={
'%s/tcp' % CLIPPER_INTERNAL_QUERY_PORT:
self.clipper_query_port,
'%s/tcp' % CLIPPER_INTERNAL_RPC_PORT: self.clipper_rpc_port
},
labels=query_labels,
**self.extra_container_kwargs)

# Metric Section
query_frontend_metric_name = "query_frontend_exporter-{}".format(
random.randint(0, 100000))
Contributor

Let's use the same random integer as the query_frontend_name

Contributor Author

Fixed.
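
The suggestion above could look roughly like the sketch below: draw one random integer and reuse it for both container names, so a frontend and its exporter can be paired by suffix. The helper name `make_frontend_names` is hypothetical and not part of this PR; only the two name prefixes come from the diff.

```python
import random


def make_frontend_names(rng=random):
    """Return (query_name, exporter_name) sharing one random suffix.

    Hypothetical helper for illustration; the prefixes mirror the ones
    used in start_clipper.
    """
    suffix = rng.randint(0, 100000)
    query_name = "query_frontend-{}".format(suffix)
    exporter_name = "query_frontend_exporter-{}".format(suffix)
    return query_name, exporter_name


query_name, exporter_name = make_frontend_names()
# Both names end in the same integer, so they can be matched up later.
print(query_name, exporter_name)
```

Passing `rng` as a parameter is only a convenience for testing with a seeded `random.Random`; the default uses the module-level generator as the diff does.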

run_query_frontend_metric_image(
query_frontend_metric_name, self.docker_client, query_name,
self.common_labels, self.extra_container_kwargs)
setup_metric_config(query_frontend_metric_name,
CLIPPER_INTERNAL_METRIC_PORT)
run_metric_image(self.docker_client, self.common_labels,
self.extra_container_kwargs)

self.connect()

def connect(self):
@@ -187,15 +203,23 @@ def _add_replica(self, name, version, input_type, image):
"CLIPPER_IP": query_frontend_hostname,
"CLIPPER_INPUT_TYPE": input_type,
}

model_container_label = create_model_container_label(name, version)
labels = self.common_labels.copy()
labels[CLIPPER_MODEL_CONTAINER_LABEL] = create_model_container_label(
name, version)
labels[CLIPPER_MODEL_CONTAINER_LABEL] = model_container_label

model_container_name = model_container_label + '-{}'.format(
random.randint(0, 100000))
self.docker_client.containers.run(
image,
name=model_container_name,
environment=env_vars,
labels=labels,
**self.extra_container_kwargs)

update_metric_config(model_container_name,
CLIPPER_INTERNAL_METRIC_PORT)

def set_num_replicas(self, name, version, input_type, image, num_replicas):
current_replicas = self._get_replicas(name, version)
if len(current_replicas) < num_replicas:
140 changes: 140 additions & 0 deletions clipper_admin/clipper_admin/docker/docker_metric_utils.py
@@ -0,0 +1,140 @@
import errno
import yaml
import requests
import random
import os


def ensure_clipper_tmp():
    """
    Make sure the /tmp/clipper directory exists. If not, create it.
    :return: None
    """
    try:
        os.makedirs('/tmp/clipper')
    except OSError as e:
        # Roughly equivalent to os.makedirs(., exist_ok=True) in py3:
        # ignore "already exists" and re-raise any other error.
        if e.errno != errno.EEXIST:
            raise


def get_prometheus_base_config():
    """
    Generate a basic configuration dictionary for prometheus
    :return: dictionary
    """
    conf = dict()
    conf['global'] = {'evaluation_interval': '5s', 'scrape_interval': '5s'}
    conf['scrape_configs'] = []
    return conf


def run_query_frontend_metric_image(name, docker_client, query_name,
                                    common_labels, extra_container_kwargs):
    """
    Use docker_client to run a frontend-exporter image.
    :param name: Name for the exporter container; must be unique.
    :param docker_client: The docker_client object.
    :param query_name: The name of the corresponding query frontend container.
    :param common_labels: Labels to pass in.
    :param extra_container_kwargs: Kwargs to pass in.
    :return: None
    """

    query_frontend_metric_cmd = "--query_frontend_name {}".format(query_name)
    query_frontend_metric_labels = common_labels.copy()

    docker_client.containers.run(
        "clipper/frontend-exporter",
        query_frontend_metric_cmd,
        name=name,
        labels=query_frontend_metric_labels,
        **extra_container_kwargs)


def setup_metric_config(query_frontend_metric_name,
                        CLIPPER_INTERNAL_METRIC_PORT):
    """
    Write the prometheus.yml file once the frontend exporter is set up.
    :param query_frontend_metric_name: Name of the frontend exporter container.
    :param CLIPPER_INTERNAL_METRIC_PORT: Default port.
    :return: None
    """

    ensure_clipper_tmp()

    with open('/tmp/clipper/prometheus.yml', 'w') as f:
        prom_config = get_prometheus_base_config()
        prom_config_query_frontend = {
            'job_name': 'query',
            'static_configs': [{
                'targets': [
                    '{name}:{port}'.format(
                        name=query_frontend_metric_name,
                        port=CLIPPER_INTERNAL_METRIC_PORT)
                ]
            }]
        }
        prom_config['scrape_configs'].append(prom_config_query_frontend)

        yaml.dump(prom_config, f)


def run_metric_image(docker_client, common_labels, extra_container_kwargs):
    """
    Run the prometheus image.
    :param docker_client: The docker client object
    :param common_labels: Labels to pass in
    :param extra_container_kwargs: Kwargs to pass in.
    :return: None
    """

    metric_cmd = [
        "--config.file=/etc/prometheus/prometheus.yml",
        "--storage.tsdb.path=/prometheus",
        "--web.console.libraries=/etc/prometheus/console_libraries",
        "--web.console.templates=/etc/prometheus/consoles",
        "--web.enable-lifecycle"
    ]
    metric_labels = common_labels.copy()
    docker_client.containers.run(
        "prom/prometheus",
        metric_cmd,
        name="metric_frontend-{}".format(random.randint(0, 100000)),
        ports={'9090/tcp': 9090},
        volumes={
            '/tmp/clipper/prometheus.yml': {
                'bind': '/etc/prometheus/prometheus.yml',
                'mode': 'ro'
            }
        },
        labels=metric_labels,
        **extra_container_kwargs)


def update_metric_config(model_container_name, CLIPPER_INTERNAL_METRIC_PORT):
    """
    Update the prometheus.yml configuration file.
    :param model_container_name: New model container name; must be unique.
    :param CLIPPER_INTERNAL_METRIC_PORT: Default port
    :return: None
    """
    with open('/tmp/clipper/prometheus.yml', 'r') as f:
        # The file holds only plain dicts, lists and strings, so the
        # safe loader is sufficient.
        conf = yaml.safe_load(f)

    new_job_dict = {
        'job_name': '{}'.format(model_container_name),
        'static_configs': [{
            'targets': [
                '{name}:{port}'.format(
                    name=model_container_name,
                    port=CLIPPER_INTERNAL_METRIC_PORT)
            ]
        }]
    }
    conf['scrape_configs'].append(new_job_dict)

    with open('/tmp/clipper/prometheus.yml', 'w') as f:
        yaml.safe_dump(conf, f)

    # Ask the running Prometheus server to reload its configuration.
    requests.post('http://localhost:9090/-/reload')
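
As a rough illustration of the config handling in `setup_metric_config` and `update_metric_config`, the sketch below builds the same dictionary shape without touching the file system. `add_scrape_job` is a hypothetical helper, not part of this PR; the duplicate check is an assumption added here because Prometheus rejects a config in which two scrape jobs share a `job_name`.

```python
import copy


def base_config():
    """Same shape as get_prometheus_base_config in the diff."""
    return {
        'global': {'evaluation_interval': '5s', 'scrape_interval': '5s'},
        'scrape_configs': [],
    }


def add_scrape_job(conf, job_name, host, port):
    """Return a copy of conf with a static scrape job for host:port.

    Hypothetical helper: if job_name is already present, the copy is
    returned without a second entry.
    """
    new_conf = copy.deepcopy(conf)
    existing = {job['job_name'] for job in new_conf['scrape_configs']}
    if job_name not in existing:
        new_conf['scrape_configs'].append({
            'job_name': job_name,
            'static_configs': [{
                'targets': ['{}:{}'.format(host, port)]
            }],
        })
    return new_conf


# The container name below is made up for the example.
conf = add_scrape_job(base_config(), 'query', 'query_frontend_exporter-42', 1390)
conf = add_scrape_job(conf, 'query', 'query_frontend_exporter-42', 1390)
print(conf['scrape_configs'])
```

Keeping the dictionary logic separate from the file write and the reload request makes it possible to check the generated config in a unit test before anything is written to `/tmp/clipper/prometheus.yml`.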
1 change: 1 addition & 0 deletions clipper_admin/requirements.txt
@@ -5,3 +5,4 @@ docker==2.5.1
kubernetes==3.0.0
six==1.10.0
mock
prometheus_client
1 change: 1 addition & 0 deletions clipper_admin/setup.py
@@ -31,6 +31,7 @@
'pyyaml',
'docker',
'kubernetes',
'prometheus_client',
'six',
],
extras_require={
69 changes: 69 additions & 0 deletions containers/python/front_end_exporter.py
@@ -0,0 +1,69 @@
import requests
Contributor

The containers directory is for stuff related to model containers. Can you create a separate monitoring directory (CLIPPER_ROOT/monitoring) and put this file in it?

Contributor

Can you also put a short README in the monitoring directory that provides instructions on how to access the Prometheus server once it is up?

Contributor Author

Got it! This is a much better idea.
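
For the README requested above, one possible starting point is a small snippet that builds a query URL for the Prometheus HTTP API. This is only a sketch: it assumes the server is reachable on `localhost:9090`, which matches the port published by `run_metric_image` in this diff, and the helper name `prometheus_query_url` is hypothetical. `up` is Prometheus's built-in per-target health metric.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def prometheus_query_url(expr, host='localhost', port=9090):
    """Build an instant-query URL for the Prometheus HTTP API.

    Hypothetical helper; host and port default to the values assumed
    from the diff.
    """
    return 'http://{}:{}/api/v1/query?{}'.format(
        host, port, urlencode({'query': expr}))


url = prometheus_query_url('up{job="query"}')
print(url)
# The expression survives URL encoding and decoding unchanged.
print(parse_qs(urlparse(url).query)['query'][0])
```

The resulting URL can be fetched with `requests.get(url).json()` once the containers are running; the web UI itself is served from the root of the same host and port.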

from flatten_json import flatten
import itertools
import time
from prometheus_client import start_http_server
from prometheus_client.core import GaugeMetricFamily, REGISTRY
import argparse

parser = argparse.ArgumentParser(
    description='Spin up a node exporter for query_frontend.')
parser.add_argument(
    '--query_frontend_name',
    metavar='str',
    type=str,
    required=True,
    help='The name of docker container in clipper_network')
args = parser.parse_args()

query_frontend_id = args.query_frontend_name

ADDRESS = 'http://{}:1337/metrics'.format(query_frontend_id)  # Sub with name


def load_metric():
    res = requests.get(ADDRESS)
    return res.json()


def multi_dict_unpacking(lst):
    """
    Receive a list of dictionaries, join them into one big dictionary
    """
    result = {}
    for d in lst:
        result.update(d)
    return result


def parse_metric(metrics):
    wo_type = list(itertools.chain.from_iterable(metrics.values()))
    wo_type_flattened = [flatten(d) for d in wo_type]
    wo_type_joined = multi_dict_unpacking(wo_type_flattened)
    return wo_type_joined


class ClipperCollector(object):
    def __init__(self):
        pass

    def collect(self):
        metrics = parse_metric(load_metric())

        for name, val in metrics.items():
            try:
                # Values with a decimal point or exponent are parsed as
                # floats, everything else as ints.
                if '.' in str(val) or 'e' in str(val):
                    val = float(val)
                else:
                    val = int(val)
                name = name.replace(':', '_').replace('-', '_')
                yield GaugeMetricFamily(name, 'help', value=val)
            except ValueError:
                # Skip values that are not numeric.
                pass


if __name__ == '__main__':
    REGISTRY.register(ClipperCollector())
    start_http_server(1390)
    while True:
        time.sleep(1)
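
The value and name handling inside `collect` can be exercised on its own, without a running query frontend. The sketch below is a minimal standalone version under stated assumptions: `to_number` and `sanitize_name` are hypothetical helper names, and matching an uppercase `E` exponent is an addition that the diff does not make.

```python
def to_number(val):
    """Convert a scraped value to int or float; return None if not numeric."""
    text = str(val)
    try:
        # A decimal point or exponent marker means the value is a float.
        if '.' in text or 'e' in text.lower():
            return float(text)
        return int(text)
    except ValueError:
        return None


def sanitize_name(name):
    """Apply the same ':' and '-' replacements the exporter uses."""
    return name.replace(':', '_').replace('-', '_')


# Sample inputs are made up for the example.
print(to_number('12'), to_number('0.5'), to_number('1e3'), to_number('n/a'))
print(sanitize_name('app:latency-ms'))
```

Returning `None` instead of raising keeps the caller's loop simple: a metric whose value does not parse is skipped, which mirrors the `except ValueError: pass` branch in the exporter.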