-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metrics Monitoring: A minimal viable product #348
Merged
Merged
Changes from 35 commits
Commits
Show all changes
36 commits
Select commit
Hold shift + click to select a range
6d5f9e2
[Metrics] A working node exporter done.
simon-mo 439d9f8
[Metric] Model Exporter Done.
simon-mo 3166bc0
[Metric] Add FrontendExporter Docker image
simon-mo 8fbba2f
[Metric] Add docstrings
simon-mo 7947a5f
[Metric] Add integration test for clipper metric
simon-mo 79f3145
[Metric] Format Code
simon-mo e1c7227
[Metrics] Bug Fix, update the name accoridingly
simon-mo fd3b078
[Metric] Small Fixes
simon-mo bc55071
[Metric] Format Code
simon-mo 5efc602
[Metric] Skipped Metrics to Let Jenkins Build Images
simon-mo d86d79a
[Metric] Revert the files; docker imgs built before test
simon-mo 3ffd51b
Merge branch 'metrics' of https://github.com/simon-mo/clipper into me…
simon-mo 21b3926
Move the comments; restart the tests
simon-mo ceb6065
[Metric] Add version tag to the frontend-exporter
simon-mo 0922c3d
[Metric] Add clipper_metric to run_unittests.sh
simon-mo ebf2df6
[Metric] Trigger another CI check
simon-mo ccff8ae
[Metrics] A working node exporter done.
simon-mo 6756b18
[Metric] Model Exporter Done.
simon-mo 92d2c01
[Metric] Add FrontendExporter Docker image
simon-mo 4fc5c10
[Metric] Add docstrings
simon-mo d84be8d
[Metric] Add integration test for clipper metric
simon-mo 9e955b3
[Metric] Format Code
simon-mo 457f792
[Metrics] Bug Fix, update the name accoridingly
simon-mo 808935b
[Metric] Small Fixes
simon-mo 5816ca7
[Metric] Format Code
simon-mo 26e0024
[Metric] Skipped Metrics to Let Jenkins Build Images
simon-mo a93606d
[Metric] Revert the files; docker imgs built before test
simon-mo 3dd8298
Move the comments; restart the tests
simon-mo 56425ae
[Metric] Add version tag to the frontend-exporter
simon-mo e2d8f46
[Metric] Add clipper_metric to run_unittests.sh
simon-mo 086e305
[Metric] Trigger another CI check
simon-mo 87a67e2
Merge branch 'develop' into metrics
dcrankshaw b89cbae
Address comments, fix typo
simon-mo aa4b7c8
Merge branch 'metrics' of https://github.com/simon-mo/clipper into me…
simon-mo dd9d2e5
[Metric] change the redis port for integration-test
simon-mo b218000
Merge branch 'develop' into metrics
simon-mo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
141 changes: 141 additions & 0 deletions
141
clipper_admin/clipper_admin/docker/docker_metric_utils.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
import yaml | ||
import requests | ||
import random | ||
import os | ||
from ..version import __version__ | ||
|
||
|
||
def ensure_clipper_tmp(): | ||
""" | ||
Make sure /tmp/clipper directory exist. If not, make one. | ||
:return: None | ||
""" | ||
try: | ||
os.makedirs('/tmp/clipper') | ||
except OSError as e: | ||
# Equivalent to os.makedirs(., exist_ok=True) in py3 | ||
pass | ||
|
||
|
||
def get_prometheus_base_config(): | ||
""" | ||
Generate a basic configuration dictionary for prometheus | ||
:return: dictionary | ||
""" | ||
conf = dict() | ||
conf['global'] = {'evaluation_interval': '5s', 'scrape_interval': '5s'} | ||
conf['scrape_configs'] = [] | ||
return conf | ||
|
||
|
||
def run_query_frontend_metric_image(name, docker_client, query_name, | ||
common_labels, extra_container_kwargs): | ||
""" | ||
Use docker_client to run a frontend-exporter image. | ||
:param name: Name to pass in, need to be unique. | ||
:param docker_client: The docker_client object. | ||
:param query_name: The corresponding frontend name | ||
:param common_labels: Labels to pass in. | ||
:param extra_container_kwargs: Kwargs to pass in. | ||
:return: None | ||
""" | ||
|
||
query_frontend_metric_cmd = "--query_frontend_name {}".format(query_name) | ||
query_frontend_metric_labels = common_labels.copy() | ||
|
||
docker_client.containers.run( | ||
"clipper/frontend-exporter:{}".format(__version__), | ||
query_frontend_metric_cmd, | ||
name=name, | ||
labels=query_frontend_metric_labels, | ||
**extra_container_kwargs) | ||
|
||
|
||
def setup_metric_config(query_frontend_metric_name, | ||
CLIPPER_INTERNAL_METRIC_PORT): | ||
""" | ||
Write to file prometheus.yml after frontend-metric is setup. | ||
:param query_frontend_metric_name: Corresponding image name | ||
:param CLIPPER_INTERNAL_METRIC_PORT: Default port. | ||
:return: None | ||
""" | ||
|
||
ensure_clipper_tmp() | ||
|
||
with open('/tmp/clipper/prometheus.yml', 'w') as f: | ||
prom_config = get_prometheus_base_config() | ||
prom_config_query_frontend = { | ||
'job_name': | ||
'query', | ||
'static_configs': [{ | ||
'targets': [ | ||
'{name}:{port}'.format( | ||
name=query_frontend_metric_name, | ||
port=CLIPPER_INTERNAL_METRIC_PORT) | ||
] | ||
}] | ||
} | ||
prom_config['scrape_configs'].append(prom_config_query_frontend) | ||
|
||
yaml.dump(prom_config, f) | ||
|
||
|
||
def run_metric_image(docker_client, common_labels, extra_container_kwargs): | ||
""" | ||
Run the prometheus image. | ||
:param docker_client: The docker client object | ||
:param common_labels: Labels to pass in | ||
:param extra_container_kwargs: Kwargs to pass in. | ||
:return: None | ||
""" | ||
|
||
metric_cmd = [ | ||
"--config.file=/etc/prometheus/prometheus.yml", | ||
"--storage.tsdb.path=/prometheus", | ||
"--web.console.libraries=/etc/prometheus/console_libraries", | ||
"--web.console.templates=/etc/prometheus/consoles", | ||
"--web.enable-lifecycle" | ||
] | ||
metric_labels = common_labels.copy() | ||
docker_client.containers.run( | ||
"prom/prometheus", | ||
metric_cmd, | ||
name="metric_frontend-{}".format(random.randint(0, 100000)), | ||
ports={'9090/tcp': 9090}, | ||
volumes={ | ||
'/tmp/clipper/prometheus.yml': { | ||
'bind': '/etc/prometheus/prometheus.yml', | ||
'mode': 'ro' | ||
} | ||
}, | ||
labels=metric_labels, | ||
**extra_container_kwargs) | ||
|
||
|
||
def update_metric_config(model_container_name, CLIPPER_INTERNAL_METRIC_PORT): | ||
""" | ||
Update the prometheus.yml configuration file. | ||
:param model_container_name: New model container_name, need to be unique. | ||
:param CLIPPER_INTERNAL_METRIC_PORT: Default port | ||
:return: None | ||
""" | ||
with open('/tmp/clipper/prometheus.yml', 'r') as f: | ||
conf = yaml.load(f) | ||
|
||
new_job_dict = { | ||
'job_name': | ||
'{}'.format(model_container_name), | ||
'static_configs': [{ | ||
'targets': [ | ||
'{name}:{port}'.format( | ||
name=model_container_name, | ||
port=CLIPPER_INTERNAL_METRIC_PORT) | ||
] | ||
}] | ||
} | ||
conf['scrape_configs'].append(new_job_dict) | ||
|
||
with open('/tmp/clipper/prometheus.yml', 'w') as f: | ||
yaml.dump(conf, f) | ||
|
||
requests.post('http://localhost:9090/-/reload') |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,3 +5,4 @@ docker==2.5.1 | |
kubernetes==3.0.0 | ||
six==1.10.0 | ||
mock | ||
prometheus_client |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,6 +31,7 @@ | |
'pyyaml', | ||
'docker', | ||
'kubernetes', | ||
'prometheus_client', | ||
'six', | ||
], | ||
extras_require={ | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,9 @@ | |
import socket | ||
import sys | ||
from collections import deque | ||
from multiprocessing import Pipe, Process | ||
from prometheus_client import start_http_server | ||
from prometheus_client.core import GaugeMetricFamily, REGISTRY | ||
|
||
INPUT_TYPE_BYTES = 0 | ||
INPUT_TYPE_INTS = 1 | ||
|
@@ -180,7 +183,7 @@ def get_prediction_function(self): | |
def get_event_history(self): | ||
return self.event_history.get_events() | ||
|
||
def run(self): | ||
def run(self, metric_conn): | ||
print("Serving predictions for {0} input type.".format( | ||
input_type_to_string(self.model_input_type))) | ||
connected = False | ||
|
@@ -189,6 +192,9 @@ def run(self): | |
poller = zmq.Poller() | ||
sys.stdout.flush() | ||
sys.stderr.flush() | ||
|
||
pred_metric = dict(model_pred_count=0) | ||
|
||
while True: | ||
socket = self.context.socket(zmq.DEALER) | ||
poller.register(socket, zmq.POLLIN) | ||
|
@@ -306,6 +312,10 @@ def run(self): | |
|
||
response.send(socket, self.event_history) | ||
|
||
pred_metric['model_pred_count'] += 1 | ||
|
||
metric_conn.send(pred_metric) | ||
|
||
print("recv: %f us, parse: %f us, handle: %f us" % | ||
((t2 - t1).microseconds, (t3 - t2).microseconds, | ||
(t4 - t3).microseconds)) | ||
|
@@ -488,4 +498,40 @@ def start(self, model, host, port, model_name, model_version, input_type): | |
self.server.model_version = model_version | ||
self.server.model_input_type = model_input_type | ||
self.server.model = model | ||
self.server.run() | ||
|
||
parent_conn, child_conn = Pipe(duplex=True) | ||
metrics_proc = Process(target=run_metric, args=(child_conn, )) | ||
metrics_proc.start() | ||
self.server.run(parent_conn) | ||
|
||
|
||
class MetricCollector: | ||
def __init__(self, pipe_child_conn): | ||
self.pipe_conn = pipe_child_conn | ||
|
||
def collect(self): | ||
latest_metric_dict = None | ||
while self.pipe_conn.poll(): | ||
latest_metric_dict = self.pipe_conn.recv() | ||
if latest_metric_dict: | ||
for name, val in latest_metric_dict.items(): | ||
try: | ||
yield GaugeMetricFamily( | ||
name=name, | ||
documentation=name, # Required Argument | ||
value=val) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. s/lastest/latest There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed |
||
except ValueError: | ||
pass | ||
|
||
|
||
def run_metric(child_conn): | ||
""" | ||
This function takes a child_conn at the end of the pipe and | ||
receive object to update prometheus metric. | ||
|
||
It is recommended to be ran in a separate process. | ||
""" | ||
REGISTRY.register(MetricCollector(child_conn)) | ||
start_http_server(1390) | ||
while True: | ||
time.sleep(1) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# This ARG isn't used but prevents warnings in the build script | ||
ARG CODE_VERSION | ||
FROM python:3 | ||
|
||
WORKDIR /usr/src/app | ||
|
||
RUN pip install requests prometheus_client flatten_json | ||
|
||
COPY monitoring/front_end_exporter.py . | ||
ENTRYPOINT ["python", "./front_end_exporter.py"] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're accumulating the counts here, but just sending samples of the latencies. This is why I was confused about how you were dequeuing from the pipe. In order to handle the latencies correctly you should be creating histograms. For this MVP PR, let's just accumulate the count. So delete the
*_time
metrics.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I was thinking about this recently. I deleted
*_time
metrics.If we can finalize (at least for now) all the metrics needed to be collected, I will initialize them in the prometheus collector process just once [1] and run update through each dictionary in the pipe. So that every metric record is updated in the metric process (especially matters for histogram and summary data type.
[1]: the current process initialize a new
gauge
every time the metric is updated.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's definitely the right way to do it. Let's merge this PR, then you can start on one that tracks the timing metrics with histograms.