-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metricbeat module for Traefik #7413
Conversation
cc52f5b
to
8c99817
Compare
8c99817
to
e55aaff
Compare
@ruflin Looking at the
How can I fix this? @exekias Is there some VS Code extension I should use to auto-fix this on save or something? I'm currently using this in my VS Code config:
[EDIT] I tried changing that setting to |
You can use As for VS Code, try adding this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to add a changelog.
It seems somehow the template loading test is failing. Not sure why, there must be something invalid in the template. If you run the test locally it should give you a good idea on what goes wrong in the logs.
var ( | ||
schema = s.Schema{ | ||
"uptime": s.Object{ | ||
"sec": c.Float("uptime_sec"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be Int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value is a float in the response from Traefik's health API (http://localhost:8080/health) so I kept it the same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case i would suggest we use milliseconds instead here, especially having elastic/elasticsearch#31244 in mind. TBH for uptime I would also be fine to just convert it to Int. I don't think there is a benefit of having uptime in sub second detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a benefit of having uptime in sub second detail.
Yeah, good point. Will change to Int.
- key: traefik | ||
title: "traefik" | ||
description: > | ||
experimental[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we now generate this automatically if the release setting is not set. So you can skip it here.
d97f5e5
to
a675b8b
Compare
I'm investigating the integration test failures by running them locally as you suggested. From what I can tell so far I think the failure has to do with the Docker container for Traefik not starting up correctly. Concretely, I ran this command:
And it's output got "stuck" like so:
Running
And eventually that becomes this:
In either output of My guess is that this failing Docker health check is what's causing the integration test to be stuck. So I'm playing with the Dockerfile and docker-compose.yml to see if I can get port 8080 up somehow and try the tests again. Stay tuned... |
It seems that the container tries to expose port 80 instead of 8080. Is treafik by default running on 80 or 8080? Perhaps you have change the config here and add |
Turned out to be a combination of issues in the Dockerfile, you can see the necessary changes to make the |
6b32a9b
to
ce49d1a
Compare
@@ -0,0 +1,19 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this file? How is it used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example output document that we should in the docs. It is generated by this command here: https://github.com/elastic/beats/blob/master/metricbeat/Makefile#L73 As this updates all data.json I normally run the following command:
go test -tags=integration github.com/elastic/beats/metricbeat/module/traefik/... -data
In you case this assumes the traefik docker container is running and exposes the port on localhost. This will generate a data.json file and make update
will link it in the docs.
Here you can see an example in the docs: https://www.elastic.co/guide/en/beats/metricbeat/master/metricbeat-metricset-elasticsearch-index.html
description: > | ||
Metrics obtained from Traefik's health API endpoint | ||
fields: | ||
- name: uptime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One trick you can do here not to have to specify the group and describe is to directly write uptime.sec
. The generators will do the rest of the magic for you.
Average response time | ||
fields: | ||
- name: sec | ||
type: float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed with the other sec type I struggle that we have a sec
value that is float. I understand that for the average it makes more sense but in this case I would rather use a smaller unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, will use milliseconds and int here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For proxies performance monitoring it is useful to have sub-millisecond response times if available.
description: > | ||
Respone status codes | ||
fields: | ||
- name: code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is correct. If understand the code correct it will be status_codes.404: 17
? So code
is actually the key and not the value? For such cases have a look at the dynamic templates. Search for object_type
usage in our fields.yml to see how it can be used.
"avg_time": s.Object{ | ||
"sec": c.Float("average_response_time_sec"), | ||
}, | ||
"status_codes": s.Object{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not define it here but leave it empty. Like this if we don't have any status codes, no object shows up in Elasticsearch.
) | ||
|
||
// raw response copied from Traefik instance's health API endpoint | ||
const response = `{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having it to define here, I would recommend to put this into a file which is loaded. This makes it then very easy to have a list of files to test against, for example different versions. See Elasticsearch module as an example.
COMPOSE_SERVICES = ['traefik'] | ||
|
||
@unittest.skipUnless(metricbeat.INTEGRATION_TESTS, "integration test") | ||
def test_health(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a new method check_metricset
in metricbeat.py. Using it could simplify your code here.
@@ -0,0 +1,36 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We recently started to put this file directly into the module directory (see Elasticsearch) to have all pieces in one place. But still both works.
CHANGELOG.asciidoc
Outdated
@@ -119,6 +119,7 @@ https://github.com/elastic/beats/compare/v6.2.3...master[Check the HEAD diff] | |||
- Fix field mapping for the system process CPU ticks fields. {pull}7230[7230] | |||
- Fix Windows service metricset when using a 32-bit binary on a 64-bit OS. {pull}7294[7294] | |||
- Fix Jolokia attribute mapping when using wildcards and MBean names with multiple properties. {pull}7321[7321] | |||
- Added Traefik module with health metricset. {pull}7413[7413] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this entry to the Added
section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, good catch. Thanks!
Average response time | ||
fields: | ||
- name: sec | ||
type: float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For proxies performance monitoring it is useful to have sub-millisecond response times if available.
|
||
event := reporter.GetEvents()[0] | ||
assert.NotNil(t, event) | ||
t.Logf("%s/%s event: %+v", ms.Module().Name(), ms.Name(), event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also check something in the event fields?
ce49d1a
to
0b917f4
Compare
bafb1b8
to
bed5d3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a missing import in a test file, for the rest it LGTM :)
import time | ||
from parameterized import parameterized | ||
|
||
sys.path.append(os.path.join(os.path.dirname(__file__), '../../tests/system')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import sys
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahahaha, thank you.
Related: Is there some way to run system tests (i.e. the Python ones) for a specific module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INTEGRATION_TESTS=1 nosetests module/traefik/
should do the trick. But you must have traefik running locally with docker exposing the port.
|
||
event, _ := schema.Apply(health) | ||
|
||
statusCodeCountMap := health["total_status_code_count"].(map[string]interface{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should check that the key exists and that the type assertion worked. you can add a second param ok
which will contain a bool to do the second part.
type: long | ||
description: > | ||
Average response time in microseconds | ||
- name: status_codes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it status_code
because when someone access it in Kibana he will only look at 1 at the time:
health.status_code.404 > 17
|
||
event, _ := eventMapping(data) | ||
report.Event(mb.Event{ | ||
MetricSetFields: event, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To follow ECS, you could you add service.name: traefik
to it?
metricbeat/module/traefik/testing.go
Outdated
@@ -0,0 +1,49 @@ | |||
// Licensed to Elasticsearch B.V. under one or more contributor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsoriano Should these now go under mtest
package in each module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I'd prefer that, yes.
@ruflin Made the changes you suggested in the latest round of review. Ready for your eyes again, when you get a chance. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor additional comments. Except for the float64 issue ok for me to be merged.
"github.com/pkg/errors" | ||
|
||
"github.com/elastic/beats/libbeat/common" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline can be removed
}.Build() | ||
) | ||
|
||
// MetricSet holds any configuration or state information. It must implement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I often remove the generator comments and make them more specific to the metricset or just very short. The comments were intended for someone that creates the metricset. Applies also to all other comments.
|
||
func eventMapping(health map[string]interface{}) (common.MapStr, *s.Errors) { | ||
if averageResponseTimeSec, ok := health["average_response_time_sec"]; ok { | ||
health["average_response_time_us"] = averageResponseTimeSec.(float64) * 1000 * 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if for whatever reason averageResponseTimeSec
cannot be typed to (float64)
@@ -0,0 +1,7 @@ | |||
# Module: traefik | |||
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/master/metricbeat-module-traefik.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the link to the docs. One issue this has is that we hardcode
master here to every released config file will point to master. Would be nice if in the future this would be a variable that is replaced by the version during packaging.
@ruflin Addressed latest round of review feedback. Ready for your 👀 again. Thanks! |
@ycombinator Thanks for pushing this forward. |
EDIT: This backport commit was later removed. |
Resolves #7412.
This PR adds a Metricbeat module for the Traefik reverse proxy/load balancer. In this module, it adds the health metricset, whose data it retrieves from polling the Traefik instance's health HTTP API endpoint.