-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add kube-state-metrics
based metricsets
#4253
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking forward to have this module in. We should have a conversation about the end result of the data model.
} | ||
} | ||
}, | ||
"status": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name implies state
and here it is status
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using kube-state-metrics
nomenclature here, check https://github.com/elastic/beats/pull/4253/files/0f5f98da5eb5acf11055a8d8cf551f7b4eb1ed0e#diff-c70b508c24452e40e23c98c944027e25R54 to see the metric family names
This is how I would explain it:
State (from kube-state-metrics) means that we get access to the global state of the cluster (think of it as a snapshot in time). While this status
metrics are one of the specific metrics of a container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we namespace this under state
it probably becomes more obvious (see data structure comments).
"resource": { | ||
"requests": { | ||
"cpu": { | ||
"nanocores": 5000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something that will change over time? If we call it state_container
all these fields which are not shared should be under one namespace.
func eventMapping(families []*dto.MetricFamily) ([]common.MapStr, error) { | ||
eventsMap := map[string]common.MapStr{} | ||
for _, family := range families { | ||
//fmt.Println(family) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be removed.
nanocores = 1000000000 | ||
) | ||
|
||
func eventMapping(families []*dto.MetricFamily) ([]common.MapStr, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would kind of hope instead of adding dependency on prometheus here, that this can "depend" on the prometheus module / metricset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my initial intention, but I have 2 issues with that:
- Prometheus module mangles original data on it's way, and it's not really aware of the format of these metrics, it does something generic. So I need to go over the resulting MapStr again
- If we do a change in prometheus module it may break this, having that I'm not using its format
Something I thought about was to move all prometheus code to a common helper (like HTTP), WDYT about that? At the end it looked like overkill but I'm ok with doing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on the helper. Can also be done in a follow up PR. But I think it's good to have it around as this will not be the last metricset using prometheus.
} | ||
|
||
default: | ||
// Ignore unknown metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like an error or at least worth logging as debug?
"scheduled": "true" | ||
} | ||
}, | ||
"state_pod": {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty namespace
@@ -0,0 +1,59 @@ | |||
package state_replicaset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we call this replicaset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep the same format for all kube-state-metrics
metricsets for one reason, the hosts
setting is incompatible between kubelet
and kube-state-metrics
, you have to choose, and it's nice to have that clear from the metricset name, to avoid confusion.
Check https://github.com/exekias/beats/blob/d091ce76089c44e5fd872661d529932aa92633df/metricbeat/module/kubernetes/_meta/config.yml for an example of possible configurations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the point. Leads me to think it should be it's own module but we concluded this discussion :-) Don't have a better solution at the moment :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not 100% happy with this either, we can see how it goes and perhaps improve it in the future
|
||
return &MetricSet{ | ||
BaseMetricSet: base, | ||
counter: 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove counter
"github.com/prometheus/common/expfmt" | ||
) | ||
|
||
func GetFamilies(http *helper.HTTP) ([]*dto.MetricFamily, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a prometheus
helper under the helper
namespace? Or can you use the prometheus.collector
code here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok to the helper, prometheus collector issue explained in a previous comment, but I can update it to share the helper
@@ -0,0 +1,43 @@ | |||
{ | |||
"comment": "", | |||
"ignore": "test", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to add github.com/elastic/beats
here as otherwise when you remove all the beats dependencies will be added to when doing an update.
func eventMapping(families []*dto.MetricFamily) ([]common.MapStr, error) { | ||
eventsMap := map[string]common.MapStr{} | ||
for _, family := range families { | ||
//fmt.Println(family) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be removed.
|
||
logp.Beta("The kubernetes state_container metricset is experimental") | ||
|
||
if err := base.Module().UnpackConfig(&config); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be dropped since no config is used.
func New(base mb.BaseMetricSet) (mb.MetricSet, error) { | ||
config := struct{}{} | ||
|
||
logp.Beta("The kubernetes state_pod metricset is experimental") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a logp.Experimental()
.
- name: cpu.nanocores | ||
type: long | ||
description: > | ||
Container CPU requested nanocores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what is a nanocore? Nanoseconds of CPU time per core?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the other way around, in this case it's number of requested (reserved) full CPU cores. So 1 Core would mean the container has 1 full core reserved for itself, as long as it's alive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW this metric is also present as a usage metric: https://github.com/elastic/beats/blob/master/metricbeat/module/kubernetes/container/_meta/fields.yml#L21
In that case it's the accumulated value over time (I use derivative in graphs to extract the used nancores per bucket)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A common scenario is to check requested vs limits vs real usage. Ideally you would put some sane limits based on real usage of a given container
"github.com/prometheus/common/expfmt" | ||
) | ||
|
||
func GetFamilies(http *helper.HTTP) ([]*dto.MetricFamily, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[golint] reported by reviewdog 🐶
exported function GetFamilies should have comment or be unexported
return families, nil | ||
} | ||
|
||
func GetLabel(m *dto.Metric, label string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[golint] reported by reviewdog 🐶
exported function GetLabel should have comment or be unexported
faf700b
to
5589ec9
Compare
kube-state-metrics
based metricsetskube-state-metrics
based metricsets
5589ec9
to
047c561
Compare
metricbeat/helper/prometheus.go
Outdated
"fmt" | ||
|
||
"github.com/elastic/beats/metricbeat/mb" | ||
dto "github.com/prometheus/client_model/go" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here a newline should be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During this review I mainly focused on the data structure. I really like that now the output documents are organised with a focus on the consumer, means the person that builds dashboards and queries for it. It decouples configuration namespaces from the output which I think is a good thing.
Assuming we can keep the data structure I'm tempted to say in the future we could split it up again in 2 modules both not called kubernetes but both putting their data into the kubernetes namespace which is kind of an empty module that only has fields and docs, no code.
As now fields construct became not quite tricky to follow I think it is critical that we compare our data.json
output against the fields.yml
file to make sure all fields are documented.
|
||
// Prometheus helper retrieves prometheus formatted metrics | ||
type Prometheus struct { | ||
HTTP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that, @urso will not like it too much ;-)
- module: kubernetes | ||
enabled: false | ||
metricsets: | ||
- state_node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on a comment from @andrewkroh in an other PR I'm hoping we can make this state.node etc. in the future (not in this PR).
|
||
func eventMapping(families []*dto.MetricFamily) ([]common.MapStr, error) { | ||
eventsMap := map[string]common.MapStr{} | ||
for _, family := range families { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed or can it be moved now to the helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's pretty custom for this case, so we need to keep it.
I want to see what future metricsets using the helper have in common, perhaps we can still extract some helpers there
1bde788
to
d7efc3a
Compare
jenkins, retest this please |
^ You're so polite to the robots 🤖 . I guess you are hoping for favorable treatment when they take over. |
d7efc3a
to
071cac3
Compare
385d5b4
to
e7c3361
Compare
6c773d8
to
fb8f972
Compare
8567fa9
to
0ec0b1d
Compare
876e39d
to
33463ae
Compare
jenkins, test it |
c80556f
to
0548bb4
Compare
4319e79
to
1519037
Compare
@@ -426,7 +426,7 @@ def extract_fields(doc_list, name): | |||
|
|||
# TODO: Make fields_doc path more generic to work with beat-generator | |||
with open(fields_doc, "r") as f: | |||
path = os.path.abspath(os.path.dirname(__file__) + "../../../../_meta/fields.common.yml") | |||
path = os.path.abspath(os.path.dirname(__file__) + "../../../../_meta/fields.generated.yml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the generated exist in all beats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be the reason that the appveyor builds fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a fallback to common.yml, let's see if appveyor likes that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it worked, thanks!
@@ -51,6 +52,8 @@ services: | |||
jolokia: { condition: service_healthy } | |||
kafka: { condition: service_healthy } | |||
kibana: { condition: service_healthy } | |||
kubernetes: { condition: service_healthy } | |||
kube-state: { condition: service_started } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that this one is set to started and not healthy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kube-state-metrics doesn't provide health service, but as soon as it starts it should work so we can live with this it seems
- state_pod | ||
- state_container | ||
period: 10s | ||
hosts: ["kube-state-metrics:8080"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have a "common" ip or dns name in a kubernetes environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
self.assertItemsEqual(self.de_dot(KUBERNETES_FIELDS), evt.keys(), evt) | ||
|
||
self.assert_fields_are_documented(evt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
1519037
to
2fa7015
Compare
This file includes fields from processors, as they are used by tests Fallbacks to `fields.common.yml` when generated is not present
2fa7015
to
1fba4ef
Compare
Add k8s metrics extracted from kube-state-metrics services. This adds info specially useful about:
I have updated existing dashboard with this kind of info