-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring prometheus module to aggregate metrics based on metric family #4075
Conversation
7a9e535
to
974a687
Compare
Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run. |
1 similar comment
Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really nice addition. It will heavily improve the collector
metricset and create much fewer events. By using the prometheus client we also have kind of a guarantee that it keeps working.
CHANGELOG.asciidoc
Outdated
@@ -78,6 +78,8 @@ https://github.com/elastic/beats/compare/v5.1.1...master[Check the HEAD diff] | |||
- Make system process metricset honor the cpu_ticks config option. {issue}3590[3590] | |||
- Support common.Time in mapstriface.toTime() {pull}3812[3812] | |||
- Fixing panic on prometheus collector when label has , {pull}3947[3947] | |||
- Fixing prometheus collector to aggregate metrics based on metric family. {pull}4075[4075] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not put this under bugfixes but breaking changes as it changes the data structure.
@@ -6,6 +6,9 @@ import ( | |||
"github.com/elastic/beats/metricbeat/helper" | |||
"github.com/elastic/beats/metricbeat/mb" | |||
"github.com/elastic/beats/metricbeat/mb/parse" | |||
|
|||
"fmt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you move the fmt on the top to have the standard imports together?
} | ||
|
||
eventList[promEvent.labelHash][promEvent.key] = promEvent.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the promEvent.key are quite constant over time so we don't have a field explosion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
labels: common.MapStr{ | ||
"handler": "query", | ||
"quantile": 0.99, | ||
key: "http_request_duration_microseconds", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a future step we could also extract the "units" part like microseconds
as I assume this is also part of the convention.
"github.com/elastic/beats/libbeat/common" | ||
dto "github.com/prometheus/client_model/go" | ||
"math" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move these 2 to the top?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our general logic is for imports:
standard imports
beats imports
external imports
for _, metric := range metrics { | ||
event := PromEvent{ | ||
key: name, | ||
labelHash: "#", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does that exactly work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes sure that all metrics that dont have any tag values gets grouped into a single document. its a carry over from the previous implementation that was there before this change.
key := strconv.FormatFloat((100 * quantile.GetQuantile()), 'f', -1, 64) | ||
|
||
if math.IsNaN(quantile.GetValue()) == false { | ||
percentileMap["p"+key] = quantile.GetValue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is already under percentile
I don't think we ned to add a p
in front of the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
if len(percentileMap) != 0 { | ||
value["percentiles"] = percentileMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets make it percentile
as then it reads precentile.99: 0.2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
bucketMap[key] = bucket.GetCumulativeCount() | ||
} | ||
|
||
value["buckets"] = bucketMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, lets make it bucket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
import ( | ||
"fmt" | ||
dto "github.com/prometheus/client_model/go" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see import rules above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -13,11 +13,13 @@ | |||
}, | |||
"prometheus": { | |||
"collector": { | |||
"label": { | |||
"labels": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to have here label
even though I'm aware this is not consitent with most of the other label fields with have. They should also be singular.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
52b9a68
to
f45e386
Compare
f45e386
to
a5ee39f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @tsg could you also have a look at this one as I remember we had quite a few discussions about the prometheus format ...
@@ -79,6 +80,7 @@ https://github.com/elastic/beats/compare/v5.1.1...master[Check the HEAD diff] | |||
- Support common.Time in mapstriface.toTime() {pull}3812[3812] | |||
- Fixing panic on prometheus collector when label has , {pull}3947[3947] | |||
- Fix MongoDB dbstats fields mapping. {pull}4025[4025] | |||
- Fixing prometheus collector to aggregate metrics based on metric family. {pull}4075[4075] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should now be removed.
The current prometheus collector implementation separates each metric into a separate event. This is not how prometheus is meant to be understood. Prometheus has the concept of MetricFamily.
Example of a metric family would be:
This metric family can be broken down as:
metric_name: apiserver_request_latencies
metric_type: histogram
It has two different label combinations:
{resource="accounts",verb="LIST"}
and{resource="accounts",verb="GET"}
Hence two events ought to be created. one of which would look similar to:
This enables histograms and summaries to be looked adjacent to each other.