
consul.raft.replication.heartbeat metrics have many suffixes #4450

Closed
xuejipeng opened this issue Jul 26, 2018 · 15 comments · Fixed by #8822
Labels
theme/telemetry Anything related to telemetry or observability type/enhancement Proposed improvement or new feature

Comments

@xuejipeng

My Consul version is 1.2.1. Here is my configuration:
{
  "datacenter": "dc1",
  "data_dir": "/apps/consul_1.2.1/data",
  "log_level": "DEBUG",
  "node_name": "ast0",
  "server": true,
  "ui": true,
  "bootstrap_expect": 1,
  "bind_addr": "10.0.5.169",
  "client_addr": "0.0.0.0",
  "retry_join": ["10.0.5.160","10.0.5.94"],
  "retry_interval": "3s",
  "enable_debug": true,
  "rejoin_after_leave": true,
  "enable_syslog": false,
  "telemetry": {
    "prometheus_retention_time": "24h",
    "disable_hostname": true
  }
}

When I get the consul.raft.replication.heartbeat metrics, they look like this: consul_raft_replication_appendEntries_rpc_450cf1d9_62de_d0be_905d_b4b42de3f8b8

What are these suffixes? Can I get rid of them, or can I attach them to the metric as a label instead?

@banks
Member

banks commented Jul 26, 2018

Hey, that metric comes from our raft library: https://github.com/hashicorp/raft/blob/a3fb4581fb07b16ecf1c3361580d4bdb17de9d98/replication.go#L535-L539

The suffix is the UUID of the raft server, so you should see one metric per server. If you already split by hostname in Prometheus (the normal setup), you should see that each host has a single UUID suffix. (For clarity: I see "disable_hostname": true in the posted config, but that is normally used because collecting agents like Prometheus already add a host label based on the target they scraped, so I assume the OP already has metrics labeled that way.)

As an immediate solution, I'm 80% sure Prometheus rewriting rules are expressive enough to remove it or turn the last part into a label today with no changes.

In general I agree this would be better as a label. The reason it isn't is that our raft library predates the go-metrics label features, since the original sinks such as statsd had no label support.

It seems reasonable to open an issue on raft to switch to using labels, but I'm not sure how soon that will happen!
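To make the naming concrete, here is a minimal Python sketch of how a per-peer key from the raft library could end up as the flat Prometheus name quoted in the question. It assumes, without checking the go-metrics source, that the `consul` prefix is prepended, that the Prometheus sink joins the key parts with `_`, and that characters not allowed in Prometheus metric names (such as the dashes in the UUID) are rewritten to `_`. `flatten_metric_name` is a hypothetical helper, not a go-metrics function.

```python
import re

# Prometheus metric names may only contain [a-zA-Z0-9_:]; this sketch assumes
# everything else (e.g. the dashes in a UUID) is rewritten to "_".
INVALID_NAME_CHARS = re.compile(r"[^a-zA-Z0-9_:]")


def flatten_metric_name(parts):
    """Hypothetical approximation of the sink's key flattening:
    join the key parts with "_" and sanitize invalid characters."""
    return INVALID_NAME_CHARS.sub("_", "_".join(parts))


# One timer per peer, with the peer's server ID (a UUID in Consul)
# as the last key part.
peer_id = "450cf1d9-62de-d0be-905d-b4b42de3f8b8"
name = flatten_metric_name(
    ["consul", "raft", "replication", "appendEntries", "rpc", peer_id]
)
print(name)
# consul_raft_replication_appendEntries_rpc_450cf1d9_62de_d0be_905d_b4b42de3f8b8
```

Because the peer ID lands in the metric name rather than in a label, every peer gets its own series name, which is exactly what the relabeling rules further down the thread undo.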

@pearkes added the type/question and waiting-reply labels on Jul 26, 2018
@xuejipeng
Author

@banks Sorry, I only find these suffixed metrics on the Consul leader. I couldn't find how to write a rewrite rule in Prometheus, only recording rules. I would appreciate it if you could teach me some best practices.

These suffixes change after Consul restarts, so if I build a Grafana dashboard on them, it stops working once Consul reboots.

@banks
Member

banks commented Jul 27, 2018 via email

@banks added the type/enhancement and theme/telemetry labels and removed the type/question and waiting-reply labels on Jul 27, 2018
@xuejipeng
Author

@banks Thank you so much! I've temporarily solved the Grafana data collection problem. Here's my configuration:

metric_relabel_configs:
  - source_labels: [__name__]
    regex: '(consul_raft_replication_appendEntries_rpc)_((\w){36})((_sum)|(_count))?'
    target_label: raft_id
    replacement: '${2}'
  - source_labels: [__name__]
    regex: '(consul_raft_replication_appendEntries_rpc)_((\w){36})((_sum)|(_count))?'
    target_label: __name__
    replacement: '${1}${4}'

I have to use the regex (consul_raft_replication_appendEntries_rpc)_((\w){36})((_sum)|(_count))? because some of the metric names look like this:

consul_raft_replication_appendEntries_rpc_450cf1d9_62de_d0be_905d_b4b42de3f8b8

consul_raft_replication_appendEntries_rpc_450cf1d9_62de_d0be_905d_b4b42de3f8b8_count

consul_raft_replication_appendEntries_rpc_450cf1d9_62de_d0be_905d_b4b42de3f8b8_sum
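The effect of the two rules on those three names can be checked offline. The Python sketch below approximates Prometheus' default `replace` action: it joins the source label values with `;`, requires the regex to match the whole string (Prometheus anchors it at both ends), and expands `${N}` references in the replacement. It is an emulation, not Prometheus itself; `relabel` and `expand` are hypothetical helpers, and these particular patterns should behave the same in Python's `re` as in Go's RE2.

```python
import re

# The two rules from the configuration above, transcribed as Python dicts.
PATTERN = r"(consul_raft_replication_appendEntries_rpc)_((\w){36})((_sum)|(_count))?"
RULES = [
    {"source_labels": ["__name__"], "regex": PATTERN,
     "target_label": "raft_id", "replacement": "${2}"},
    {"source_labels": ["__name__"], "regex": PATTERN,
     "target_label": "__name__", "replacement": "${1}${4}"},
]


def expand(template, match):
    """Expand Go-style ${N} references; an unmatched group expands to ''."""
    return re.sub(r"\$\{(\d+)\}",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  template)


def relabel(labels, rules):
    """Apply 'replace' rules in order: join the source label values with ';',
    require a full-string match, and write the expanded replacement into
    target_label. Non-matching rules leave the labels untouched."""
    labels = dict(labels)
    for rule in rules:
        value = ";".join(labels.get(key, "") for key in rule["source_labels"])
        match = re.fullmatch(rule["regex"], value)
        if match:
            labels[rule["target_label"]] = expand(rule["replacement"], match)
    return labels


uuid = "450cf1d9_62de_d0be_905d_b4b42de3f8b8"
for suffix in ("", "_count", "_sum"):
    series = {"__name__": "consul_raft_replication_appendEntries_rpc_" + uuid + suffix}
    print(relabel(series, RULES))
```

Rule order matters: the first rule has to copy the UUID into `raft_id` while `__name__` still contains it. If the rules were swapped, the renamed series would no longer match the 36-character group and `raft_id` would never be set.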

I can't find a host label, so I have to split by raft_id, but the raft IDs are not easy to identify.

Can I get the hostname of the corresponding server, or otherwise find out which raft_id belongs to which server?
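One way to recover that mapping is to ask Consul for its raft configuration: `consul operator raft list-peers` prints each server's node name, ID, and address, and the HTTP endpoint `GET /v1/operator/raft/configuration` returns the same information as JSON with a `Servers` list. The Python sketch below assumes a response with `ID`, `Node`, `Address`, and `Leader` fields per server. The sample payload is invented for illustration (placeholder node names and documentation-range addresses), and `raft_id_index` is a hypothetical helper. It rewrites each ID's dashes to underscores so the keys match the raft_id values produced by the relabeling rules.

```python
import json

# Invented sample in the assumed shape of GET /v1/operator/raft/configuration;
# node names and addresses are placeholders, not values from this thread.
SAMPLE = """
{
  "Servers": [
    {"ID": "0b2a1c55-3f1e-4c2d-9a6e-7d8f9e0a1b2c", "Node": "server-1",
     "Address": "192.0.2.10:8300", "Leader": true, "Voter": true},
    {"ID": "450cf1d9-62de-d0be-905d-b4b42de3f8b8", "Node": "server-2",
     "Address": "192.0.2.11:8300", "Leader": false, "Voter": true}
  ],
  "Index": 42
}
"""


def raft_id_index(configuration):
    """Map each server ID, in the underscore form used in metric names and
    in the raft_id label, to its node name, raft address and leader flag."""
    index = {}
    for server in configuration.get("Servers", []):
        key = server["ID"].replace("-", "_")
        index[key] = {
            "node": server["Node"],
            "address": server["Address"],
            "leader": server.get("Leader", False),
        }
    return index


peers = raft_id_index(json.loads(SAMPLE))
print(peers["450cf1d9_62de_d0be_905d_b4b42de3f8b8"]["node"])  # server-2
```

A table like this only reflects the servers at the time it was fetched, so it would need refreshing whenever a server's ID changes.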

@banks
Member

banks commented Jul 30, 2018 via email

@xuejipeng
Author

@banks Thanks, my problem is now basically solved. However, this metric only exists on the leader, so the instance label value is the leader's hostname/IP. I think it would be better if the instance label were the follower's hostname/IP. Can I get the follower's instance label on this metric, or do I have to configure relabeling? If so, how should I configure it?

@banks
Member

banks commented Jul 31, 2018 via email

@xuejipeng
Author

@banks Thank you, but my question is whether my final Grafana graph can look like this, with the two IP addresses at the bottom of the picture being the followers, not the leader.
[screenshot of the desired Grafana graph]

@banks
Member

banks commented Aug 2, 2018 via email

@xuejipeng
Author

@banks Thank you very much for your careful answer. Maybe I need two graphs.

@mvisonneau

mvisonneau commented Aug 16, 2018

Thanks for the tip, guys. I actually used the following myself to cover the other replication metrics as well, in case it's of interest to anyone:

metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'consul_raft_replication_(appendEntries_rpc|appendEntries_logs|heartbeat|installSnapshot)_((\w){36})((_sum)|(_count))?'
    target_label: raft_id
    replacement: '${2}'
  - source_labels: [__name__]
    regex: 'consul_raft_replication_(appendEntries_rpc|appendEntries_logs|heartbeat|installSnapshot)_((\w){36})((_sum)|(_count))?'
    target_label: __name__
    replacement: 'consul_raft_replication_${1}${4}'

@xuejipeng
Author

@mvisonneau Your metric rules are really cool.

@jaysoncena

jaysoncena commented Apr 8, 2019

We use this at my workplace too, so I'm adding my feedback here as well.

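I think you can make the regex shorter, which makes it easier to understand: change ((\w){36})((_sum)|(_count))? to (\w{36})(_sum|_count)?, and the replacement to consul_raft_replication_${1}${3}. Dropping the inner (\w) capture group means the suffix is captured by group 3 instead of group 4, which is why the replacement changes.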

@mvisonneau

Ah indeed, which would give us:

metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'consul_raft_replication_(appendEntries_rpc|appendEntries_logs|heartbeat|installSnapshot)_(\w{36})(_sum|_count)?'
    target_label: raft_id
    replacement: '${2}'
  - source_labels: [__name__]
    regex: 'consul_raft_replication_(appendEntries_rpc|appendEntries_logs|heartbeat|installSnapshot)_(\w{36})(_sum|_count)?'
    target_label: __name__
    replacement: 'consul_raft_replication_${1}${3}'

I haven't tested it yet, though.
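One way to test the shortened rules without a running Prometheus is to apply both versions to sample names for all four metric kinds, with and without the `_sum`/`_count` suffixes, and check that they agree. The Python sketch below approximates Prometheus' `replace` action (full-string match on the `;`-joined source labels, `${N}` expansion in the replacement); it is an emulation rather than Prometheus itself, `relabel`, `expand`, and `rules` are hypothetical helpers, and the sample names reuse the UUID from earlier in the thread.

```python
import itertools
import re

KINDS = "appendEntries_rpc|appendEntries_logs|heartbeat|installSnapshot"
LONG = r"consul_raft_replication_(" + KINDS + r")_((\w){36})((_sum)|(_count))?"
SHORT = r"consul_raft_replication_(" + KINDS + r")_(\w{36})(_sum|_count)?"


def rules(regex, name_replacement):
    """Build the two-rule set used in the thread for a given regex."""
    return [
        {"source_labels": ["__name__"], "regex": regex,
         "target_label": "raft_id", "replacement": "${2}"},
        {"source_labels": ["__name__"], "regex": regex,
         "target_label": "__name__", "replacement": name_replacement},
    ]


LONG_RULES = rules(LONG, "consul_raft_replication_${1}${4}")
SHORT_RULES = rules(SHORT, "consul_raft_replication_${1}${3}")


def expand(template, match):
    """Expand Go-style ${N} references; an unmatched group expands to ''."""
    return re.sub(r"\$\{(\d+)\}",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  template)


def relabel(labels, rule_list):
    """Apply 'replace' rules in order, skipping rules whose regex does not
    match the whole joined source value."""
    labels = dict(labels)
    for rule in rule_list:
        value = ";".join(labels.get(key, "") for key in rule["source_labels"])
        match = re.fullmatch(rule["regex"], value)
        if match:
            labels[rule["target_label"]] = expand(rule["replacement"], match)
    return labels


uuid = "450cf1d9_62de_d0be_905d_b4b42de3f8b8"
for kind, suffix in itertools.product(KINDS.split("|"), ("", "_sum", "_count")):
    series = {"__name__": f"consul_raft_replication_{kind}_{uuid}{suffix}"}
    long_result = relabel(series, LONG_RULES)
    short_result = relabel(series, SHORT_RULES)
    assert long_result == short_result, (series, long_result, short_result)
    assert long_result["raft_id"] == uuid
    assert long_result["__name__"] == f"consul_raft_replication_{kind}{suffix}"
print("long and short rules agree on all sample names")
```

If an assertion fails, its message shows the input series and both results, which points directly at the rule set that diverged.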

@GMartinez-Sisti

The relabeling above works perfectly. However, if you're using a ServiceMonitor configuration in Kubernetes, you need to rename these two keys:

source_labels -> sourceLabels
target_label -> targetLabel
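If the same rules are maintained for both a plain Prometheus config and a ServiceMonitor, the key renaming is mechanical (snake_case to camelCase). The Python sketch below is a hypothetical helper; it assumes the rules go under a ServiceMonitor endpoint's `metricRelabelings` field (prometheus-operator) and that only the key names change, while `regex` and `replacement` values are copied as-is.

```python
def to_camel_case(key):
    """source_labels -> sourceLabels; single-word keys are left unchanged."""
    first, *rest = key.split("_")
    return first + "".join(word.capitalize() for word in rest)


def to_service_monitor(rules):
    """Rename the keys of Prometheus-style relabel rules for a ServiceMonitor."""
    return [{to_camel_case(key): value for key, value in rule.items()}
            for rule in rules]


prometheus_rules = [
    {"source_labels": ["__name__"],
     "regex": r"consul_raft_replication_(appendEntries_rpc|appendEntries_logs|heartbeat|installSnapshot)_(\w{36})(_sum|_count)?",
     "target_label": "raft_id",
     "replacement": "${2}"},
]
print(to_service_monitor(prometheus_rules))
```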

6 participants