Skip to content

Commit

Permalink
Backport of Fix two WAL metrics in docs/agent/telemetry.mdx into rele…
Browse files Browse the repository at this point in the history
…ase/1.15.x (#17682)

* backport of commit 0191cb1

* backport of commit 6c245e7

---------

Co-authored-by: josh <josh.timmons@hashicorp.com>
  • Loading branch information
hc-github-team-consul-core and josh authored Jun 13, 2023
1 parent 2c2d455 commit 75f112c
Show file tree
Hide file tree
Showing 2 changed files with 5 additions and 4 deletions.
3 changes: 3 additions & 0 deletions .changelog/17593.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
```release-note:bug
docs: fix list of telemetry metrics
```
6 changes: 2 additions & 4 deletions website/content/docs/agent/telemetry.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -459,10 +459,8 @@ These metrics are used to monitor the health of the Consul servers.
| `consul.raft.leader.dispatchNumLogs` | Measures the number of logs committed to disk in a batch. | logs | gauge |
| `consul.raft.logstore.verifier.checkpoints_written` | Counts the number of checkpoint entries written to the LogStore. | checkpoints | counter |
| `consul.raft.logstore.verifier.dropped_reports` | Counts how many times the verifier routine was still busy when the next checksum came in and so verification for a range was skipped. If you see this happen, consider increasing the interval between checkpoints with [`raft_logstore.verification.interval`](/consul/docs/agent/config/config-files#raft_logstore_verification) | reports dropped | counter |
| `consul.raft.logstore.verifier.ranges_verified` | Counts the number of log ranges for which a verification report has been completed. Refer to [Monitor Raft metrics and logs for WAL
](/consul/docs/agent/wal-logstore/monitoring) for more information. | log ranges verifications | counter |
| `consul.raft.logstore.verifier.read_checksum_failures` | Counts the number of times a range of logs between two check points contained at least one disk corruption. Refer to [Monitor Raft metrics and logs for WAL
](/consul/docs/agent/wal-logstore/monitoring) for more information. | disk corruptions | counter |
| `consul.raft.logstore.verifier.ranges_verified` | Counts the number of log ranges for which a verification report has been completed. Refer to [Monitor Raft metrics and logs for WAL](/consul/docs/agent/wal-logstore/monitoring) for more information. | log ranges verifications | counter |
| `consul.raft.logstore.verifier.read_checksum_failures` | Counts the number of times a range of logs between two check points contained at least one disk corruption. Refer to [Monitor Raft metrics and logs for WAL](/consul/docs/agent/wal-logstore/monitoring) for more information. | disk corruptions | counter |
| `consul.raft.logstore.verifier.write_checksum_failures` | Counts the number of times a follower has a different checksum to the leader at the point where it writes to the log. This could be caused by either a disk-corruption on the leader (unlikely) or some other corruption of the log entries in-flight. | in-flight corruptions | counter |
| `consul.raft.leader.lastContact` | Measures the time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease.The lease timeout is 500 ms times the [`raft_multiplier` configuration](/consul/docs/agent/config/config-files#raft_multiplier), so this telemetry value should not be getting close to that configured value, otherwise the Raft timing is marginal and might need to be tuned, or more powerful servers might be needed. See the [Server Performance](/consul/docs/install/performance) guide for more details. | ms | timer |
| `consul.raft.leader.oldestLogAge` | The number of milliseconds since the _oldest_ log in the leader's log store was written. This can be important for replication health where write rate is high and the snapshot is large as followers may be unable to recover from a restart if restoring takes longer than the minimum value for the current leader. Compare this with `consul.raft.fsm.lastRestoreDuration` and `consul.raft.rpc.installSnapshot` to monitor. In normal usage this gauge value will grow linearly over time until a snapshot completes on the leader and the log is truncated. Note: this metric won't be emitted until the leader writes a snapshot. After an upgrade to Consul 1.10.0 it won't be emitted until the oldest log was written after the upgrade. | ms | gauge |
Expand Down

0 comments on commit 75f112c

Please sign in to comment.