Update metric exporter to report lag as unit of time #194

Closed
wants to merge 13 commits into from
Binary file modified .DS_Store
Binary file not shown.
10 changes: 5 additions & 5 deletions .promu.yml
@@ -3,11 +3,11 @@ repository:
build:
flags: -a -tags netgo
ldflags: |
-X {{repoPath}}/vendor/github.com/prometheus/common/version.Version={{.Version}}
-X {{repoPath}}/vendor/github.com/prometheus/common/version.Revision={{.Revision}}
-X {{repoPath}}/vendor/github.com/prometheus/common/version.Branch={{.Branch}}
-X {{repoPath}}/vendor/github.com/prometheus/common/version.BuildUser={{user}}@{{host}}
-X {{repoPath}}/vendor/github.com/prometheus/common/version.BuildDate={{date "20060102-15:04:05"}}
-X github.com/prometheus/common/version.Version={{.Version}}
-X github.com/prometheus/common/version.Revision={{.Revision}}
-X github.com/prometheus/common/version.Branch={{.Branch}}
-X github.com/prometheus/common/version.BuildUser={{user}}@{{host}}
-X github.com/prometheus/common/version.BuildDate={{date "20060102-15:04:05"}}
tarball:
files:
- LICENSE
2 changes: 1 addition & 1 deletion .travis.yml
@@ -6,7 +6,7 @@ services:
language: go

go:
- 1.9
- 1.14

after_success:
- if [ "$TRAVIS_PULL_REQUEST" == "false" ]; then
235 changes: 0 additions & 235 deletions Gopkg.lock

This file was deleted.

15 changes: 0 additions & 15 deletions Gopkg.toml

This file was deleted.

43 changes: 40 additions & 3 deletions README.md
@@ -110,6 +110,8 @@ This image is configurable using different flags
| web.telemetry-path | /metrics | Path under which to expose metrics |
| log.level | info | Only log messages with the given severity or above. Valid levels: [debug, info, warn, error, fatal] |
| log.enable-sarama | false | Turn on Sarama logging |
| max.offsets | 1000 | Maximum number of offsets to store in the interpolation table for a partition |
| prune.interval | 30 | How often the interpolation table is pruned, in seconds |

### Notes

@@ -131,7 +133,7 @@ For details on the underlying metrics please see [Apache Kafka](https://kafka.ap

**Metrics details**

| Name | Exposed informations |
| Name | Exposed information |
| --------------- | -------------------------------------- |
| `kafka_brokers` | Number of Brokers in the Kafka Cluster |

@@ -147,7 +149,7 @@ kafka_brokers 3

**Metrics details**

| Name | Exposed informations |
| Name | Exposed information |
| -------------------------------------------------- | --------------------------------------------------- |
| `kafka_topic_partitions` | Number of partitions for this Topic |
| `kafka_topic_partition_current_offset` | Current Offset of a Broker at Topic/Partition |
@@ -198,7 +200,7 @@ kafka_topic_partition_under_replicated_partition{partition="0",topic="__consumer

**Metrics details**

| Name | Exposed informations |
| Name | Exposed information |
| ------------------------------------ | ------------------------------------------------------------- |
| `kafka_consumergroup_current_offset` | Current Offset of a ConsumerGroup at Topic/Partition |
| `kafka_consumergroup_lag` | Current Approximate Lag of a ConsumerGroup at Topic/Partition |
@@ -215,13 +217,48 @@ kafka_consumergroup_current_offset{consumergroup="KMOffsetCache-kafka-manager-38
kafka_consumergroup_lag{consumergroup="KMOffsetCache-kafka-manager-3806276532-ml44w",partition="0",topic="__consumer_offsets"} 1
```

### Consumer Lag

**Metrics details**

| Name | Exposed information |
| ------------------------------------ | ------------------------------------------------------------- |
| `kafka_consumer_lag_millis` | Current approximation of consumer lag for a ConsumerGroup at Topic/Partition |
| `kafka_consumer_lag_extrapolation` | Indicates that a consumer group lag estimation used extrapolation |
| `kafka_consumer_lag_interpolation` | Indicates that a consumer group lag estimation used interpolation |

**Metrics output example**
```
# HELP kafka_consumer_lag_extrapolation Indicates that a consumer group lag estimation used extrapolation
# TYPE kafka_consumer_lag_extrapolation counter
kafka_consumer_lag_extrapolation{consumergroup="perf-consumer-74084",partition="0",topic="test"} 1

# HELP kafka_consumer_lag_interpolation Indicates that a consumer group lag estimation used interpolation
# TYPE kafka_consumer_lag_interpolation counter
kafka_consumer_lag_interpolation{consumergroup="perf-consumer-74084",partition="0",topic="test"} 1

# HELP kafka_consumer_lag_millis Current approximation of consumer lag for a ConsumerGroup at Topic/Partition
# TYPE kafka_consumer_lag_millis gauge
kafka_consumer_lag_millis{consumergroup="perf-consumer-74084",partition="0",topic="test"} 3.4457231197552e+10
```

Grafana Dashboard
-------

Grafana Dashboard ID: 7589, name: Kafka Exporter Overview.

For details of the dashboard please see [Kafka Exporter Overview](https://grafana.com/dashboards/7589).

Lag Estimation
--------------
The technique to estimate lag for a consumer group, topic, and partition is taken from the [Lightbend Kafka Lag Exporter](https://github.com/lightbend/kafka-lag-exporter).

Once the exporter starts, it begins sampling the next offset to be produced. The interpolation table is built from these samples, and the current offset of each monitored consumer group is compared against the values in the table. If the table contains both an upper and a lower bound for a consumer group's current offset, interpolation is used. If the table contains only an upper bound, extrapolation is used.
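
As a rough illustration of the interpolation and extrapolation described above, the following Go sketch estimates lag in milliseconds from a table of sampled offsets. The `offsetSample` type, the `estimateLagMillis` function, and the choice to extrapolate along the line through the oldest and newest samples are assumptions made for this example; they are not taken from the exporter's code.

```go
package main

import (
	"errors"
	"fmt"
)

// offsetSample is a hypothetical table entry: the next offset to be produced
// on a partition and the wall-clock time (in ms) at which it was sampled.
type offsetSample struct {
	offset      int64
	timestampMs int64
}

// timeAtOffset evaluates the straight line through lo and hi at the given
// offset. Callers must ensure hi.offset != lo.offset.
func timeAtOffset(lo, hi offsetSample, offset int64) int64 {
	frac := float64(offset-lo.offset) / float64(hi.offset-lo.offset)
	return lo.timestampMs + int64(frac*float64(hi.timestampMs-lo.timestampMs))
}

// estimateLagMillis approximates how far behind (in ms) a consumer at
// consumerOffset is. samples must be sorted by ascending offset.
// extrapolated reports whether the offset fell below the oldest sample.
func estimateLagMillis(samples []offsetSample, consumerOffset, nowMs int64) (lagMs int64, extrapolated bool, err error) {
	if len(samples) < 2 {
		return 0, false, errors.New("need at least two samples")
	}
	first, last := samples[0], samples[len(samples)-1]
	if consumerOffset >= last.offset {
		return 0, false, nil // consumer has caught up with the newest sample
	}
	if consumerOffset < first.offset {
		// Only an upper bound exists: extrapolate (assumed: oldest-to-newest line).
		if first.offset == last.offset {
			return 0, false, errors.New("cannot extrapolate from a flat table")
		}
		return nowMs - timeAtOffset(first, last, consumerOffset), true, nil
	}
	// Both bounds exist: interpolate between the bracketing samples.
	for i := 0; i+1 < len(samples); i++ {
		lo, hi := samples[i], samples[i+1]
		if lo.offset <= consumerOffset && consumerOffset < hi.offset {
			return nowMs - timeAtOffset(lo, hi, consumerOffset), false, nil
		}
	}
	return 0, false, errors.New("samples are not sorted by offset")
}

func main() {
	samples := []offsetSample{{100, 0}, {200, 10000}, {300, 20000}}
	for _, consumerOffset := range []int64{150, 50, 300} {
		lag, extrapolated, err := estimateLagMillis(samples, consumerOffset, 20000)
		fmt.Println(consumerOffset, lag, extrapolated, err)
	}
}
```

With these sample values, offset 150 falls between two samples and is interpolated, offset 50 lies below the oldest sample and is extrapolated, and offset 300 matches the newest sample and reports zero lag.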

At a configurable interval, `prune.interval` (default 30 seconds), the interpolation table is pruned. Any consumer group or topic that is no longer listed by the broker is removed. The number of offsets for each partition is trimmed down to `max.offsets` (default 1000), with the oldest offsets removed first.
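
The pruning step could look roughly like the sketch below. The `table` type (topic to partition to offsets, oldest first) and the `prune` function are hypothetical names chosen for this example, and only topics are handled; removing stale consumer groups would follow the same pattern.

```go
package main

import "fmt"

// table is a hypothetical layout: topic -> partition -> sampled offsets,
// stored oldest first.
type table map[string]map[int32][]int64

// prune drops topics the broker no longer lists and trims every partition
// to at most maxOffsets entries, discarding the oldest entries first.
func prune(t table, liveTopics map[string]bool, maxOffsets int) {
	if maxOffsets < 0 {
		maxOffsets = 0
	}
	for topic, partitions := range t {
		if !liveTopics[topic] {
			delete(t, topic) // deleting during range is safe in Go
			continue
		}
		for partition, offsets := range partitions {
			if len(offsets) > maxOffsets {
				// Copy the newest entries so the old backing array can be freed.
				kept := append([]int64(nil), offsets[len(offsets)-maxOffsets:]...)
				partitions[partition] = kept
			}
		}
	}
}

func main() {
	tbl := table{
		"test": {0: {1, 2, 3, 4, 5}},
		"gone": {0: {9}},
	}
	prune(tbl, map[string]bool{"test": true}, 3)
	fmt.Println(len(tbl), tbl["test"][0]) // prints: 1 [3 4 5]
}
```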

Pruning of the interpolation table runs on a separate thread; a lock around the table ensures thread safety.
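
One way to arrange this in Go is a mutex-guarded table plus a background goroutine driven by a ticker. The sketch below is only an illustration under that assumption; `lockedTable`, `startPruner`, and their methods are hypothetical names, not the exporter's.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lockedTable guards a per-partition offset list with a mutex so that the
// sampler and the pruner can touch it from different goroutines.
type lockedTable struct {
	mu      sync.Mutex
	offsets map[int32][]int64
}

func newLockedTable() *lockedTable {
	return &lockedTable{offsets: make(map[int32][]int64)}
}

func (lt *lockedTable) add(partition int32, offset int64) {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	lt.offsets[partition] = append(lt.offsets[partition], offset)
}

// prune keeps at most maxOffsets of the newest entries per partition.
func (lt *lockedTable) prune(maxOffsets int) {
	if maxOffsets < 0 {
		maxOffsets = 0
	}
	lt.mu.Lock()
	defer lt.mu.Unlock()
	for p, o := range lt.offsets {
		if len(o) > maxOffsets {
			lt.offsets[p] = append([]int64(nil), o[len(o)-maxOffsets:]...)
		}
	}
}

func (lt *lockedTable) size(partition int32) int {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	return len(lt.offsets[partition])
}

// startPruner prunes on every tick until stop is closed. The returned
// channel is closed once the goroutine has exited.
func startPruner(lt *lockedTable, interval time.Duration, maxOffsets int, stop <-chan struct{}) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				lt.prune(maxOffsets)
			case <-stop:
				return
			}
		}
	}()
	return done
}

func main() {
	lt := newLockedTable()
	stop := make(chan struct{})
	done := startPruner(lt, 10*time.Millisecond, 5, stop)
	for i := int64(0); i < 50; i++ {
		lt.add(0, i)
	}
	time.Sleep(50 * time.Millisecond) // give the pruner time to run
	close(stop)
	<-done
	fmt.Println("offsets kept:", lt.size(0))
}
```

Holding the lock for the whole prune keeps the logic simple; with very large tables, a finer-grained lock per partition might reduce contention with the sampler.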

Contribute
----------

10 changes: 10 additions & 0 deletions examples/lag_demo/README.md
@@ -0,0 +1,10 @@
# Slow Producer and Consumer

The `lag_demo` package creates a [Kafka](https://kafka.apache.org/) producer and consumer for the purposes of demonstrating lag in a consumer group.


## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

## License
[MIT](https://choosealicense.com/licenses/mit/)
12 changes: 12 additions & 0 deletions examples/lag_demo/lag_demo.go
@@ -0,0 +1,12 @@
package main

import "sync"

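func main() {
	// Pass the WaitGroup by pointer (a copy would leave Wait blocked); assumes the workers, not shown here, take *sync.WaitGroup.
	var wg sync.WaitGroup
	wg.Add(2)
	go slowConsumer(&wg)
	go slowProducer(&wg)
	wg.Wait()
}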
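
A `sync.WaitGroup` must not be copied after first use, so worker functions that call `Done` normally receive a pointer to it. The sketch below shows that pattern in isolation. The bodies of `slowConsumer` and `slowProducer` are placeholders invented for this example (the real ones are not shown in this diff), and the `finished` counter exists only to make the result observable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// slowConsumer is a placeholder worker: it records that it ran and then
// signals completion on the shared WaitGroup.
func slowConsumer(wg *sync.WaitGroup, finished *int32) {
	defer wg.Done()
	atomic.AddInt32(finished, 1)
}

// slowProducer is a placeholder worker with the same shape.
func slowProducer(wg *sync.WaitGroup, finished *int32) {
	defer wg.Done()
	atomic.AddInt32(finished, 1)
}

// runBoth starts both workers and returns how many finished before
// Wait returned.
func runBoth() int32 {
	var wg sync.WaitGroup
	var finished int32
	wg.Add(2)
	go slowConsumer(&wg, &finished)
	go slowProducer(&wg, &finished)
	wg.Wait()
	return atomic.LoadInt32(&finished)
}

func main() {
	fmt.Println(runBoth()) // prints: 2
}
```

If the workers received the WaitGroup by value, each `Done` would act on a private copy and `wg.Wait()` in the caller would never return; `go vet` reports this as a copied lock.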