Merge pull request #2568 from Kavinjsir/feat/docs-metrics
📖 Add metrics references.
k8s-ci-robot authored Mar 30, 2022
2 parents 6d59caf + b5b2d35 commit c0a0bb6
Showing 4 changed files with 40 additions and 11 deletions.
5 changes: 4 additions & 1 deletion docs/book/src/SUMMARY.md
@@ -96,6 +96,9 @@
- [Configuring EnvTest](./reference/envtest.md)

- [Metrics](./reference/metrics.md)

- [Reference](./reference/metrics-reference.md)

- [Makefile Helpers](./reference/makefile-helpers.md)
- [Project config](./reference/project-config.md)

@@ -112,4 +115,4 @@
[Appendix: The TODO Landing Page](./TODO.md)


[plugins]: ./plugins/plugins.md
22 changes: 22 additions & 0 deletions docs/book/src/reference/metrics-reference.md
@@ -0,0 +1,22 @@
# Default Exported Metrics Reference

The following metrics are exported by [controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) by default:

| Metric name | Type | Description |
| --- | :-- | :-- |
| [workqueue_depth](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/metrics/workqueue.go#L41) | Gauge | Current depth of the workqueue. |
| [workqueue_adds_total](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/metrics/workqueue.go#L47) | Counter | Total number of adds handled by the workqueue. |
| [workqueue_queue_duration_seconds](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/metrics/workqueue.go#L53) | Histogram | How long, in seconds, an item stays in the workqueue before being requested. |
| [workqueue_work_duration_seconds](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/metrics/workqueue.go#L60) | Histogram | How long, in seconds, processing an item from the workqueue takes. |
| [workqueue_unfinished_work_seconds](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/metrics/workqueue.go#L67) | Gauge | Seconds of work that is in progress and has not yet been observed by `workqueue_work_duration_seconds`. Large values indicate stuck threads; the number of stuck threads can be deduced from the rate at which this value increases. |
| [workqueue_longest_running_processor_seconds](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/metrics/workqueue.go#L76) | Gauge | How many seconds the longest-running processor for the workqueue has been running. |
| [workqueue_retries_total](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/metrics/workqueue.go#L83) | Counter | Total number of retries handled by the workqueue. |
| [rest_client_requests_total](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/metrics/client_go_adapter.go#L79) | Counter | Number of HTTP requests, partitioned by status code, method, and host. |
| [controller_runtime_reconcile_total](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/internal/controller/metrics/metrics.go#L30) | Counter | Total number of reconciliations per controller. |
| [controller_runtime_reconcile_errors_total](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/internal/controller/metrics/metrics.go#L37) | Counter | Total number of reconciliation errors per controller. |
| [controller_runtime_reconcile_time_seconds](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/internal/controller/metrics/metrics.go#L44) | Histogram | Length of time per reconciliation per controller. |
| [controller_runtime_max_concurrent_reconciles](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/internal/controller/metrics/metrics.go#L53) | Gauge | Maximum number of concurrent reconciles per controller. |
| [controller_runtime_active_workers](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/internal/controller/metrics/metrics.go#L60) | Gauge | Number of currently used workers per controller. |
| [controller_runtime_webhook_latency_seconds](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/webhook/internal/metrics/metrics.go#L31) | Histogram | Latency of processing admission requests. |
| [controller_runtime_webhook_requests_total](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/webhook/internal/metrics/metrics.go#L40) | Counter | Total number of admission requests by HTTP status code. |
| [controller_runtime_webhook_requests_in_flight](https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.0/pkg/webhook/internal/metrics/metrics.go#L51) | Gauge | Current number of admission requests being served. |
21 changes: 11 additions & 10 deletions docs/book/src/reference/metrics.md
@@ -1,7 +1,7 @@
# Metrics

By default, controller-runtime builds a global Prometheus registry and
publishes [a collection of performance metrics](/reference/metrics-reference.md) for each controller.

## Protecting the Metrics

@@ -12,9 +12,10 @@ can be found at `config/rbac/auth_proxy_client_clusterrole.yaml`.
You will need to grant permissions to your Prometheus server so that it can
scrape the protected metrics. To achieve that, you can create a
`ClusterRoleBinding` that binds the `ClusterRole` to the service account that your
Prometheus server uses. If you are using [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus),
this cluster binding already exists.

You can either run the following command or apply the example YAML file provided below to create the `ClusterRoleBinding`.

If you are using Kubebuilder, `<project-prefix>` is the `namePrefix` field in `config/default/kustomization.yaml`.
@@ -24,6 +25,7 @@ kubectl create clusterrolebinding metrics --clusterrole=<project-prefix>-metrics
```

You can also apply the following `ClusterRoleBinding`:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
@@ -38,18 +40,19 @@ subjects:
name: <prometheus-service-account>
namespace: <prometheus-service-account-namespace>
```
The `prometheus-k8s-role` referenced here should provide the necessary permissions to allow Prometheus to scrape metrics from operator pods.
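
To illustrate what the binding enables, here is a minimal Go sketch of the client side of a protected scrape: the request presents a service account token as a bearer token, and the auth proxy in front of the metrics endpoint decides whether that identity may read the metrics. The helper names, the token strings, and the fake endpoint are hypothetical; the fake simply compares tokens, whereas the real proxy delegates the decision to the Kubernetes API server, which is where the `ClusterRoleBinding` above comes into play.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// scrapeWithToken performs the GET a Prometheus server would make against a
// protected metrics endpoint, presenting a bearer token.
func scrapeWithToken(client *http.Client, url, token string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("scrape %s: unexpected status %s", url, resp.Status)
	}
	return string(body), nil
}

// newFakeProtectedEndpoint is a hypothetical stand-in for the auth proxy: it
// serves metrics only to requests carrying the expected token.
func newFakeProtectedEndpoint(validToken string) *httptest.Server {
	return httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+validToken {
			http.Error(w, "Forbidden", http.StatusForbidden)
			return
		}
		fmt.Fprintln(w, `workqueue_depth{name="example"} 0`)
	}))
}

func main() {
	srv := newFakeProtectedEndpoint("bound-sa-token")
	defer srv.Close()

	// srv.Client() trusts the test server's self-signed certificate.
	if body, err := scrapeWithToken(srv.Client(), srv.URL+"/metrics", "bound-sa-token"); err == nil {
		fmt.Print("authorized scrape: ", body)
	}
	if _, err := scrapeWithToken(srv.Client(), srv.URL+"/metrics", "unbound-sa-token"); err != nil {
		fmt.Println("unauthorized scrape:", err)
	}
}
```

Without the binding, the Prometheus service account is in the position of `unbound-sa-token` here: its scrapes are rejected even though the endpoint is reachable.
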

## Exporting Metrics for Prometheus

Follow the steps below to export the metrics using the Prometheus Operator:

1. Install Prometheus and Prometheus Operator.
   We recommend using [kube-prometheus](https://github.com/coreos/kube-prometheus#installing)
   in production if you don't have your own monitoring system.
   If you are just experimenting, you can install only Prometheus and the Prometheus Operator.
2. Uncomment the line `- ../prometheus` in `config/default/kustomization.yaml`.
   This creates the `ServiceMonitor` resource, which enables exporting the metrics.

```yaml
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
@@ -116,13 +119,11 @@ reconcile loop. These metrics can be evaluated from anywhere in the operator cod
<aside class="note">
<h2>Enabling metrics in Prometheus UI</h2>

To publish metrics and view them in the Prometheus UI, the Prometheus instance must be configured to select the `ServiceMonitor` instance based on its labels.

</aside>
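
Label-based selection follows the usual Kubernetes `matchLabels` rule: an object is selected only if it carries every key in the selector with the same value. The Go sketch below illustrates that rule, assuming the Prometheus instance uses a `matchLabels`-only `serviceMonitorSelector` (`matchExpressions` are not modeled). The `selects` helper and all label keys and values are hypothetical examples, not values taken from the scaffold.

```go
package main

import "fmt"

// selects reports whether a matchLabels-style selector selects an object with
// the given labels: every selector key must be present with an equal value.
// An empty selector selects everything, as with Kubernetes label selectors.
func selects(matchLabels, objectLabels map[string]string) bool {
	for key, want := range matchLabels {
		if got, ok := objectLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical labels on the ServiceMonitor created from config/prometheus.
	serviceMonitorLabels := map[string]string{"control-plane": "controller-manager"}

	// Hypothetical serviceMonitorSelector.matchLabels values on a Prometheus instance.
	matching := map[string]string{"control-plane": "controller-manager"}
	other := map[string]string{"release": "prometheus"}

	fmt.Println(selects(matching, serviceMonitorLabels)) // true
	fmt.Println(selects(other, serviceMonitorLabels))    // false
}
```

If the second case describes your setup, either add the selector's labels to the `ServiceMonitor` or widen the Prometheus selector.
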

Those metrics will be available for Prometheus or
other OpenMetrics-compatible systems to scrape.

![Screen Shot 2021-06-14 at 10 15 59 AM](https://user-images.githubusercontent.com/37827279/121932262-8843cd80-ccf9-11eb-9c8e-98d0eda80169.png)
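
Anything that reads the Prometheus text exposition format can consume these metrics. As a rough illustration, the Go sketch below totals `controller_runtime_reconcile_total` per controller from a scraped payload. The payload, the `memcached` controller name, and the `sumByLabel` helper are hypothetical, and the parser is deliberately simplified; a production consumer would use a full parser such as the `expfmt` package in `github.com/prometheus/common`.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumByLabel adds up every sample of the named metric in Prometheus
// text-format output, grouped by the value of one label. It is a deliberately
// simplified parser: comment lines and samples without labels are skipped,
// and label values containing commas or escaped quotes are not handled.
func sumByLabel(exposition, metric, label string) (map[string]float64, error) {
	totals := map[string]float64{}
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		start := strings.IndexByte(line, '{')
		end := strings.IndexByte(line, '}')
		if start < 0 || end < start || line[:start] != metric {
			continue
		}
		// After the closing brace comes the value, optionally followed by a timestamp.
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 {
			return nil, fmt.Errorf("missing value in line %q", line)
		}
		value, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in line %q: %w", line, err)
		}
		for _, pair := range strings.Split(line[start+1:end], ",") {
			kv := strings.SplitN(pair, "=", 2)
			if len(kv) == 2 && strings.TrimSpace(kv[0]) == label {
				totals[strings.Trim(kv[1], `"`)] += value
			}
		}
	}
	return totals, scanner.Err()
}

func main() {
	// Hypothetical scrape output; a real payload contains many more series.
	scraped := `# TYPE controller_runtime_reconcile_total counter
controller_runtime_reconcile_total{controller="memcached",result="success"} 40
controller_runtime_reconcile_total{controller="memcached",result="error"} 2
controller_runtime_reconcile_total{controller="memcached",result="requeue"} 5
`
	totals, err := sumByLabel(scraped, "controller_runtime_reconcile_total", "controller")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(totals["memcached"]) // 47
}
```

Grouping by `controller` sums across any other labels on the series, which matches the per-controller view the reference table describes.
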


3 changes: 3 additions & 0 deletions docs/book/src/reference/reference.md
@@ -32,5 +32,8 @@
- [Artifacts](artifacts.md)
- [Writing controller tests](writing-tests.md)
- [Metrics](metrics.md)

- [Reference](metrics-reference.md)

- [Makefile Helpers](makefile-helpers.md)
- [CLI plugins](../plugins/cli-plugins.md)
