[receiver/prometheusoperator] Add base structure #6344
@@ -0,0 +1,164 @@
## Design Goals

### Vision
Provide a nearly zero-config migration path from PrometheusOperator-defined scraping configs.

### Scope
Only scrape-config-relevant CRDs (CustomResourceDefinitions) are in scope for this receiver.
As of this writing the relevant CRDs are:
- ServiceMonitor
- PodMonitor

The receiver should be able to query these CRDs from Kubernetes, attach watchers and reconfigure its scraping config
based on existing CRs (CustomResources) in Kubernetes.

Changes as well as creations or deletions of said CRs should trigger reconciliation of the scrape config.
Active querying of the API should only occur during startup and at a defined interval of the receiver.

Limiting the receiver to specific namespaces should be possible for shared-cluster scenarios.
This should be implemented using industry-standard methods.

### Out of scope
Excluded are other relevant CRDs such as:
- PrometheusRules
- ThanosRuler
- Probes
- Alertmanagers
- AlertmanagerConfigs

This exclusion covers concepts like alerting and black-box probing of defined targets.
## PrometheusOperator CR watcher
The CR watcher is the component responsible for querying CRs from the Kubernetes API and triggering a reconciliation.

### Major components of the CR watcher
- **[ListenerFactory](https://github.com/prometheus-operator/prometheus-operator/blob/main/pkg/informers/monitoring.go):**
  the component which creates listeners on the CRs
- **[APIClient](https://github.com/prometheus-operator/prometheus-operator/tree/main/pkg/client):**
  the PrometheusOperator API client
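As an illustration of how the APIClient might be used, here is a minimal sketch (not part of this PR) that lists `ServiceMonitor` CRs with the prometheus-operator typed clientset; package layout and names are assumptions based on the upstream client:

```go
package main

import (
	"context"
	"fmt"
	"log"

	monitoringclient "github.com/prometheus-operator/prometheus-operator/pkg/client/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

func main() {
	// Use the in-cluster service account configuration (auth_type: serviceAccount).
	restCfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}

	// The prometheus-operator project ships a typed clientset for its CRDs.
	client, err := monitoringclient.NewForConfig(restCfg)
	if err != nil {
		log.Fatal(err)
	}

	// List ServiceMonitor CRs in a single namespace; watchers would be attached
	// the same way through the generated informers.
	monitors, err := client.MonitoringV1().ServiceMonitors("default").List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, sm := range monitors.Items {
		fmt.Printf("found ServiceMonitor %s/%s\n", sm.Namespace, sm.Name)
	}
}
```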
## Config generator
The config generator is triggered by the watcher and generates a Prometheus config as a byte array.
Instead of writing it to disk, the configuration is unmarshalled using the Prometheus config loader,
which is already in use by the `prometheusreceiver`, resulting in a full Prometheus config.

In case of an [Agent](#collector-vs-agent-deployment) deployment, which is signaled with the `limit_to_node` option,
only local endpoints will be fetched; endpoints should be filtered so that only pods scheduled on the current node are scraped.
To achieve this, additional relabeling rules are added to all scrape configs generated by the `ConfigGenerator`.

> **Reviewer:** As a note, it is not recommended to watch endpoints (or endpointslices) from each node. The apiserver has a watch index for pods by node name, meaning it is acceptable to watch pods assigned to each node from a daemonset, but does not have the same for endpoints.
>
> **Author:** Makes sense, but I'm not sure how to solve this.
>
> **Reviewer:** I'm just noting it; I don't think it is easily solvable. I think we should not recommend using a daemonset with servicemonitors to users because of this.

Here is an example of a scrape config with the additional rules:
```yaml
scrape_configs:
- job_name: serviceMonitor/ingress/ingress-nginx-controller/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs: # shortened
  ### the autogenerated rules have been removed
  # save in case node ports
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    regex: "Node;(.+)"
    replacement: $1
    target_label: __tmp_node_name
    action: replace
  # save in case pods
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_node_name]
    regex: "Pod;(.+)"
    replacement: $1
    target_label: __tmp_node_name
    action: replace
  # keep only targets on the defined node
  - source_labels: [__tmp_node_name]
    regex: "our-node-name" # node name we have extracted from the environment
    action: keep
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - ingress
```
Sharding should be supported using the optional `sharding` options, which declare to which shard the receiver
belongs. Details about sharding can be found in the [sharding](#sharding) section.
After generation of the base config, the shard substitution `$(SHARD)` is replaced with the current shard instance
(default = `1`).

This config is then used to create a `prometheusreceiver` `Config` struct.

### Major components of the config generation
- **[ConfigGenerator](https://github.com/prometheus-operator/prometheus-operator/blob/main/pkg/prometheus/promcfg.go#L304):**
  the PrometheusOperator component which generates a valid Prometheus config marshalled to a byte array
- **[ConfigLoader](https://github.com/prometheus/prometheus/blob/main/config/config.go#L68):**
  the Prometheus configuration loader which unmarshals the config into an object usable by the `prometheusreceiver`
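A minimal sketch of the loading step described above, assuming the `Load` signature of recent Prometheus 2.x releases (the signature has changed between Prometheus versions):

```go
package configload

import (
	"github.com/go-kit/log"
	promconfig "github.com/prometheus/prometheus/config"
)

// loadGeneratedConfig parses the byte array produced by the ConfigGenerator,
// applying Prometheus defaults and validation, exactly as a Prometheus server
// would do on startup.
func loadGeneratedConfig(raw []byte) (*promconfig.Config, error) {
	return promconfig.Load(string(raw), false, log.NewNopLogger())
}
```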
## Processing config change events
The Prometheus config is first compared against the currently applied one. Should there be any change, a new
`prometheusreceiver` is started using the generated configuration.
If the startup is successful, the old instance of the `prometheusreceiver` is shut down.
Should any error occur during startup, the old instance will keep running.

> **Reviewer:** I'm not sure it is safe to concurrently run multiple receivers that may scrape the same targets. Could this be accomplished by holding on to a last-known-good configuration to fall back on in the event of errors during startup?
>
> **Author:** This is definitely possible as we need to compare the new and current config anyway to assess if a reload/restart is necessary.
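A rough sketch of this reload-with-fallback behavior; `receiverInstance`, `startReceiverFunc`, and `applyConfig` are hypothetical names used only for illustration, not part of this PR:

```go
package reload

import (
	"context"
	"reflect"

	promconfig "github.com/prometheus/prometheus/config"
)

// receiverInstance is a placeholder for a running prometheusreceiver.
type receiverInstance interface {
	Shutdown(ctx context.Context) error
}

// startReceiverFunc is a hypothetical helper that starts a prometheusreceiver
// from a generated Prometheus config.
type startReceiverFunc func(ctx context.Context, cfg *promconfig.Config) (receiverInstance, error)

// applyConfig implements the "last known good" strategy: the old instance keeps
// running unless the new config differs and the new instance starts successfully.
func applyConfig(ctx context.Context, current, next *promconfig.Config,
	running receiverInstance, start startReceiverFunc) (*promconfig.Config, receiverInstance, error) {

	// No change: keep the current receiver untouched.
	if reflect.DeepEqual(current, next) {
		return current, running, nil
	}

	// Start a receiver with the new config first...
	newInstance, err := start(ctx, next)
	if err != nil {
		// ...and fall back to the old one if startup fails.
		return current, running, err
	}

	// Only shut the old instance down once the new one is up.
	if running != nil {
		_ = running.Shutdown(ctx)
	}
	return next, newInstance, nil
}
```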
## Collector vs Agent deployment
The receiver should support both reference architectures of the collector.

### Collector
If running as a collector, the Prometheus config provided by PrometheusOperator can be reused without a change.
Should multiple instances with the same config run in the same cluster, they will act like a
high-availability pair of Prometheus servers. Therefore, all targets will be scraped multiple times and telemetry
will have to be deduplicated/compacted later on.

> **Reviewer:** HA is nice, but this means the collector can't shard work at all, and can't scale up replicas to reduce load. Did you consider supporting sharding with the hashmod action, like the prometheus operator does?
>
> **Author:** I haven't been aware of this till now. This is definitely a useful addition when the receiver is set up as a collector!
>
> **Reviewer:** The other thing to consider is the OpenTelemetry Operator's prometheus target allocation capability. It is designed to allow running multiple collector instances and distributing targets across them. It will re-allocate targets if a collector instance is added or removed. I think adding the ability to utilize the pod and service monitors there should be considered as an alternative to building this into a receiver.
>
> **Author:** @Aneurysm9 do you have a link to a design document for the target allocator? I couldn't find any in the OpentelemetryOperator repo or on opentelemetry.io. If the community prefers to implement this first in the target allocator, I will work on that instead.
>
> **Reviewer:** Here's the initial outline of the capability and here's the more detailed design doc for the target allocator service. The target allocation server is set up to reload its server when the config file it uses changes. It should be feasible to add a watch for the Prometheus CRDs and use them to update the config, which will then cause the allocator to start using the generated SD configs.

```yaml
receivers:
  prometheus_operator:
    namespaces: []
    monitor_selector:
      match_labels:
        prometheus-operator-instance: a-instance
```
### Agent
In this case the collector is deployed as an agent. The receiver can be limited to workloads on a single node
using the `limit_to_node` option and adding the node name as an environment variable.

```yaml
env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```

```yaml
receivers:
  prometheus_operator:
    namespaces: []
    limit_to_node: ${K8S_NODE_NAME}
    monitor_selector:
      match_labels:
        prometheus-operator-instance: a-instance
```
## Sharding
Sharding is implemented by sharding the targets which the receiver scrapes.

This is achieved through `hashmod` relabeling rules provided by the `ConfigGenerator`, which implements
sharding based on the `__address__` label.

This necessitates additional optional config options:
```yaml
sharding:
  shard_count: 1 # number of total shards
  shard: 1 # current shard of the receiver. 1 <= value <= shard_count
```

> **Reviewer:** Does this require the use of a distinct configuration source per instance? Is there a way to avoid that?
>
> **Author:** I expect the user to use substitutions for this. Something like this could be stored in a config map and substituted with environment variables. This would mimic how the
> ```yaml
> receivers:
>   prometheusoperator:
>     sharding:
>       shard_count: 2
>       shard: ${SHARD}
> ```

`sharding.shard_count` is supplied to the `ConfigGenerator` as part of the Prometheus configs. The resulting configuration will
contain multiple instances of the `$(SHARD)` substitution, which should be replaced by the value of `sharding.shard`.
Multiple collectors with the same `shard` will scrape the same targets and therefore will duplicate data.
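For illustration, a sketch of the hashmod relabeling rules such a configuration could produce before `$(SHARD)` is substituted; the temporary label name follows the PrometheusOperator convention but is shown here only as an assumption:

```yaml
relabel_configs:
  # distribute targets across shard_count buckets based on the target address
  - source_labels: [__address__]
    modulus: 2              # sharding.shard_count
    target_label: __tmp_hash
    action: hashmod
  # keep only targets belonging to this receiver's shard
  - source_labels: [__tmp_hash]
    regex: $(SHARD)         # replaced with sharding.shard at config generation time
    action: keep
```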
@@ -0,0 +1 @@
include ../../Makefile.Common
@@ -0,0 +1,79 @@
# Prometheus Operator Receiver

The `prometheus_operator` receiver is a wrapper around the [prometheus
receiver](../prometheusreceiver).
This receiver allows using [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
CRDs like `ServiceMonitor` and `PodMonitor` to configure metric collection.
Supported pipeline types: metrics

> :construction: This receiver is in **development**, and is therefore not usable at the moment.

## When to use this receiver?
Choose this receiver if you use [ServiceMonitor](https://github.com/prometheus-operator/prometheus-operator/blob/main/example/user-guides/getting-started/example-app-service-monitor.yaml)
or [PodMonitor](https://github.com/prometheus-operator/prometheus-operator/blob/main/example/user-guides/getting-started/example-app-pod-monitor.yaml)
custom resources in your Kubernetes cluster. These can be provided by applications (e.g. helm charts) or manually
deployed by users.
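For reference, a minimal `ServiceMonitor` of the kind this receiver is meant to consume (names and labels are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: default
  labels:
    prometheus-operator-instance: a-instance   # matched by a monitor_selector
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: web           # name of the Service port to scrape
      interval: 30s
```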
In every other case one of the other Prometheus receivers should be used.
Below you can find a short description of the available options.

### Prometheus scrape annotations
If you use annotation-based scrape configs like `prometheus.io/scrape = true`, then you should use the
[prometheusreceiver](../prometheusreceiver/README.md).

> **Reviewer:** I don't think you even need to use the receivercreator in this case. You can just use the `__meta_kubernetes_pod_annotation_prometheus_io_scrape` label to filter pods (use the equivalent for endpoints) directly in the prometheusreceiver.
>
> **Author:** Should work as you describe it. I agree that the

A guide on how to use this meta receiver with Prometheus annotations can be found in the
[examples of the receivercreator](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/receivercreator#examples).
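For comparison, a sketch of an annotation-driven scrape job configured directly in the `prometheusreceiver`, as suggested in the review comment above (job name and relabel rules are illustrative):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # scrape only pods annotated with prometheus.io/scrape: "true"
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              regex: "true"
              action: keep
            # honor an optional prometheus.io/path annotation
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
              regex: (.+)
              target_label: __metrics_path__
              action: replace
```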
### Static targets
If a simple static endpoint in or outside the cluster should be scraped, use the [simpleprometheusreceiver](../simpleprometheusreceiver/README.md).
It provides a simplified interface around the `prometheusreceiver`. Use cases could be the federation of Prometheus
instances or the scraping of targets outside dynamic setups.
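A sketch of such a static setup with the `simpleprometheusreceiver` (endpoint and interval are illustrative):

```yaml
receivers:
  prometheus_simple:
    endpoint: some-service.example.svc:8080   # static target, in or outside the cluster
    metrics_path: /metrics
    collection_interval: 30s
```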
### Prometheus service discovery and manual configuration

> **Reviewer:** Just documenting some investigation I've done in the past: why not just implement PodMonitor and ServiceMonitor using the prometheus service discovery? That would have the benefit of not needing to shut down and restart the prometheus receiver when a PodMonitor or ServiceMonitor is modified. Answer: the prometheus service discovery interface only supports adding new targets, but doesn't support manipulating metrics after they are scraped. So we wouldn't be able to support metricRelabelConfigs with that approach.

The [prometheusreceiver](../prometheusreceiver/README.md) allows configuring the collector much like a Prometheus
server instance and exposes the most low-level configuration options for Prometheus metric scraping by the
OpenTelemetry Collector. Prometheus supports static configurations as well as dynamic configuration based on the
service discovery concept. These service discovery options allow using a multitude of external systems to discover
new services. One of them is the
[Kubernetes API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config); other options
include public and private cloud provider APIs, the Docker daemon and generic service discovery sources like
[HTTP_SD](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config) and
[FILE_SD](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config).
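As a small illustration, a `prometheusreceiver` job using file-based service discovery (paths are placeholders):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: file-discovered-targets
          file_sd_configs:
            - files:
                - /etc/otelcol/targets/*.json   # placeholder path
              refresh_interval: 5m
```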
### OpenTelemetry operator
The OpenTelemetry community provides an operator which is the recommended method to operate OpenTelemetry on Kubernetes.
This operator supports deploying the `prometheusoperatorreceiver` for usage in your cluster.

//TODO add target allocator capabilities

Alternative targeting methods like the [target allocator](https://github.com/open-telemetry/opentelemetry-operator/tree/main/cmd/otel-allocator)
are in development, but do not support PrometheusOperator CRDs at this time.
## Configuration

The following settings are optional:

- `auth_type` (default = `serviceAccount`): Determines how to authenticate to
  the K8s API server. This can be one of `none` (for no auth), `serviceAccount`
  (to use the standard service account token provided to the agent pod), or
  `kubeConfig` to use credentials from `~/.kube/config`.
- `namespaces` (default = `all`): An array of `namespaces` to collect CRs from.
  This receiver will continuously watch all the `namespaces` mentioned in the array for
  new CRs.

Examples:

```yaml
prometheus_operator:
  auth_type: kubeConfig
  namespaces:
    - default
    - my_namespace
  monitor_selector:
    match_labels:
      prometheus-operator-instance: "cluster"
```

The full list of settings exposed for this receiver is documented [here](./config.go)
with detailed sample configurations [here](./testdata/config.yaml).
@@ -0,0 +1,48 @@
// Copyright 2020, OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package prometheusoperatorreceiver

import (
	"go.opentelemetry.io/collector/config"

	"github.com/open-telemetry/opentelemetry-collector-contrib/internal/k8sconfig"
)

// MatchLabels is a set of label key/value pairs that a monitor CR must carry to be selected.
type MatchLabels map[string]string

// LabelSelector selects ServiceMonitor/PodMonitor CRs by their labels.
type LabelSelector struct {
	MatchLabels MatchLabels `mapstructure:"match_labels"`
}

// Config defines configuration for the prometheus operator receiver.
type Config struct {
	config.ReceiverSettings `mapstructure:",squash"`
	k8sconfig.APIConfig     `mapstructure:",squash"`

	// List of `namespaces` to collect CRs from. An empty list indicates that all namespaces should be searched.
	Namespaces []string `mapstructure:"namespaces"`

	// MonitorSelector limits which monitor CRs are considered by this receiver.
	MonitorSelector LabelSelector `mapstructure:"monitor_selector"`
}

// Validate checks that the receiver configuration is valid.
func (cfg *Config) Validate() error {
	if err := cfg.ReceiverSettings.Validate(); err != nil {
		return err
	}
	if err := cfg.APIConfig.Validate(); err != nil {
		return err
	}
	return nil
}
> **Reviewer:** Should we just allow specifying filters for PodMonitor/ServiceMonitor like you can do for prometheus? Seems like something we would want eventually, and would cover this case.
>
> **Author:** I had 3 limitation options in mind. The first two are currently provided by the PrometheusOperator `ConfigGenerator` package and the latter one is implemented using the additional relabel configs.
>
> **Reviewer:** I'm just suggesting that we re-use the underlying prometheus namespaces and selector structure. It allows specifying the role (podmonitor or servicemonitor in this case), a label selector, and a field selector. The field selector would allow limiting to podmonitors or servicemonitors on the same node, but is more general than your proposed node limiter option. Because of the "role" field, it would also allow supporting only podmonitors, or only servicemonitors, and allows different label or field selectors for each. Re-using the prometheus server's structure for this config would make it familiar to those already familiar with kubernetes_sd_configs.