# [receiver/prometheusoperator] Add base structure #6344

Status: Closed
1 change: 1 addition & 0 deletions .github/CODEOWNERS
```diff
@@ -90,6 +90,7 @@
 receiver/postgresqlreceiver/ @open-telemetry/collector-contrib-approvers @djaglowski
 receiver/prometheusexecreceiver/ @open-telemetry/collector-contrib-approvers @keitwb
 receiver/prometheusreceiver/ @open-telemetry/collector-contrib-approvers @Aneurysm9 @dashpole
+receiver/prometheusoperatorreceiver/ @open-telemetry/collector-contrib-approvers @secustor
 receiver/receivercreator/ @open-telemetry/collector-contrib-approvers @jrcamp
 receiver/redisreceiver/ @open-telemetry/collector-contrib-approvers @pmcollins @jrcamp
 receiver/sapmreceiver/ @open-telemetry/collector-contrib-approvers @owais
```
4 changes: 3 additions & 1 deletion cmd/configschema/go.mod
```diff
@@ -415,7 +415,7 @@ require (
 	k8s.io/apimachinery v0.22.3 // indirect
 	k8s.io/client-go v0.22.3 // indirect
 	k8s.io/klog v1.0.0 // indirect
-	k8s.io/klog/v2 v2.9.0 // indirect
+	k8s.io/klog/v2 v2.10.0 // indirect
 	k8s.io/kube-openapi v0.0.0-20210421082810-95288971da7e // indirect
 	k8s.io/kubelet v0.22.3 // indirect
 	k8s.io/utils v0.0.0-20210819203725-bdf08cb9a70a // indirect
@@ -595,6 +595,8 @@
 replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/simpleprometheusreceiver => ../../receiver/simpleprometheusreceiver
 
+replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusoperatorreceiver => ../../receiver/prometheusoperatorreceiver
+
 replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/opencensusreceiver => ../../receiver/opencensusreceiver
 
 replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusexecreceiver => ../../receiver/prometheusexecreceiver
```
2 changes: 2 additions & 0 deletions cmd/configschema/go.sum

Some generated files are not rendered by default.

5 changes: 4 additions & 1 deletion go.mod
```diff
@@ -90,6 +90,7 @@ require (
 	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/opencensusreceiver v0.39.0
 	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/podmanreceiver v0.39.0
 	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusexecreceiver v0.39.0
+	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusoperatorreceiver v0.39.0
 	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.39.0
 	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/receivercreator v0.39.0
 	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/redisreceiver v0.39.0
@@ -421,7 +422,7 @@
 	k8s.io/apimachinery v0.22.3 // indirect
 	k8s.io/client-go v0.22.3 // indirect
 	k8s.io/klog v1.0.0 // indirect
-	k8s.io/klog/v2 v2.9.0 // indirect
+	k8s.io/klog/v2 v2.10.0 // indirect
 	k8s.io/kube-openapi v0.0.0-20210421082810-95288971da7e // indirect
 	k8s.io/kubelet v0.22.3 // indirect
 	k8s.io/utils v0.0.0-20210819203725-bdf08cb9a70a // indirect
@@ -611,6 +612,8 @@
 replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver => ./receiver/prometheusreceiver
 
+replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusoperatorreceiver => ./receiver/prometheusoperatorreceiver
+
 replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/podmanreceiver => ./receiver/podmanreceiver
 
 replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/wavefrontreceiver => ./receiver/wavefrontreceiver
```
4 changes: 4 additions & 0 deletions go.sum

Some generated files are not rendered by default.

164 changes: 164 additions & 0 deletions receiver/prometheusoperatorreceiver/DESIGN.md
## Design Goals

### Vision
Provide a nearly zero-config migration path from PrometheusOperator-defined scraping configs.

### Scope
Only scrape-config-relevant CRDs (CustomResourceDefinitions) are in scope for this receiver.
As of this writing the relevant CRDs are:
- ServiceMonitor
- PodMonitor

The receiver should be able to query these CRDs from Kubernetes, attach watchers, and reconfigure its scraping config
based on existing CRs (CustomResources) in Kubernetes.

Changes, as well as creations or deletions, of said CRs should trigger reconciliation of the scrape config.
Active querying of the API should occur only during startup and at a defined interval.

Namespace limitations should be possible for shared-cluster scenarios.
This should be implemented using industry-standard methods.
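
One industry-standard option (an illustrative sketch, not part of this PR) is plain Kubernetes RBAC: grant the collector's ServiceAccount read access to the monitor CRDs only in the allowed namespaces, for example:

```yaml
# Hypothetical RBAC sketch: allow reading monitor CRs in one namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: otel-monitor-reader
  namespace: team-a            # the only namespace this receiver may watch
rules:
- apiGroups: ["monitoring.coreos.com"]
  resources: ["servicemonitors", "podmonitors"]
  verbs: ["get", "list", "watch"]
```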


### Out of scope
Excluded are the remaining PrometheusOperator CRDs, such as:
- PrometheusRules
- ThanosRuler
- Probes
- Alertmanagers
- AlertmanagerConfigs

This includes concepts like alerting and black-box probing of defined targets.


## PrometheusOperator CR watcher
The CR watcher is the component responsible for querying CRs from the Kubernetes API and triggering a reconciliation.

### Major components of CR watcher
- **[ListenerFactory](https://github.com/prometheus-operator/prometheus-operator/blob/main/pkg/informers/monitoring.go):**
the component which creates listeners on the CRs
- **[APIClient](https://github.com/prometheus-operator/prometheus-operator/tree/main/pkg/client):**
PrometheusOperator API client
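
For orientation, here is a minimal standalone sketch of the query-and-watch flow, assuming the prometheus-operator versioned clientset; the `monitoring` namespace and the reconciliation wiring are illustrative, not part of this PR:

```go
package main

import (
	"context"
	"log"

	monitoringclient "github.com/prometheus-operator/prometheus-operator/pkg/client/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster credentials; auth_type=kubeConfig would build this differently.
	restCfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := monitoringclient.NewForConfig(restCfg)
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	// Active query at startup: list all ServiceMonitors in one namespace.
	sms, err := client.MonitoringV1().ServiceMonitors("monitoring").List(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("found %d ServiceMonitors", len(sms.Items))

	// Attach a watcher: create/update/delete events trigger reconciliation.
	w, err := client.MonitoringV1().ServiceMonitors("monitoring").Watch(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for ev := range w.ResultChan() {
		log.Printf("ServiceMonitor event: %s -> reconcile scrape config", ev.Type)
	}
}
```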

## Config generator
The config generator is triggered by the watcher and generates a Prometheus config as a byte array.
Instead of writing it to disk, the configuration is unmarshalled using the Prometheus config loader
already in use by the `prometheusreceiver`, resulting in a full Prometheus config.

In case of an [Agent](#collector-vs-agent-deployment) deployment, which is signaled with the `limit_to_node` option,
only local endpoints will be fetched; endpoints should be filtered so that only pods scheduled
on the current node are scraped.

> **Contributor:** Should we just allow specifying filters for PodMonitor/ServiceMonitor like you can do for prometheus? Seems like something we would want eventually, and would cover this case.
>
> **Member author:** I had three limitation options in mind:
> - namespace(s) which are to be watched for monitor objects
> - a label selector to limit monitor objects in these namespaces (as it is set up in the Prometheus CRD of PrometheusOperator)
> - the node limiter option, which is used for an agent-style deployment
>
> The first two are currently provided by the PrometheusOperator ConfigGenerator package and the latter is implemented using additional relabel configs.
>
> **Contributor:** I'm just suggesting that we re-use the underlying prometheus namespaces and selector structure. It allows specifying the role (podmonitor or servicemonitor in this case), a label selector, and a field selector. The field selector would allow limiting to podmonitors or servicemonitors on the same node, but is more general than your proposed node limiter option. Because of the "role" field, it would also allow supporting only podmonitors, or only servicemonitors, and allows different label or field selectors for each. Re-using the prometheus server's structure for this config would make it familiar to those already familiar with kubernetes_sd_configs.

> **Contributor:** As a note, it is not recommended to watch endpoints (or endpointslices) from each node. The apiserver has a watch index for pods by node name, meaning it is acceptable to watch pods assigned to each node from a daemonset, but does not have the same for endpoints.
>
> **Member author:** Makes sense, but I'm not sure how to solve this. The only option I see, other than introducing a new index in Kubernetes, is the introduction of a shared cache. This could maybe be done as an extension.
>
> **Contributor:** I'm just noting it; I don't think it is easily solvable. I think we should not recommend using a daemonset with servicemonitors to users because of this.

To achieve this, additional relabeling rules are added to all scrape configs generated by the `ConfigGenerator`.
Here is an example of a scrape config with the additional rules:
```yaml
scrape_configs:
- job_name: serviceMonitor/ingress/ingress-nginx-controller/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs: # shortened
  ### the autogenerated rules have been removed
  # save in case of node ports
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    regex: "Node;(.+)"
    replacement: $1
    target_label: __tmp_node_name
    action: replace
  # save in case of pods
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_node_name]
    regex: "Pod;(.+)"
    replacement: $1
    target_label: __tmp_node_name
    action: replace
  # keep only targets on the defined node
  - source_labels: [__tmp_node_name]
    regex: "our-node-name" # node name we have extracted from the environment
    action: keep
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - ingress
```

Sharding should be supported using the optional `sharding` options, which declare to which shard the receiver
belongs. Details can be found in the [sharding](#sharding) section.
After generation of the base config, the shard substitution `$(SHARD)` is replaced with the current shard instance
(default = `1`).

This config is then used to create a `prometheusreceiver` `Config` struct.
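
Piecing those steps together, a minimal standalone sketch (assuming the Prometheus 2.x `config.Load(s, expandExternalLabels, logger)` signature and the go-kit logger it takes; the input bytes are a stand-in for `ConfigGenerator` output):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"

	"github.com/go-kit/log"
	promconfig "github.com/prometheus/prometheus/config"
)

func main() {
	// Byte array as it would come out of the PrometheusOperator ConfigGenerator.
	generatedYAML := []byte("scrape_configs:\n- job_name: example\n  static_configs:\n  - targets: ['localhost:9090']\n")
	shard := 1

	// Replace the $(SHARD) substitution with the configured shard instance.
	substituted := strings.ReplaceAll(string(generatedYAML), "$(SHARD)", strconv.Itoa(shard))

	// Unmarshal with the Prometheus config loader instead of writing to disk.
	cfg, err := promconfig.Load(substituted, false, log.NewNopLogger())
	if err != nil {
		panic(err)
	}
	fmt.Println("loaded", len(cfg.ScrapeConfigs), "scrape config(s)")
}
```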

### Major components of the config generation
- **[ConfigGenerator](https://github.com/prometheus-operator/prometheus-operator/blob/main/pkg/prometheus/promcfg.go#L304):**
PrometheusOperator component which generates a valid Prometheus config marshalled to a byte array
- **[ConfigLoader](https://github.com/prometheus/prometheus/blob/main/config/config.go#L68):**
Prometheus configuration loader which unmarshals the config to a `prometheusreceiver`-usable object


## Processing config change events
The Prometheus config is first compared against the currently applied one. Should there be any change, a new
`prometheusreceiver` is started using the generated configuration.
If the startup is successful, the old instance of `prometheusreceiver` is shut down.

> **Member:** I'm not sure it is safe to concurrently run multiple receivers that may scrape the same targets. Could this be accomplished by holding on to a last-known-good configuration to fall back on in the event of errors during startup?
>
> **Member author:** This is definitely possible, as we need to compare the new and current config anyway to assess if a reload/restart is necessary.

Should any error occur during startup, the old instance will keep running.
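
A compact sketch of that compare-and-swap behaviour (all names hypothetical; the real receiver would wrap `prometheusreceiver` component lifecycles):

```go
package prometheusoperatorreceiver

import (
	"context"
	"reflect"

	promconfig "github.com/prometheus/prometheus/config"
)

// scraper is a stand-in for the wrapped prometheusreceiver instance.
type scraper interface {
	Start(ctx context.Context) error
	Shutdown(ctx context.Context) error
}

type reconciler struct {
	current *promconfig.Config
	active  scraper
	// newScraper builds, but does not start, a prometheusreceiver for a config.
	newScraper func(*promconfig.Config) (scraper, error)
}

// reconcile swaps in a newly generated config. On any startup error the old
// instance keeps running (the last-known-good behaviour discussed above).
func (r *reconciler) reconcile(ctx context.Context, next *promconfig.Config) error {
	if reflect.DeepEqual(next, r.current) {
		return nil // no change, no restart needed
	}
	s, err := r.newScraper(next)
	if err != nil {
		return err
	}
	if err := s.Start(ctx); err != nil {
		return err // startup failed: old instance keeps scraping
	}
	old := r.active
	r.active, r.current = s, next
	if old != nil {
		return old.Shutdown(ctx) // stop the previous instance once the new one is up
	}
	return nil
}
```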

## Collector vs Agent deployment
The receiver should support both reference architectures of the collector.

### Collector
If running as a collector, the Prometheus config provided by PrometheusOperator can be reused without change.
Should multiple instances with the same config run in the same cluster, they will act like a
high-availability pair of Prometheus servers. Therefore, all targets will be scraped multiple times and telemetry
will have to be deduplicated/compacted later on.

> **Contributor:** HA is nice, but this means the collector can't shard work at all, and can't scale up replicas to reduce load. Did you consider supporting sharding with the hashmod action, like the prometheus operator does?
>
> **Member author:** I haven't been aware of this till now. This is definitely a useful addition when the receiver is set up as a collector! I will work this into the proposal.
>
> **Member:** The other thing to consider is the OpenTelemetry Operator's prometheus target allocation capability. It is designed to allow running multiple collector instances and distributing targets across them. It will re-allocate targets if a collector instance is added or removed. I think adding the ability to utilize the pod and service monitors there should be considered as an alternative to building this into a receiver.
>
> **Member author:** @Aneurysm9 do you have a link to a design document for the target allocator? I couldn't find any in the OpentelemetryOperator repo or on opentelemetry.io. If the community prefers to implement this first in the target allocator, I will work on that instead.
>
> **Member:** Here's the initial outline of the capability and here's the more detailed design doc for the target allocator service. The target allocation server is set up to reload its server when the config file it uses changes. It should be feasible to add a watch for the Prometheus CRDs and use them to update the config, which will then cause the allocator to start using the generated SD configs.

```yaml
receivers:
  prometheus_operator:
    namespaces: []
    monitor_selector:
      match_labels:
        prometheus-operator-instance: a-instance
```

### Agent
In this case the collector is deployed as an agent. The receiver can be limited to workloads on a single node
using the `limit_to_node` option and adding the node name as an environment variable.

```yaml
env:
- name: K8S_NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
```

```yaml
receivers:
  prometheus_operator:
    namespaces: []
    limit_to_node: ${K8S_NODE_NAME}
    monitor_selector:
      match_labels:
        prometheus-operator-instance: a-instance
```

## Sharding
Sharding is implemented through sharding the targets which the receiver is scraping.

This is achieved through `hashmod` relabeling rules provided by the `ConfigGenerator`, which implements
sharding based on the `__address__` label.

This necessitates additional optional config options:
```yaml
sharding:
  shard_count: 1 # number of total shards
  shard: 1 # current shard of the receiver. 1 <= value <= shard_count
```

> **Member:** Does this require the use of a distinct configuration source per instance? Is there a way to avoid that?
>
> **Member author (@secustor, Nov 25, 2021):** I expect the user to use substitutions for this. Something like this could be stored in a config map and substituted with environment variables. This would mimic how the `limit_to_node` option or the k8sobserver is working.
>
> ```yaml
> receivers:
>   prometheusoperator:
>     sharding:
>       shard_count: 2
>       shard: ${SHARD}
> ```

`sharding.shard_count` is supplied to the `ConfigGenerator` as part of the Prometheus configs. The resulting configuration will
contain multiple instances of the `$(SHARD)` substitution, which are replaced by the value of `sharding.shard`.
Multiple collectors with the same `shard` will scrape the same targets and therefore duplicate data.
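
For reference, the sharding rules emitted by the PrometheusOperator `ConfigGenerator` follow roughly this shape (a sketch based on the operator's generated configs; exact label names may differ between versions):

```yaml
relabel_configs:
# hash the target address into one of shard_count buckets
- source_labels: [__address__]
  modulus: 2            # sharding.shard_count
  target_label: __tmp_hash
  action: hashmod
# keep only targets whose bucket matches this receiver's shard
- source_labels: [__tmp_hash]
  regex: $(SHARD)       # replaced with the value of sharding.shard
  action: keep
```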
1 change: 1 addition & 0 deletions receiver/prometheusoperatorreceiver/Makefile
```make
include ../../Makefile.Common
```
79 changes: 79 additions & 0 deletions receiver/prometheusoperatorreceiver/README.md
# Prometheus Operator Receiver

The `prometheus_operator` receiver is a wrapper around the [prometheus
receiver](../prometheusreceiver).
This receiver allows using [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
CRDs like `ServiceMonitor` and `PodMonitor` to configure metric collection.

Supported pipeline types: metrics

> :construction: This receiver is in **development**, and is therefore not usable at the moment.

## When to use this receiver?
Choose this receiver if you use [ServiceMonitor](https://github.com/prometheus-operator/prometheus-operator/blob/main/example/user-guides/getting-started/example-app-service-monitor.yaml)
or [PodMonitor](https://github.com/prometheus-operator/prometheus-operator/blob/main/example/user-guides/getting-started/example-app-pod-monitor.yaml)
custom resources in your Kubernetes cluster. These can be provided by applications (e.g. Helm charts) or deployed
manually by users.

In every other case, other Prometheus receivers should be used.
Below you can find a short description of the available options.

### Prometheus scrape annotations
If you use annotation-based scrape configs like `prometheus.io/scrape = true`, then you should use the
[prometheusreceiver](../prometheusreceiver/README.md).

> **Contributor:** I don't think you even need to use the receivercreator in this case. You can just use the `__meta_kubernetes_pod_annotation_prometheus_io_scrape` label to filter pods (use the equivalent for endpoints) directly in the prometheusreceiver.
>
> **Member author:** Should work as you describe it. I agree that the prometheusreceiver is in that case preferable. I will adapt the section here too.

A guide on how to use this meta receiver with Prometheus annotations can be found in the
[examples of the receivercreator](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/receivercreator#examples).
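
For illustration (an assumed config, not part of this PR), the annotation filter suggested in the review thread could look like this in the `prometheusreceiver`:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: annotated-pods
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        # keep only pods annotated with prometheus.io/scrape: "true"
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          regex: "true"
          action: keep
```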

### Static targets
If a simple static endpoint in or outside the cluster should be scraped, use the [simpleprometheusreceiver](../simpleprometheusreceiver/README.md).
It provides a simplified interface around the `prometheusreceiver`. Use cases include the federation of Prometheus
instances or the scraping of targets outside dynamic setups.

### Prometheus service discovery and manual configuration
> **Contributor:** Just documenting some investigation I've done in the past: Why not just implement PodMonitor and ServiceMonitor using the prometheus service discovery? That would have the benefit of not needing to shut down and restart the prometheus receiver when a PodMonitor or ServiceMonitor is modified.
>
> Answer: The prometheus service discovery interface only supports adding new targets, but doesn't support manipulating metrics after they are scraped. So we wouldn't be able to support metricRelabelConfigs with that approach.

The [prometheusreceiver](../prometheusreceiver/README.md) allows configuring the collector much like a Prometheus
server instance and exposes the most low-level configuration options for Prometheus metric scraping in the
OpenTelemetry Collector. Prometheus supports static configurations as well as dynamic configuration based on
service discovery. These service discovery options allow using a multitude of external systems to discover
new services. One of them is the
[Kubernetes API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config); other options
include public and private cloud provider APIs, the Docker daemon, and generic service discovery sources like
[HTTP_SD](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config) and
[FILE_SD](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config).

### OpenTelemetry operator
The OpenTelemetry community provides an operator, which is the recommended method to operate OpenTelemetry on Kubernetes.
This operator supports deploying the `prometheusoperatorreceiver` for use in your cluster.

//TODO add target allocator capabilities

Alternative targeting methods like the [target allocator](https://github.com/open-telemetry/opentelemetry-operator/tree/main/cmd/otel-allocator)
are in development, but do not support PrometheusOperator CRDs at this time.

## Configuration

The following settings are optional:

- `auth_type` (default = `serviceAccount`): Determines how to authenticate to
the K8s API server. This can be one of `none` (for no auth), `serviceAccount`
(to use the standard service account token provided to the agent pod), or
`kubeConfig` to use credentials from `~/.kube/config`.
- `namespaces` (default = all namespaces): An array of namespaces in which to look for monitor objects.
  This receiver will continuously watch all the namespaces mentioned in the array for
  new, changed, or deleted monitor objects.

Examples:

```yaml
prometheus_operator:
  auth_type: kubeConfig
  namespaces:
  - default
  - my_namespace
  monitor_selector:
    match_labels:
      prometheus-operator-instance: "cluster"
```

The full list of settings exposed for this receiver is documented [here](./config.go),
with detailed sample configurations [here](./testdata/config.yaml).
48 changes: 48 additions & 0 deletions receiver/prometheusoperatorreceiver/config.go
```go
// Copyright 2020, OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package prometheusoperatorreceiver

import (
	"go.opentelemetry.io/collector/config"

	"github.com/open-telemetry/opentelemetry-collector-contrib/internal/k8sconfig"
)

type MatchLabels map[string]string

type LabelSelector struct {
	MatchLabels MatchLabels `mapstructure:"match_labels"`
}

// Config defines configuration for the prometheus operator receiver.
type Config struct {
	config.ReceiverSettings `mapstructure:",squash"`
	k8sconfig.APIConfig     `mapstructure:",squash"`

	// List of namespaces in which to look for monitor objects. An empty list
	// indicates that all namespaces should be searched.
	Namespaces []string `mapstructure:"namespaces"`

	MonitorSelector LabelSelector `mapstructure:"monitor_selector"`
}

func (cfg *Config) Validate() error {
	if err := cfg.ReceiverSettings.Validate(); err != nil {
		return err
	}
	if err := cfg.APIConfig.Validate(); err != nil {
		return err
	}
	return nil
}
```
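
Not part of the diff, but a small hypothetical test sketch for the types above (assuming `k8sconfig.AuthTypeServiceAccount` is a valid auth type and that the zero-value `ReceiverSettings` validates):

```go
package prometheusoperatorreceiver

import (
	"testing"

	"github.com/open-telemetry/opentelemetry-collector-contrib/internal/k8sconfig"
)

func TestConfigValidate(t *testing.T) {
	cfg := &Config{
		APIConfig:  k8sconfig.APIConfig{AuthType: k8sconfig.AuthTypeServiceAccount},
		Namespaces: []string{"default", "my_namespace"},
		MonitorSelector: LabelSelector{
			MatchLabels: MatchLabels{"prometheus-operator-instance": "cluster"},
		},
	}
	if err := cfg.Validate(); err != nil {
		t.Fatalf("expected config to validate, got: %v", err)
	}
}
```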