
[receiver/prometheusoperator] Add base structure #6344

Conversation

secustor
Member

@secustor secustor commented Nov 16, 2021

Description:

Adding structure for a new receiver based on the prometheus receiver.

The goal is to support a subset of Prometheus Operator CRDs as configuration options. This should give users the option to gather metrics from targets defined by CRDs such as ServiceMonitor or PodMonitor, which are mostly provided by the applications themselves.
Link to tracking Issue: #6345

Testing: Only the standard config parsing test at the moment, as this PR only contains the structure.

Documentation: Added a README which describes the status and options of this receiver.

@secustor secustor requested review from a team and mx-psi November 16, 2021 16:24
@Aneurysm9
Member

I like the idea of being able to reuse the CRDs from the Prometheus operator, but I'd like to see some more detail about how this receiver would function and how it would integrate with the existing Prometheus receiver. Can you prepare a design document?

Member

@mx-psi mx-psi left a comment

@open-telemetry/collector-contrib-approvers (or somebody else) Is someone with Prometheus knowledge able to review this?

receiver/prometheusoperatorreceiver/README.md

```go
import (
	"go.opentelemetry.io/collector/config"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
```
Member

This is an unstable dependency; its structs should not be part of the public API of the receiver (otherwise we may not be able to use the latest k8s.io version)
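For illustration, a minimal sketch of one way to keep the k8s.io types out of the receiver's exported config surface; the `Config` fields and the `toK8sSelector` helper shown here are hypothetical and not part of this PR:

```go
package prometheusoperatorreceiver

import (
	"go.opentelemetry.io/collector/config"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Config exposes only receiver-owned types; metav1 structs never appear in
// exported fields, so the vendored k8s.io version can change freely.
type Config struct {
	config.ReceiverSettings `mapstructure:",squash"`

	// Namespaces to watch for ServiceMonitor/PodMonitor objects.
	Namespaces []string `mapstructure:"namespaces"`

	// MatchLabels is a receiver-owned stand-in for the subset of
	// metav1.LabelSelector that is actually needed.
	MatchLabels map[string]string `mapstructure:"match_labels"`
}

// toK8sSelector converts to the metav1 type internally; keeping the method
// unexported keeps the k8s dependency out of the receiver's public API.
func (cfg *Config) toK8sSelector() *metav1.LabelSelector {
	return &metav1.LabelSelector{MatchLabels: cfg.MatchLabels}
}
```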

@secustor
Member Author

@Aneurysm9 Do you have an example of such a document for OpenTelemetry, or what is the preferred format for this?
I couldn't find any reference in the contributing guide or with a short Google search.

@Aneurysm9
Member

I don't think we need anything fancy. A design.md similar to what exists for the prometheus receiver would be good. I'm looking mostly for how this interacts with the existing receiver, pros/cons of having a separate receiver vs. trying to integrate the capability in the existing receiver, alternatives that could have been considered such as the receivercreator receiver, etc. I'd like to have some idea what the end state will look like and some confidence that it is the correct approach to solving the problem at hand before we start creating new components.

@jpkrohling
Member

I'd like to have some idea what the end state will look like and some confidence that it is the correct approach to solving the problem at hand before we start creating new components.

Agree. We have multiple pieces of Prometheus support scattered around, including in the operator (like the target allocator feature, of which @Aneurysm9 is the maintainer). It would be great to regroup and define what the ideal solution would look like for different use cases and components.

cc @alolita, as you probably also have an interest in the target allocator (and around Prometheus in general).

@Aneurysm9
Member

The target allocation capability in the operator is precisely why this piqued my interest. I had the configuration generation capability of the Prometheus operator exposed so that it could be incorporated there and I'd hope that the approach we take to include it directly in the collector can also be reused and integrated there.

@secustor
Member Author

@Aneurysm9 I added a first synopsis of my initial thoughts.
It's far from exhaustive and does not contain any comparisons yet, but I think it is enough to convey the general direction I have been investigating.

Should there still be interest, I can develop the document further.

@Aneurysm9 Aneurysm9 self-assigned this Nov 17, 2021
@punya
Member

punya commented Nov 17, 2021

cc @dashpole

@alolita
Member

alolita commented Nov 17, 2021

@jpkrohling thx! Taking a look at this proposal.

@secustor secustor force-pushed the add_prometheus_receiver_service_monitor branch from 0a01f95 to dbe658c Compare November 19, 2021 23:23
@jpkrohling
Member

jpkrohling commented Nov 22, 2021

@secustor, could you add a "when to use what" section? For people getting started with OpenTelemetry, it might be confusing to understand all the available components (this, other Prometheus receivers, otel-operator, ...) and when to use them.

@secustor secustor force-pushed the add_prometheus_receiver_service_monitor branch from dbe658c to cd6f61c Compare November 22, 2021 16:48
@secustor
Member Author

@jpkrohling I have added such a section to the README of the prometheusoperatorreceiver, but I don't think it is a fitting place for such user guides. In my opinion, a file in a docs folder would be easier for new users to find.

Contributor

@dashpole dashpole left a comment

Only reviewed the design.

What is the plan for managing TLS certs? The prometheus operator adds these to prometheus as a volume, but we need a plan for managing these ourselves.

Instead of writing it onto a disk the configuration is unmarshalled using the Prometheus config loader,
which is already in use by the `prometheusreceiver` resulting in a full Prometheus config.

In case of an [Agent](#Collector vs Agent deployment) deployment, which is signaled with the `limit_to_node` option,
Contributor

Should we just allow specifying filters for PodMonitor/ServiceMonitor like you can do for prometheus? Seems like something we would want eventually, and would cover this case.

Member Author

I had 3 limitation options in mind:

  • namespace(s) which are to be watched for monitor objects
  • a label selector to limit monitor objects in these namespaces (as it is set up in the Prometheus CRD of the Prometheus Operator)
  • the node limiter option, which is used for an agent-style deployment

The first two are currently provided by the Prometheus Operator's ConfigGenerator package, and the latter is implemented using additional relabel configs.

Contributor

I'm just suggesting that we re-use the underlying prometheus namespaces and selector structure. It allows specifying the role (podmonitor or servicemonitor in this case), a label selector, and a field selector. The field selector would allow limiting to podmonitors or servicemonitors on the same node, but is more general than your proposed node limiter option. Because of the "role" field, it would also allow supporting only podmonitors, or only servicemonitors, and allows different label or field selectors for each. Re-using the prometheus server's structure for this config would make it familiar to those already familiar with kubernetes_sd_configs.
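For illustration, a sketch of what a config mirroring the `kubernetes_sd_configs` namespaces/selectors shape could look like; the `Selector` type and field names here are hypothetical, not part of the PR or of Prometheus itself:

```go
package prometheusoperatorreceiver

import "os"

// Selector mirrors the shape of the "selectors" block in Prometheus'
// kubernetes_sd_configs: a role plus a label selector and a field selector.
type Selector struct {
	Role  string `mapstructure:"role"`  // e.g. "podmonitor" or "servicemonitor"
	Label string `mapstructure:"label"` // Kubernetes label selector, e.g. "team=payments"
	Field string `mapstructure:"field"` // Kubernetes field selector
}

// agentSelectors shows how an agent/daemonset deployment could be expressed
// with a field selector on pods instead of a dedicated limit_to_node option
// (NODE_NAME injected via the downward API).
var agentSelectors = []Selector{
	{Role: "pod", Field: "spec.nodeName=" + os.Getenv("NODE_NAME")},
}
```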

which is already in use by the `prometheusreceiver` resulting in a full Prometheus config.

In case of an [Agent](#Collector vs Agent deployment) deployment, which is signaled with the `limit_to_node` option,
only local endpoints will be fetched, endpoints should be filtered so that only pods are scraped which are scheduled
Contributor

As a note, it is not recommended to watch endpoints (or endpointslices) from each node. The apiserver has a watch index for pods by node name, meaning it is acceptable to watch pods assigned to each node from a daemonset, but does not have the same for endpoints.
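A minimal client-go sketch of the per-node watch pattern being described: the `spec.nodeName` field selector on pods is served efficiently by the apiserver, whereas no equivalent exists for Endpoints/EndpointSlices. The `NODE_NAME` environment variable is assumed to be injected via the downward API:

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Watch only the pods scheduled on this node; the apiserver indexes
	// pods by node name, so this is cheap even from every daemonset pod.
	w, err := client.CoreV1().Pods("").Watch(context.Background(), metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + os.Getenv("NODE_NAME"),
	})
	if err != nil {
		panic(err)
	}
	for ev := range w.ResultChan() {
		fmt.Println(ev.Type) // Added/Modified/Deleted events for this node's pods only
	}
}
```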

Member Author

Makes sense, but I'm not sure how to solve this.
The only option I see, other than introducing a new index in Kubernetes, is a shared cache. This could maybe be done as an extension.

Contributor

I'm just noting it; I don't think it is easily solvable. I think we should not recommend using a daemonset with servicemonitors to users because of this.

### Collector
If running as collector the Prometheus config provided by PrometheusOperator can be reused without a change.
Should multiple instances with the same config run in the same cluster, they will act like a
high availability pair of Prometheus. Therefore, all targets will be scraped multiple times and telemetry
Contributor

HA is nice, but this means the collector can't shard work at all, and can't scale up replicas to reduce load. Did you consider supporting sharding with the hashmod action, like the prometheus operator does?
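A self-contained sketch of the idea behind hashmod sharding (in Prometheus/the operator this is expressed as a pair of `hashmod` + `keep` relabel rules generated per shard; the hashing below is illustrative, not byte-for-byte what Prometheus does):

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// shardOf assigns a target to one of N shards by hashing its address.
// Each collector replica keeps only its own shard, so replicas split the
// scrape load instead of all scraping everything like an HA pair.
func shardOf(target string, shards uint64) uint64 {
	sum := md5.Sum([]byte(target))
	return binary.BigEndian.Uint64(sum[8:]) % shards
}

func main() {
	targets := []string{"10.0.0.1:9090", "10.0.0.2:9090", "10.0.0.3:9090"}
	const shards = 2
	const myShard = 0 // e.g. derived from the pod ordinal in a StatefulSet

	for _, t := range targets {
		if shardOf(t, shards) == myShard {
			fmt.Println("this replica scrapes", t)
		}
	}
}
```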

Member Author

I wasn't aware of this until now. This is definitely a useful addition when the receiver is set up as a collector!
I will work this into the proposal.

Member

The other thing to consider is the OpenTelemetry Operator's prometheus target allocation capability. It is designed to allow running multiple collector instances and distributing targets across them. It will re-allocate targets if a collector instance is added or removed. I think adding the ability to utilize the pod and service monitors there should be considered as an alternative to building this into a receiver.

Member Author

@Aneurysm9 do you have a link to a design document for the target allocator? I couldn't find any in the OpenTelemetry Operator repo or on opentelemetry.io.

If the community prefers to implement this first in the target allocator, I will work on that instead.

Member

Here's the initial outline of the capability and here's the more detailed design doc for the target allocator service.

The target allocation server is set up to reload when the config file it uses changes. It should be feasible to add a watch for the Prometheus CRDs and use them to update the config, which will then cause the allocator to start using the generated SD configs.
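For illustration, a sketch of such a CRD watch using a client-go dynamic informer; `regenerate` is a hypothetical stand-in for the config-generation call, not code from the allocator:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(cfg)

	// Watch ServiceMonitor objects; PodMonitor would be a second GVR.
	gvr := schema.GroupVersionResource{
		Group: "monitoring.coreos.com", Version: "v1", Resource: "servicemonitors",
	}
	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 5*time.Minute)
	informer := factory.ForResource(gvr).Informer()

	// Any add/update/delete triggers regeneration of the scrape config,
	// which the allocator would then pick up via its existing reload path.
	regenerate := func(interface{}) { fmt.Println("ServiceMonitors changed, regenerating scrape config") }
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    regenerate,
		UpdateFunc: func(_, obj interface{}) { regenerate(obj) },
		DeleteFunc: regenerate,
	})

	stop := make(chan struct{})
	factory.Start(stop)
	<-stop
}
```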

In every other case other Prometheus receivers should be used.
Below you can find a short description of the available options.

### Prometheus scrape annotations
Contributor

I don't think you even need to use the receivercreator in this case. You can just use the `__meta_kubernetes_pod_annotation_prometheus_io_scrape` label to filter pods (use the equivalent for endpoints) directly in the prometheusreceiver.
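A conceptual stand-in for the usual `keep` relabel rule on that meta label (in practice this lives in the prometheusreceiver's `relabel_configs`; the function below just illustrates the filtering effect):

```go
package main

import "fmt"

// keepAnnotatedTargets keeps only targets whose pod carries the
// prometheus.io/scrape="true" annotation, mirroring a `keep` relabel rule on
// __meta_kubernetes_pod_annotation_prometheus_io_scrape.
func keepAnnotatedTargets(targets []map[string]string) []map[string]string {
	var kept []map[string]string
	for _, labels := range targets {
		if labels["__meta_kubernetes_pod_annotation_prometheus_io_scrape"] == "true" {
			kept = append(kept, labels)
		}
	}
	return kept
}

func main() {
	targets := []map[string]string{
		{"__address__": "10.0.0.1:8080", "__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true"},
		{"__address__": "10.0.0.2:8080"},
	}
	fmt.Println(keepAnnotatedTargets(targets)) // only 10.0.0.1:8080 remains
}
```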

Member Author

That should work as you describe it. I agree that the prometheusreceiver is preferable in that case.
I will adapt the section here too.

It provides a simplified interface around the `prometheusreceiver`. Use cases could be the federation of Prometheus
instances or scraping of targets outside dynamic setups.

### Prometheus service discovery and manual configuration
Contributor

Just documenting some investigation I've done in the past: Why not just implement PodMonitor and ServiceMonitor using the prometheus service discovery? That would have the benefit of not needing to shut down and restart the prometheus receiver when a PodMonitor or ServiceMonitor is modified.

Answer: The prometheus service discovery interface only supports adding new targets, but doesn't support manipulating metrics after they are scraped. So we wouldn't be able to support metricRelabelConfigs with that approach.
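For context, the service-discovery extension point referred to here looks roughly like this (a paraphrased sketch of the shape of Prometheus' `discovery.Discoverer` and `targetgroup.Group`, not the exact types): it only pushes target groups, with no hook into samples after the scrape.

```go
package sdsketch

import "context"

// Group is a set of targets that share a common label set.
type Group struct {
	Targets []map[string]string // per-target labels, e.g. {"__address__": "10.0.0.1:9090"}
	Labels  map[string]string   // labels applied to every target in the group
	Source  string              // unique identifier of the group
}

// Discoverer only adds/updates targets via the channel; nothing here sees
// scraped samples, which is why metricRelabelConfigs from a
// PodMonitor/ServiceMonitor could not be implemented at this layer.
type Discoverer interface {
	Run(ctx context.Context, up chan<- []*Group)
}
```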

@mx-psi mx-psi removed the Stale label Dec 8, 2021
@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 16, 2021
@mx-psi mx-psi removed the Stale label Dec 16, 2021
@jpkrohling jpkrohling changed the title feat(receiver/prometheusOperator): add base structure [receiver/prometheusoperator] Add base structure Dec 16, 2021
@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jan 27, 2022
@mx-psi mx-psi removed the Stale label Jan 27, 2022
@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 11, 2022
@mx-psi
Member

mx-psi commented Feb 11, 2022

Is this something that will be worked on in the future? Or should I stop removing the Stale tag?

@secustor
Member Author

I think this proposal is still valid.

The current focus is to implement support for this in the TargetAllocator, as described in this comment.

Is there a way to remove this PR from the stale lifecycle?

@mx-psi
Member

mx-psi commented Feb 11, 2022

Is there a way to remove this PR from the lifecycle?

Not that I know of, other than closing it and re-opening it when it can be worked on again.

@mx-psi mx-psi removed the Stale label Feb 11, 2022
@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 26, 2022
@mx-psi mx-psi removed the Stale label Feb 28, 2022
@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions
Contributor

github-actions bot commented Apr 4, 2022

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Apr 4, 2022
@github-actions
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.
