
[targetallocator] PrometheusOperator CRD MVC #653

Merged
26 commits merged into open-telemetry:main on Apr 22, 2022

Conversation

secustor
Member

@secustor secustor commented Dec 28, 2021

Context: #383

Currently implemented:

  • fetch the ServiceMonitor and PodMonitor CRDs
  • generate PromConfigs from monitor instances
  • update on creation, deletion and change of instances

TBD:

  • clean up server shutdown
  • add flag to make PrometheusCR watcher optional
  • add option to control watched namespaces
  • add option to apply a label selector for monitors

Needs discussion:

  • splitting of application configuration and promconfig in the tracked config map
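
As a rough illustration of the "generate PromConfigs" bullet: the watcher turns each monitor CR into Prometheus scrape jobs. A minimal sketch with simplified stand-in types (the job-name convention mirrors prometheus-operator's `serviceMonitor/<namespace>/<name>/<endpoint-index>` naming; none of this is the PR's actual code):

```go
package main

import "fmt"

// Endpoint is a simplified stand-in for a ServiceMonitor endpoint; the
// real CR types live in the prometheus-operator module.
type Endpoint struct {
	Port     string
	Path     string
	Interval string
}

type ServiceMonitor struct {
	Namespace string
	Name      string
	Endpoints []Endpoint
}

// jobName follows the prometheus-operator convention of one scrape job
// per monitor endpoint: serviceMonitor/<namespace>/<name>/<index>.
func jobName(sm ServiceMonitor, i int) string {
	return fmt.Sprintf("serviceMonitor/%s/%s/%d", sm.Namespace, sm.Name, i)
}

// scrapeConfigs renders one minimal scrape_config stanza per endpoint.
func scrapeConfigs(sm ServiceMonitor) []map[string]string {
	cfgs := make([]map[string]string, 0, len(sm.Endpoints))
	for i, ep := range sm.Endpoints {
		cfgs = append(cfgs, map[string]string{
			"job_name":        jobName(sm, i),
			"metrics_path":    ep.Path,
			"scrape_interval": ep.Interval,
		})
	}
	return cfgs
}

func main() {
	sm := ServiceMonitor{
		Namespace: "default",
		Name:      "my-app",
		Endpoints: []Endpoint{{Port: "metrics", Path: "/metrics", Interval: "30s"}},
	}
	for _, c := range scrapeConfigs(sm) {
		fmt.Println(c["job_name"], c["metrics_path"])
	}
}
```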

cc @Aneurysm9

@secustor secustor changed the title [targetallocator] Implement prometheus crd MVC [targetallocator] PrometheusOperator CRD MVC Dec 28, 2021
@Aneurysm9
Member

😻

@secustor secustor marked this pull request as ready for review December 31, 2021 17:07
@secustor secustor requested review from a team and owais December 31, 2021 17:07
@secustor
Member Author

Splitting this off here, as otherwise the PR would get too big.

@jpkrohling
Member

@pavolloffay, @VineethReddy02, is one of you interested in reviewing this?

@secustor
Member Author

Any chance we get this reviewed by a maintainer?

I would like to propose that we use the operator/collector, but we heavily lean on Prometheus CRDs to configure our scrape targets.

@VineethReddy02
Contributor

I took a quick pass through the PR. Looks good to me in the first pass. Can we add e2e tests for the same?

@secustor
Member Author

No, the Collectors would have to be redeployed whenever CRs are changed.

That would be definitely the cleanest solution.

I agree, I'm going to create an issue if this PR is accepted.

@jpkrohling
Member

@pavolloffay, @VineethReddy02, you've made a couple of comments on this PR; do you think this is ready to be merged?

Contributor

@VineethReddy02 VineethReddy02 left a comment

LGTM

@@ -120,11 +120,21 @@ type OpenTelemetryTargetAllocator struct {
// +optional
Enabled bool `json:"enabled,omitempty"`

// PrometheusCR defines the configuration for the retrieval of PrometheusOperator CRDs (servicemonitor.monitoring.coreos.com/v1 and podmonitor.monitoring.coreos.com/v1).
Member

question as before, from which namespaces are these objects queried?

Member Author

I have added a line clearing this up.
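
For reference, turning the watcher on from the collector CR looks roughly like this (a minimal sketch based on the field added in this PR; `mode: statefulset` and the resource name are illustrative):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: example
spec:
  mode: statefulset
  targetAllocator:
    enabled: true
    # Field introduced in this PR: watch ServiceMonitor/PodMonitor CRs
    # and feed them into the generated Prometheus config.
    prometheusCR:
      enabled: true
```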

Watches the Prometheus service discovery for new targets and sets targets to the Allocator

### Allocator
Shards the received targets based on the discovered Collector instances
Member

based on the discovered

Is this correct? Strictly speaking the TA does not discover collectors. The collectors should be known from the collector CR.

Member

It does watch the associated stateful set to observe scaling events and reallocate targets as necessary when the size of the set of collectors changes.

Member Author

Currently the TargetAllocator watches the API for pods with the expected labels.
Because of this I have described it as "discovered"; would you prefer "watch"?
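
The sharding discussed here can be pictured with a small hash-mod sketch (a hypothetical helper; the actual allocator also tracks per-collector load and reallocates when the collector set changes):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// assign picks a collector for a target URL by hashing the target.
// This only shows the basic idea: a target maps deterministically to
// one collector as long as the collector set is stable.
func assign(target string, collectors []string) string {
	h := fnv.New32a()
	h.Write([]byte(target))
	return collectors[int(h.Sum32()%uint32(len(collectors)))]
}

func main() {
	collectors := []string{"collector-0", "collector-1", "collector-2"}
	for _, t := range []string{"10.0.0.1:9090", "10.0.0.2:9090", "10.0.0.3:9090"} {
		fmt.Printf("%s -> %s\n", t, assign(t, collectors))
	}
}
```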

### Allocator
Shards the received targets based on the discovered Collector instances

### Collector
Member

Is this the OTEL collector? Maybe we should find a different name if it is not.

Member Author

The headlines refer to the packages. The collector package watches for new collector instances. I have rearranged it a bit to clear this up.


"github.com/go-kit/log"
"github.com/go-logr/logr"
allocatorconfig "github.com/otel-allocator/config"
Member

is this import from this repository? Then it should be moved to a 3rd group. Why doesn't it have the full path?

Member Author

Yes, this is referencing the local module.

I cannot say why this module name was chosen.

Member

I think this import path is incorrect and should be changed. Probably best to change it in a separate PR, though.


# Design

If the Allocator is activated, all Prometheus configurations will be transferred to a separate ConfigMap, which is in
Member

all Prometheus configurations

Is it the OTEL p8s receiver configuration, or also the p8s CRs?

Member

This applies to the configuration of the prometheus receiver in the collector configuration. The Prometheus operator CRs are observed separately after the target allocator has been started, at which point it further modifies the service discovery configuration it is using.

Member Author

This is only referencing the OTEL p8s configuration.

turn mounted to the Allocator.
This configuration will be resolved to target configurations and then split across all OpenTelemetryCollector instances.

The TargetAllocator exposes the results as [HTTP_SD endpoints](https://prometheus.io/docs/prometheus/latest/http_sd/)
Member

How does the OTEL collector get the final config (resolved targets)?

Does the OTEL operator configure the collector to use the http_sd to get the targets?

Member

Yes, the OTEL operator rewrites the collector prometheus receiver config to use http_sd for all configured jobs, pointing at the target allocator instance for that collector.

There is still a gap that @secustor has proposed to close with open-telemetry/opentelemetry-collector-contrib#8055. This addresses the fact that the Prometheus operator CRs can cause new jobs to be exposed at the target allocator, but the collector configuration is static and does not currently have a mechanism for discovering those jobs. The target allocator exposes a job list resource that should suffice to discover new/ended jobs.
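
Concretely, the rewrite described above would leave the collector with a receiver entry roughly like the following (service name, port, and job name are illustrative, not taken from this PR):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: my-job
          # The operator replaces the original service discovery with
          # HTTP SD pointing at the target allocator for this collector.
          http_sd_configs:
            - url: http://example-targetallocator:80/jobs/my-job/targets?collector_id=$POD_NAME
```

The endpoint returns the standard Prometheus HTTP SD payload: a JSON list of `{"targets": [...], "labels": {...}}` objects.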

Member Author

Yes, that's correct. The operator configures the collector to use http_sd endpoints which are provided by the TargetAllocator.

Currently the Operator defines the p8s jobs which the collector should query, therefore no jobs defined by p8s operator CRs are added to the collector. Later on the collector should query the jobs directly from the TargetAllocator.
See the following issues:

@pavolloffay
Member

Sorry for the super long delay in reviewing this. I had to take some time to learn about the target allocator.

@secustor secustor requested a review from pavolloffay February 23, 2022 08:11
Contributor

@codeboten codeboten left a comment

@pavolloffay it appears your comments have been addressed, can you confirm whether there are still outstanding items to address?

Contributor

@jaronoff97 jaronoff97 left a comment

Love the approach here, one non-blocking question. I see you're adding the changes to the end-to-end tests, would it be possible to add a test involving a prometheus CRD to gut check that the expected configuration gets generated?

	PromCRWatcherConf PrometheusCRWatcherConfig
}

func Load(file string) (Config, error) {
	var cfg Config
	if err := unmarshal(&cfg, file); err != nil {
Contributor

question: It seems previously the allocator would load a default config file if no file was provided; it seems that was because we would always pass in an empty string in main.go. Should that functionality remain in case there are any consumers of this function?

Member Author

Are you referring to users of the TargetAllocator or devs who import this package?

In the first case the behavior has not changed. It is still loading the same default file as before. This is implemented using pflags now. https://github.com/open-telemetry/opentelemetry-operator/pull/653/files#diff-d292758014333553bcd82dfdf75743ef82b0b32f6414e2dc345777fc342e4480R77

@secustor
Member Author

secustor commented Apr 6, 2022

I see you're adding the changes to the end-to-end tests, would it be possible to add a test involving a prometheus CRD to gut check that the expected configuration gets generated?

I don't think it is possible right now to test API responses with KUTTL.

Member

@pavolloffay pavolloffay left a comment

I am approving this PR; however, I am still not familiar with the target-allocator and it is not an area where I focus.

I am fine with merging this if there are people who will maintain it.

secustor added 2 commits April 9, 2022 16:59
# Conflicts:
#	cmd/otel-allocator/Dockerfile
#	cmd/otel-allocator/go.sum
@secustor
Member Author

@pavolloffay Can you merge this? All checks are successful.

@pavolloffay
Member

Merging based on the support discussion in https://cloud-native.slack.com/archives/C033BJ8BASU/p1650372898413199.

@secustor @Aneurysm9 (AWS) are willing to support the target-allocator.

@pavolloffay pavolloffay merged commit a84a44c into open-telemetry:main Apr 22, 2022
@secustor secustor deleted the implement_prometheus_crd branch April 22, 2022 07:46
ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this pull request May 1, 2024
* feat(target-allocator): allow custom config file path

* feat(target-allocator): move CLI config options to config package

* feat(target-allocator): allow running outside of cluster for debugging

* introduce meta watcher

* add event source

* fix: log panics

* fix: race condition

* fixup! fix: log panics

* feat: implement promCR retrieval

* feat: functioning

* refactor: some cleanup

* feat(target-allocator): escape job names in query parameters

* feat(target-allocator): make prometheusCR lookup optional and allow using kubernetes_sd outside of cluster

* refactor(target-allocator): improve memory usage and comments

* chore(target-allocator): update PromOperator and Kubernetes deps

* refactor(target-allocator): use exclude instead of replace directive

* ci: add Makefile targets for target allocator

* tests: add kuttl tests for PrometheusCR feature of target allocator

* docs(targetAllocator): add README.md

* fixup CRD docs

* fix(Makefile): add missing PHONY tags

* implement change requests

* go mod tidy and fix linting