-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add validation for pod/service monitors for TargetAllocator and skip invalid ones #2328
Add validation for pod/service monitors for TargetAllocator and skip invalid ones #2328
Conversation
// based on the service monitor and pod metrics endpoints specs. | ||
// This code borrows from | ||
// https://github.com/prometheus-operator/prometheus-operator/blob/ba536405154d18f3a6f312818283d671182af6f3/pkg/prometheus/resource_selector.go#L237 | ||
func validateRelabelConfig(rc monitoringv1.RelabelConfig) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful for us to reach out to the prometheus operator folks to see if they would expose this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jaronoff97. I have reached out to them, waiting for a response,(prometheus-operator/prometheus-operator#6057 (comment)) based on that I can add the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also created an issue here - prometheus-operator/prometheus-operator#6076
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaronoff97 - There is a question about breaking changes. I have answered it from what I understand and what I think the way forward would be. Could you please respond to provide your perspective and correct me if i am wrong?
prometheus-operator/prometheus-operator#6076 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, i responded there :) thanks for making the issue. Let's wait to hear from them before merging this – i would love to just use their method rather than having to write it ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some tests in to the promOperator_test.go file to ensure this works how we expect?
@rashmichandrashekar could you re-implement this PR to use the interfaces recommended by the prometheus operator? |
@jaronoff97 - For sure, I have the changes all ready. Just cleaning some things up and I will create a PR today. Thanks for helping with this! |
@jaronoff97 - I have reimplemented this and added relevant tests. Please take a look and lmk if you have any feedback. Also a couple tests seem to be timing out, please requeue them if possible. Thanks! |
Thank you! I'll try to take look today or monday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits/ questions. Does this change how our existing label selector works for the target allocator? I'm wondering if we should also write a new e2e or two to verify this works as expected. cc @swiatekm-sumo
cmd/otel-allocator/config/config.go
Outdated
ServiceMonitorNamespaceSelector map[string]string `yaml:"service_monitor_namespace_selector,omitempty"` | ||
PodMonitorNamespaceSelector map[string]string `yaml:"pod_monitor_namespace_selector,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be defaulting these to empty so that we can keep the current behavior (link)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jaronoff97. Since the common prometheus fields have the ServiceMonitorNamespaceSelector and PodMonitorNamespaceSelector already initialized, even if the targetallocator config is not setting these selectors (or have nil selectors), the default behavior will be to match all namespaces which is the current behavior. But I can add defaults, in case you think otherwise. Please let me know.
The following tests also depict the behavior with no selectors in the config vs with selectors.
https://github.com/open-telemetry/opentelemetry-operator/blob/883b583c4c93038d6fddc55b192659c4aa90f0fe/cmd/otel-allocator/watcher/promOperator_test.go#L91C4-L91C34
https://github.com/open-telemetry/opentelemetry-operator/blob/883b583c4c93038d6fddc55b192659c4aa90f0fe/cmd/otel-allocator/watcher/promOperator_test.go#L505C3-L509C6
"ServiceMonitorNamespaceSelector": w.serviceMonitorNamespaceSelector, | ||
} { | ||
|
||
sync, err := k8sutil.LabelSelectionHasChanged(old.Labels, cur.Labels, selector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you confirm my understanding here: this event handler waits for any namespace changes in which case we check if one of our namespace selectors has changed as well in which case we send an event to the notifyEvents channel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This event handler watches for any namespace changes and then calls this method - LabelSelectionHasChanged to basically make sure that the TA's selector yields the same results with old and new labels, if it does then an event is not sent and the new labels lead to a change in behavior then an event is sent.
For ex- If the cfg.podMonitorNamespaceSelector was set to match namespaces with labels "testns" : "testns" and the initial set of namespace labels on for example - namespace1 were "testns" : "testns" and later if the labels were removed, then an event would be sent. But if later, the labels on the namespace were updated to
{
"testns" : "testns",
"testns1": "testns1",
}
it wouldnt send an event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks for the great explanation! 🙇
Sure @jaronoff97. This doesnt change the existing behavior of the selectors. This just adds a couple other selectors just like in prometheus-operator. I did add unit tests for all these selectors to make sure that the selectors result in the right configuration and select the right pod and svc monitors as well as selectors on namespaces result in picking up pod and svc monitors in the matching namespaces. please lmk if e2e are also necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting this change, I really like the watcher cleanup. I do have a concern about this being a breaking change in its current state though - see my comment on the namespace informer.
@rashmichandrashekar we just talked about this at the operator SIG and want to attempt to get this in prior to our next release. In order to do so, I opened up #2426 to see if we can ensure that we are not going to break users by checking they have the new permission for the TA to function. We're going to pause merging this until we have an answer there. |
[like] Rashmi Chandrashekar reacted to your message:
…________________________________
From: Jacob Aronoff ***@***.***>
Sent: Thursday, December 7, 2023 7:52:29 PM
To: open-telemetry/opentelemetry-operator ***@***.***>
Cc: Rashmi Chandrashekar ***@***.***>; Mention ***@***.***>
Subject: Re: [open-telemetry/opentelemetry-operator] Add validation for pod/service monitors for TargetAllocator and skip invalid ones (PR #2328)
nope, this is a different SIG :) you can find in slack at #otel-operator and you can see the other sigs here<https://github.com/open-telemetry/community/?tab=readme-ov-file#implementation-sigs>.
—
Reply to this email directly, view it on GitHub<#2328 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AIZHOKKZ4RNAJSHL4GXAEADYIIM73AVCNFSM6AAAAAA7D2OTK6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBWGAYDMNJQHE>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
@rashmichandrashekar i've merged in my PR so now you should be able to fix the conflicts and update the necessary TA permissions :) |
Thanks so much @jaronoff97 for unblocking this PR. I will get the latest and test it out and update this PR. |
@jaronoff97 - I have updated the PR by fixing conflicts and adding permissions check for namespaces in the webhook. Please take a look. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me overall, modulo some relatively minor problems.
.chloggen/target-allocator-use-recommended-prometheus-operator-interfaces.yaml
Outdated
Show resolved
Hide resolved
.chloggen/target-allocator-use-recommended-prometheus-operator-interfaces.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'd like to use LabelSelectors directly in configuration instead of repeating the mistake we made with existing ones. @jaronoff97 can you also have a final look?
1b06342
to
da40b95
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks a lot for doing this, and your patience in accepting our feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absolutely fantastic work here. Thank you so much for your patience and contribution. 🎉 the added tests are also great to see.
…invalid ones (open-telemetry#2328) * fix for adding validation * removing unused references * adding tests * adding some changes * cleaning up * adding change log * running goimports and adding return value check code for namespace informer * fixing lint error * fixing tests and comment * adding permissions for e2e tests * adding cluster roles instead of roles * updaintg readme * fixing comments * adding contant to same block * fixing lint errors * running go import * adding namespaces since that is required for informer * adding extected warnings * addressing comments * adding test for namespace label update * fixing goimports * making namespaceselectores as labelselectors
Description:
Fixing a bug - When a podmonitor/svcmonitor CRD (with Targetallocator enabled)has an erroneous field for regex in the relabeling/metric relabeling fields, the crd creation succeeds since it can be a valid string but not a valid regex( ex- .*( ). This causes any subsequent valid CRDs being created from getting picked up.
Link to tracking Issue: #2309
Testing: Tested this out with otel operator setup