Prometheus ServiceMonitor failing to scrape operator metrics served through kube-rbac-proxy HTTPS 8443 port #4764
Comments
Hello, I had the exact same problem and it caused me a lot of headaches. Try adding scheme: https, bearerTokenFile and a tlsConfig with insecureSkipVerify: true to the ServiceMonitor endpoint (the exact change is the diff applied in the next comment).
Thanks for posting that solution @criscola, your suggestion actually makes total sense. I have applied that change and deployed it:
$ git diff
diff --git a/config/prometheus/monitor.yaml b/config/prometheus/monitor.yaml
index 1b44d4f..a5bd8b1 100644
--- a/config/prometheus/monitor.yaml
+++ b/config/prometheus/monitor.yaml
@@ -11,6 +11,10 @@ spec:
endpoints:
- path: /metrics
port: https
+ scheme: https
+ bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
+ tlsConfig:
+ insecureSkipVerify: true
selector:
matchLabels:
control-plane: controller-manager
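(For reference, with this patch the resulting config/prometheus/monitor.yaml looks roughly like the sketch below; the metadata follows the default operator-sdk scaffold before kustomize adds the project prefix, so the final names differ.)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: controller-manager
  name: controller-manager-metrics-monitor
  namespace: system
spec:
  endpoints:
  - path: /metrics
    port: https
    scheme: https
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      insecureSkipVerify: true
  selector:
    matchLabels:
      control-plane: controller-manager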
$ make deploy
cd config/manager && /home/slopez/bin/kustomize edit set image controller=quay.io/3scale/prometheus-exporter-operator:v0.3.0
/home/slopez/bin/kustomize build config/manual | kubectl apply -f -
namespace/prometheus-exporter-operator-system created
customresourcedefinition.apiextensions.k8s.io/prometheusexporters.monitoring.3scale.net created
serviceaccount/prometheus-exporter-operator-controller-manager created
role.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-role created
role.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-leader-election-rolebinding created
rolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-exporter-operator-proxy-rolebinding created
service/prometheus-exporter-operator-controller-manager-metrics-service created
deployment.apps/prometheus-exporter-operator-controller-manager created
servicemonitor.monitoring.coreos.com/prometheus-exporter-operator-controller-manager-metrics-monitor created
But now prometheus cannot scrape the operator at all (before, at least the target showed up with up=0). It might be caused by a totally unrelated problem regarding the monitoring stack I'm using, which is the openshift user-workload-monitoring stack (let's say, the official way of monitoring user workloads on openshift). If I get into a prometheus pod from the openshift user-workload-monitoring stack (for example, the config-reloader container):
$ oc project openshift-user-workload-monitoring
Now using project "openshift-user-workload-monitoring" on server "https://api.....net:6443".
$ oc get pods
NAME READY STATUS RESTARTS AGE
prometheus-operator-849fdfdcb5-ktqjd 2/2 Running 0 29d
prometheus-user-workload-0 4/4 Running 1 29d
prometheus-user-workload-1 4/4 Running 1 2d22h
thanos-ruler-user-workload-0 3/3 Running 3 29d
thanos-ruler-user-workload-1 3/3 Running 3 29d
$ kubectl exec -it prometheus-user-workload-1 -c config-reloader -- /bin/bash
bash-4.4$ cat /var/run/secrets/kubernetes.io/serviceaccount/token
qy......xAba
bash-4.4$ curl --insecure https://prometheus-exporter-operator-controller-manager-metrics-service.prometheus-exporter-operator-system.svc.cluster.local:8443/metrics -H "Authorization: Bearer qy......xAba"
# HELP ansible_operator_build_info Build information for the ansible-operator binary
# TYPE ansible_operator_build_info gauge
ansible_operator_build_info{commit="98f30d59ade2d911a7a8c76f0169a7de0dec37a0",version="v1.4.0+git"} 1
# HELP controller_runtime_active_workers Number of currently used workers per controller
# TYPE controller_runtime_active_workers gauge
controller_runtime_active_workers{controller="prometheusexporter-controller"} 0
....
However it seems that prometheus is ignoring the ServiceMonitor because it has the bearerTokenFile field set.
I have checked that the new ServiceMonitor definition includes:
+ scheme: https
+ bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
+ tlsConfig:
+   insecureSkipVerify: true
And I have found that my particular problem is caused by using the Openshift User Workload Monitoring stack: if I look at the prometheus-operator pod logs, I can see that the ServiceMonitor is being skipped.
So the ServiceMonitor with the bearerTokenFile field is skipped. After commenting on it with the Openshift monitoring team, it turns out it is skipped because of arbitraryFSAccessThroughSMs, which is configured to deny file-system access through ServiceMonitors. They suggested me to maybe use bearerTokenSecret instead.
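To make both points concrete, here is a minimal sketch (not taken from this issue): the Prometheus CR field that makes prometheus-operator skip bearerTokenFile-based ServiceMonitors, and an endpoint using bearerTokenSecret instead. The secret name is hypothetical, the referenced Secret must live in the ServiceMonitor's namespace, and on OpenShift UWM the Prometheus CR itself is managed for you.

# Prometheus CR knob that causes such ServiceMonitors to be skipped:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: user-workload
spec:
  arbitraryFSAccessThroughSMs:
    deny: true
---
# Suggested alternative: take the bearer token from a Secret instead of a file:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: controller-manager-metrics-monitor
spec:
  endpoints:
  - path: /metrics
    port: https
    scheme: https
    bearerTokenSecret:
      name: metrics-reader-token   # hypothetical Secret holding a serviceaccount token
      key: token
    tlsConfig:
      insecureSkipVerify: true
  selector:
    matchLabels:
      control-plane: controller-manager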
In addition, the Openshift monitoring team told me that we should bear in mind that using bearer tokens for metrics authn puts additional load on the API server, and they are looking at replacing this with client TLS auth in the future (it's being discussed here: openshift/enhancements#701).
For the moment I will just remove the proxy in front of the operator (to be able to access operator metrics without any problem using the Openshift UWM). So from my point of view the issue can be closed now (there is no problem with operator-sdk), but I will let the operator-sdk team decide what to do, because the current ServiceMonitor definition won't work on Openshift User Workload Monitoring (the official monitoring stack from Openshift). And maybe I'm missing something: can you think of a way to authenticate to the metrics endpoint that does not require a cluster role or accessing a generated secret?
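To illustrate the workaround, a minimal sketch of what the ServiceMonitor can look like once the kube-rbac-proxy is removed and the manager's metrics are exposed directly over plain HTTP (the metrics port name is an assumption; it has to match whatever port the metrics Service exposes):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: controller-manager
  name: controller-manager-metrics-monitor
spec:
  endpoints:
  - path: /metrics
    port: metrics   # plain-HTTP port; no scheme/bearerTokenFile/tlsConfig needed
  selector:
    matchLabels:
      control-plane: controller-manager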
Just to share for whoever will be able to check this out and help here. The PR changes the related scaffolds, so it might be worth checking.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. /close
@openshift-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
…etrics see more detail in issue operator-framework/operator-sdk#4764 Signed-off-by: Abdul Hameed <ahameed@redhat.com>
Using a ServiceMonitor with the bearerTokenFile parameter set causes the ServiceMonitor to be rejected by the OpenShift user monitoring stack (operator-framework/operator-sdk#4764). As there is nothing sensitive in the mondoo-operator metrics, just expose them directly to allow metrics to work under the built-in OpenShift user metrics monitoring stack.
Add the ability to set some labels on the ServiceMonitor to allow a functional metrics collection with an out-of-the-box prometheus deployed as configured in https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack .
Change the kustomize generation so that the kube-rbac-proxy sidecar container is no longer defined. It really only exists to protect metrics. Introduce a new Service to expose the new metrics ports. Patch the default Deployment to expose the metrics port. A side benefit of this is that you don't need to specify the container name when displaying logs for mondoo-operator, as there is now only a single container.
Signed-off-by: Joel Diaz <joel@mondoo.com>
* Expose metrics for prometheus

* Added Status
Signed-off-by: Harsha <harshaisgud@gmail.com>

* migrate to using new MondooOperatorConfig for metrics
Rather than put the metrics config into the MondooAuditConfig (which is really for configuring monitoring-specific settings), create a new MondooOperatorConfig CRD which is cluster-scoped and can be used to configure operator-wide behavior of the mondoo-operator. In a cluster with multiple MondooAuditConfigs, it makes no sense to have one resource with metrics.enabled = true and a different one with metrics.enabled = false. So just allow a single MondooOperatorConfig to hold the cluster-wide metrics configuration for the mondoo-operator.
Take the existing ServiceMonitor handling code and call it from the new mondoooperatorconfig controller.
Extend the MondooOperatorConfig status to hold a list of conditions, and use this to communicate status for when metrics is enabled but we couldn't find Prometheus installed on the cluster. The conditions handling is written so that a Condition only appears initially if the Condition.Status is set to True. This means that if you enable metrics and Prometheus is found, there will be no Condition[].Type = PrometheusMissing with .Status = False. Only when Prometheus is missing will the condition be populated, and of course if Prometheus transitions from Missing to Found, then the Condition will be updated to show .Type = PrometheusMissing, .Status = False.
Signed-off-by: Joel Diaz <joel@mondoo.com>

* move to http metrics
Using a ServiceMonitor with the bearerTokenFile parameter set causes the ServiceMonitor to be rejected by the OpenShift user monitoring stack (operator-framework/operator-sdk#4764). As there is nothing sensitive in the mondoo-operator metrics, just expose them directly to allow metrics to work under the built-in OpenShift user metrics monitoring stack.
Add the ability to set some labels on the ServiceMonitor to allow a functional metrics collection with an out-of-the-box prometheus deployed as configured in https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack .
Change the kustomize generation so that the kube-rbac-proxy sidecar container is no longer defined. It really only exists to protect metrics. Introduce a new Service to expose the new metrics ports. Patch the default Deployment to expose the metrics port. A side benefit of this is that you don't need to specify the container name when displaying logs for mondoo-operator, as there is now only a single container.
Signed-off-by: Joel Diaz <joel@mondoo.com>

Co-authored-by: Joel Diaz <joel@mondoo.com>
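As a rough illustration of the Service and Deployment changes that commit describes (names and port numbers here are assumptions, not copied from mondoo-operator; the manager must also bind its metrics listener to 0.0.0.0 rather than 127.0.0.1 for this to be reachable):

# Plain-HTTP metrics Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    control-plane: controller-manager
  name: controller-manager-metrics
spec:
  ports:
  - name: metrics
    port: 8080
    targetPort: metrics
  selector:
    control-plane: controller-manager
---
# Kustomize patch adding the matching container port to the manager Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
spec:
  template:
    spec:
      containers:
      - name: manager
        ports:
        - name: metrics
          containerPort: 8080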
Bug Report
I'm using operator-sdk 1.5.0 and I'm trying to gather operator metrics without success.
What did you do?
Deployed default operator-sdk v1.5.0 with prometheus metrics enabled at the kustomize config level (config/default/kustomization.yaml). I'm using kube-rbac-proxy:v0.5.0 because of issue #4684, but I don't think it affects this.
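For reference, enabling the prometheus metrics at the kustomize level just means having the ../prometheus base active in config/default/kustomization.yaml, roughly like this excerpt (namespace and prefix mirror the values visible in the deploy output above; the exact scaffold comments may differ between SDK versions):

# config/default/kustomization.yaml (excerpt)
namespace: prometheus-exporter-operator-system
namePrefix: prometheus-exporter-operator-

bases:
- ../crd
- ../rbac
- ../manager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
- ../prometheus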
What did you expect to see?
The ServiceMonitor manages to scrape the operator metrics (and so, metric up=1).

What did you see instead? Under which circumstances?
Service monitor failing (metric up=0):

up{container="kube-rbac-proxy",endpoint="https",instance="10.129.2.246:8443",job="prometheus-exporter-operator-controller-manager-metrics-service",namespace="prometheus-exporter",pod="prometheus-exporter-operator-controller-manager-669f6fbdcc2jbm7",prometheus="openshift-user-workload-monitoring/user-workload",service="prometheus-exporter-operator-controller-manager-metrics-service"} 0
Environment
Operator type:
/language ansible
Kubernetes cluster type: Openshift v4.6
$ operator-sdk version
$ kubectl version
Possible Solution
N/A
Additional context
If I connect to the controller-manager pod's manager container, I can check the metrics served through the manager's protected port 8080 (only available on 127.0.0.1):
However, if I try to access the port published through the kube-rbac-proxy (8443), it fails (with both the http and https schemes), which I guess is what prometheus is trying to do with the deployed ServiceMonitor, hence failing:
bash-4.4$ curl 127.0.0.1:8443/metrics
Client sent an HTTP request to an HTTPS server.
bash-4.4$ curl https://127.0.0.1:8443/metrics
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not establish a secure connection to it. To learn more about this situation and how to fix it, please visit the web page mentioned above.
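For context, the failing 8443 endpoint is served by the kube-rbac-proxy sidecar that the default scaffold places in front of the manager's 127.0.0.1:8080 metrics listener. A rough sketch of that patch (based on the default config/default/manager_auth_proxy_patch.yaml; exact flags and image version vary between SDK releases):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: system
spec:
  template:
    spec:
      containers:
      - name: kube-rbac-proxy
        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
        args:
        - "--secure-listen-address=0.0.0.0:8443"
        - "--upstream=http://127.0.0.1:8080/"
        - "--logtostderr=true"
        - "--v=10"
        ports:
        - containerPort: 8443
          name: https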
Maybe there are 2 problems?