Prometheus Scraping with Secure Port #1477

blakeromano · 2024-08-02T23:44:25Z

What version of descheduler are you using?

descheduler version: 0.29.0

Does this issue reproduce with the latest release?

Yes

Which descheduler CLI options are you using?

--policy-config-file=/policy-dir/policy.yaml
--descheduling-interval=5m
--v=3

Please provide a copy of your descheduler policy config file
N/A

What k8s version are you using (kubectl version)?

kubectl version Output

$ kubectl version
Server Version: version.Info{Major:"1", Minor:"29+", GitVersion:"v1.29.4-eks-036c24b", GitCommit:"9c0e57823b31865d0ee095997d9e7e721ffdc77f", GitTreeState:"clean", BuildDate:"2024-04-30T23:53:58Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

I am trying to scrape the descheduler with an OpenTelemetry Collector running as a DaemonSet. However, because there is no option to expose an insecure port, there is no way, as far as I can tell, to scrape the metrics off the pod.

Run helm install with the following values:

kind: Deployment

deschedulingInterval: 5m

Then run a curl against the pod, for example:

curl POD_IP:10258/metrics

This fails, and the OpenTelemetry Collector's Prometheus scraper cannot connect either.

What did you expect to see?

I expected to be able to scrape Prometheus metrics. I'd love to just have an insecure port that can be used for this.

What did you see instead?

The underlying problem seems to be that the descheduler reuses the same HTTP server code as the API server. This also introduces extraneous Prometheus metrics, such as the ones below, which add noise and confusion.

# HELP aggregator_discovery_aggregation_count_total [ALPHA] Counter of number of times discovery was aggregated
# TYPE aggregator_discovery_aggregation_count_total counter
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram


blakeromano · 2024-08-02T23:55:47Z

Related issues #1102 #1095 #842

Athishpranav2003 · 2024-08-06T17:25:35Z

@blakeromano When I took a look at the code, I saw that it just registers the new metrics in the registry, which I presume is the same one you mentioned (the one the API server exposes). I guess for dependent services it's correct to have a central store. Why would two separate servers be needed?

blakeromano · 2024-08-06T19:45:37Z

My suggestion is that we move away from using https://github.com/kubernetes-sigs/descheduler/blob/master/cmd/descheduler/app/server.go#L36 as the descheduler's server, and instead stand up a separate HTTP server that does not depend on the Kubernetes apiserver HTTP server code.

blakeromano added the kind/bug Categorizes issue or PR as related to a bug. label Aug 2, 2024
