[kube-prometheus-stack] Feat: Improve cAdvisor metrics scrape #5063
Conversation
Followup to #5050 (CC @Breee). For reference, this is the hardcoded housekeeping interval. Note, this will likely cause the PrometheusDuplicateTimestamps alert to fire. We could disable that alert by default in the values.yaml, or we could remove it from the upstream kube-prometheus alert mixin.
Hey @SuperQ, I think the information about trackTimestampsStaleness and cAdvisor is interesting for some users (and very important for maintainers). WDYT about leaving a note or a link to this explanation in the values.yaml?
Does this cause a lot of warnings? From my point of view, it should be sufficient to exclude the kubelet scrape job from the PrometheusDuplicateTimestamps alert, or am I wrong?
It shouldn't. This is more related to ingesting the same timestamps more than once. Maybe this is a Prometheus metric "bug", in that we're counting the same condition for multiple reasons. I haven't investigated the code for this.
Sure, so here are the details on trackTimestampsStaleness.

Normal metrics exposed to Prometheus do not include timestamps. Metrics are supposed to be produced "on demand", so the data timestamp is the same as the scrape timestamp. cAdvisor does not follow this best practice. Instead, cAdvisor has its own internal data gathering loop and exposes explicit timestamps on its metrics.

Prometheus scrapes have millisecond time accuracy and are intentionally splayed over time to avoid correlated network traffic. But user queries do have a correlated timestamp, and systems like Grafana will snap the query to a round number like 00:30:00. This means that when you query at 00:30:00, but the metric you want was scraped at 00:29:55.231, Prometheus has to "look back" some amount of time to find that sample. This lookback defaults to 5 minutes in order to handle a variety of scrape interval use cases.

The downside to lookback is that if a metric is removed from an endpoint, or a target is down, Prometheus would continue to show data up to 5 minutes after it really existed. This was confusing for users. In Prometheus 2.0 we improved on this situation by introducing a staleness handler: when a metric/label combo is removed from a scrape, it is marked as "stale" in the TSDB. Due to historical reasons, Prometheus handles metrics that include the optional timestamp field without this staleness marker. Because cAdvisor exposes its own timestamps, its series never get staleness markers, and Prometheus will continue to apply the lookback of up to 5 minutes to find them. This can be very confusing for users, as they will "see" cAdvisor metrics that don't exist anymore. The common example of this is container metrics lingering for pods that have already been deleted.

The new track timestamp staleness feature was introduced to fix this behavior. With trackTimestampsStaleness enabled, Prometheus writes staleness markers even for series that carry explicit timestamps.

The previous workaround was to disable honorTimestamps for cAdvisor. This worked to fix the staleness handling issue, but it causes the data to be slightly inaccurate, as cAdvisor's internal loop is not in sync with Prometheus. So in order to fully benefit from trackTimestampsStaleness, honorTimestamps needs to be enabled again.

The second half of this change has to do with the fact that cAdvisor does not have a strict timer for internal data collection. It uses variable timing from 10 to 15 seconds in Kubernetes. So in order to capture every sample produced by cAdvisor, we need to scrape slightly faster than the minimum housekeeping interval.
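To make that concrete, here is a minimal sketch of what the cAdvisor endpoint on the kubelet ServiceMonitor ends up looking like with these settings combined (the field names are prometheus-operator Endpoint fields; the chart's actual rendered output may differ in detail):

```yaml
# Illustrative sketch only, not the chart's exact output.
- port: https-metrics
  scheme: https
  path: /metrics/cadvisor
  # Scrape slightly faster than cAdvisor's 10-15s housekeeping loop so no sample is missed.
  interval: 10s
  # Keep cAdvisor's own data timestamps instead of overwriting them with the scrape time.
  honorTimestamps: true
  # Let Prometheus write staleness markers even though samples carry explicit timestamps.
  trackTimestampsStaleness: true
```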
I have some questions here, and I guess one Go template condition is set on the wrong endpoint.
Thanks for the detailed comment. The link should be added to kubelet.serviceMonitor.trackTimestampsStaleness in the values.yaml!
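Something along these lines, as a rough sketch (the comment wording and placement are placeholders, not the chart's actual documentation):

```yaml
kubelet:
  serviceMonitor:
    ## Track staleness of metrics that carry explicit timestamps (i.e. cAdvisor).
    ## Only fully effective together with honorTimestamps; see the explanation in
    ## PR #5063 for the background.
    trackTimestampsStaleness: true
```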
Improve the collection of cAdvisor metrics by adjusting the scrape interval to match the kubelet's hardcoded minimum housekeeping interval.

* Set the cAdvisor, and resource, interval to 10s by default.
* Use the user-configured kubelet metrics interval if it is not the default.
* Enforce honorTimestamps for cAdvisor and resource if timestamp staleness is enabled.

Signed-off-by: SuperQ <superq@gmail.com>
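As a sketch, the interval and honorTimestamps handling in the cAdvisor endpoint of the kubelet ServiceMonitor template could look roughly like this (the cAdvisorInterval value name is an assumption for illustration, not necessarily the chart's final key):

```yaml
# Hypothetical excerpt of the cAdvisor endpoint block; not the chart's exact template.
- port: https-metrics
  scheme: https
  path: /metrics/cadvisor
  # A user-configured interval takes precedence; otherwise default to 10s to keep up
  # with cAdvisor's 10-15s housekeeping loop.
  interval: {{ .Values.kubelet.serviceMonitor.cAdvisorInterval | default "10s" }}
  {{- if .Values.kubelet.serviceMonitor.trackTimestampsStaleness }}
  # Staleness tracking only works correctly when cAdvisor's timestamps are honored.
  honorTimestamps: true
  trackTimestampsStaleness: true
  {{- end }}
```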
@SuperQ, what about the PrometheusDuplicateTimestamps alert that you mentioned earlier? Should it be covered in this PR or in a follow-up, if users complain about it?
I've been testing the 10s interval on one of my k3s test clusters all day and haven't been able to reproduce the duplicate timestamp issue there. Perhaps there is something wrong with my production setup that isn't normal. Maybe it's a cAdvisor-under-heavy-load problem? I'm not sure at this point.
Thanks a lot!
Hello, after these changes I started repeatedly getting log messages like this one, for every kubelet target, in the prometheus container:

time=2024-12-18T17:08:26.335Z level=WARN source=scrape.go:1877 msg="Error on ingesting out-of-order samples" component="scrape manager" scrape_pool=serviceMonitor/monitoring/prometheus-agent-kube-prom-kubelet/1 target=https://10.134.60.34:10250/metrics/cadvisor num_dropped=2646

My prometheus setup (multiple agents and one receiver) is configured with a scrapeInterval of 1m. If I set the kubelet scrape to 1m as well, the messages disappear. Furthermore, when scraping with 10s, data points in Grafana stay 1m apart (querying the remote receiver prometheus and not the prometheus scraping these metrics). However, the graphs produced by prometheus have gaps in them for cAdvisor metrics. Are these log messages expected? Thank you in advance.
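(For reference, setting the kubelet scrape back to 1m looks roughly like this in values; the cAdvisorInterval key is assumed from this PR, so check the chart's values.yaml for the actual names.)

```yaml
kubelet:
  serviceMonitor:
    # Align the kubelet/cAdvisor scrape with the 1m global scrapeInterval
    # instead of the new 10s default.
    interval: 1m
    cAdvisorInterval: 1m
```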
What this PR does / why we need it
Improve the collection of cAdvisor metrics by adjusting the scrape interval to match the kubelet's hardcoded minimum housekeeping interval.
Which issue this PR fixes
(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged)

Special notes for your reviewer
Checklist
Title of the PR starts with chart name (e.g. [prometheus-couchdb-exporter])