Update heuristic for container creation time #2800
The file "cgroup.clone_children" is not a part of cgroup v2, so use "cgroup.events" instead. This heuristic is quite bad, so also check the cgroup directories to see.
Hi @odinuge. Thanks for your PR. I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
@odinuge, can you provide a test for your change, please?
Yeah! I thought I could see if you guys were interested before writing tests. This function doesn't have a test already, but I can definitely add one.
The chart that you added to this PR clearly shows a bug: a monotonically increasing start time of a running container :D
Thanks @odinuge for the fix. Do you think this will fix kubernetes/kubernetes#98494 as well? I'm curious how well the
Ahh, I see. With the current code, and with my new patch, that would probably not be the case. In cadvisor, the Lines 68 to 69 in 013e451
clearly states that it is the time of the container creation (and that is not == start time of a container). I think that the However, if you want the start time of the CRI, I think we need another (and better) heuristic, preferably from CRI and not by reading cgroup files. I really don't think it is a good idea to track system daemon start times by reading mod times on cgroup files, compared to just providing it from the system or CRI. That is however another discussion.
If we talk about the "container creation time", I think this patch is better than the old one, but I still think that this heuristic can never be 100% reliable by nature, because cgroups aren't designed to do this. One can look at the start time of the oldest proc in a cgroup, or something like that. Also, I started looking at tests, but I'm not really sure how useful they will be. I guess I have to mock a filesystem or use temp dirs.
Thanks @odinuge. Agree, the goal here is to track CreationTime, not last start time.
Definitely, agree the tracking of that should be moved into CRI. I think the confusion stemmed from the fact that the kubelet summary API calls this field LGTM on your changes here. Only one question, did you have a chance to test out the
Thanks @odinuge, I see failures reported in
I just verified running Thanks @odinuge for addressing this.
LGTM
Thanks @bobbypage!
Will test this later today and tomorrow, and report back as soon as I have more details.. 😄
Yeah! If k8s wants a better "started" time, they could look into it separately. 👍 Thanks! I am working on this because I want more mature cgroup v2 support in Kubernetes so that I/we can start using it in production. 😄
I have only had time to test with cgroup v2 (have no clusters with Prometheus and cgroup v2), but it looks like it works well there! Pods on a given node: The lowest line is Here is a zoomed out one where one clearly can see the ever increasing metric, before it drops down with this patch:
edit: fixed by using boot time as lowest possible value
Makes sense to use boot time.
Nice! Added a check to instead always use
be44a6b to 7cc9b4c
container/common/helpers.go
Outdated
}
}

for _, cgroupPath := range cgroupPaths {
Why are we iterating over cgroupPaths again when we are already doing it at line 79? Can we not merge the two loops and just iterate once? The only difference that I see between the first loop and the second is that you are updating the cgroupPath variable based on whether it's v1 or v2.
I just thought it would make the flow simpler to understand, but I can merge them. (If you are thinking about performance, I have no idea what the Go compiler will do in cases like this.) Agree that the duplicated logic is kinda stupid, so I guess I can fix that anyway. Thanks! 👍
Pushed a new change now. Does that look better @harche?
Yes, thanks @odinuge
LGTM
deea5fc to feb4595
On some systems, root cgroup might report the time when creating the folder /sys/fs/cgroup/subsys, so limit to boot time.
feb4595 to b372640
LGTM
Port the upstream changes to kubelet and cadvisor to fix metric collection and statistics problems when using cgroup v2:
1. port cadvisor pr google/cadvisor#2839 reading cpu stats on cgroup v2
2. port cadvisor pr google/cadvisor#2837 read "max" value for cgroup v2
3. port cadvisor pr google/cadvisor#2801 gathering of stats for root cgroup on v2
4. port cadvisor pr google/cadvisor#2800: Update heuristic for container creation time
5. Fix cgroup handling for systemd with cgroup v2
6. test: adjust summary test for cgroup v2
The file "cgroup.clone_children" is not a part of cgroup v2, so use
"cgroup.events" instead. This heuristic is quite bad, so also check the
cgroup directories to see.
The old metric based on this is also constantly increasing due to this, making it more or less useless: