cAdvisor does not return CPU metrics for 1.4 #11199

Closed
mwringe opened this issue Oct 3, 2016 · 19 comments

Comments

@mwringe
Contributor

mwringe commented Oct 3, 2016

cAdvisor is returning empty cpu usage for origin 1.4 (built from master, commit ffdeb1b)

  "stats": [
   {
    "timestamp": "2016-10-03T15:53:03.359772024-04:00",
    "cpu": {
     "usage": {
      "total": 0,
      "user": 0,
      "system": 0
     },

Version

commit ffdeb1b

but I suspect this also affects v1.4.0-alpha.0

In the v1.3.0 versions, cpu metrics are collected

Steps To Reproduce
  1. build origin from git master
  2. access the cAdvisor endpoint for a container and verify that it has a zero value for cpu.

e.g.:
curl -k -H "Authorization: Bearer $(oc whoami -t)" -X POST -d '{"num_stats":1}' https://127.0.0.1:10250/stats/${PROJECT_NAME}/${POD_NAME}/${POD_ID}/${CONTAINER_NAME}

Alternatively, install origin metrics (https://docs.openshift.org/latest/install_config/cluster_metrics.html) and verify that the cpu graphs are empty.
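
The check in step 2 can be sketched in a few lines of Python. This is only an illustration, assuming the response has the `stats[].cpu.usage` shape shown in the snippet above; the helper name `cpu_usage_is_zero` is hypothetical and not part of cAdvisor or origin.

```python
import json


def cpu_usage_is_zero(stats_json: str) -> bool:
    """Return True if every sample reports zero total/user/system CPU usage.

    Assumes the stats[].cpu.usage layout shown in this report.
    """
    doc = json.loads(stats_json)
    samples = doc.get("stats", [])
    if not samples:
        raise ValueError("no stats samples in response")
    for sample in samples:
        usage = sample.get("cpu", {}).get("usage", {})
        if any(usage.get(key, 0) != 0 for key in ("total", "user", "system")):
            return False
    return True


# Sample trimmed from the response quoted in this report.
SAMPLE = (
    '{"stats": [{"timestamp": "2016-10-03T15:53:03.359772024-04:00", '
    '"cpu": {"usage": {"total": 0, "user": 0, "system": 0}}}]}'
)

if __name__ == "__main__":
    print(cpu_usage_is_zero(SAMPLE))  # True for the broken 1.4 output
```

Piping the curl output into this helper would flag the problem without needing the metrics console.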

Current Result

cpu metrics are all zero

Expected Result

cpu metrics to not be zero

@mwringe
Contributor Author

mwringe commented Oct 3, 2016

@spadgett the root cause is cAdvisor in the 1.4 version of OpenShift returning 0 values for the cpu usage

@mwringe
Contributor Author

mwringe commented Oct 3, 2016

@derekwaynecarr any idea here? This breaks the console graphing of cpu usage in 1.4

@spadgett
Member

spadgett commented Oct 3, 2016

/cc @jwforres

@derekwaynecarr
Member

derekwaynecarr commented Oct 3, 2016

  1. Is openshift running as a container?
  2. What version of docker?
  3. What systemd version (i.e. systemctl --version)?

If openshift is running as a container, this will not work until the following is merged upstream and we cherry-pick it into origin 1.4: kubernetes/kubernetes#33806

@mwringe
Contributor Author

mwringe commented Oct 3, 2016

docker version: 1.10.3
os: Fedora 24
systemd: 229

OpenShift is running directly on the machine, not in a docker container.

I know that cAdvisor doesn't necessarily work well with a newer systemd; I was hoping that would have been resolved by now

@derekwaynecarr
Member

systemd 229 changed the cgroup hierarchy and put pid 1 in init.scope.
this caused a bug in runc when using the systemd cgroup driver.
this bug is fixed in projectatomic/docker-1.12 but it has not yet been published for rpm install on F24.
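
A rough way to see whether a host has the init.scope layout is to look at the cgroup file for pid 1. The sketch below is an assumption-laden illustration: it relies on the `hierarchy-ID:controller-list:path` line format of `/proc/<pid>/cgroup`, the sample text is made up for the example rather than captured from an affected machine, and the helper names are hypothetical.

```python
def parse_cgroup_file(text):
    """Parse /proc/<pid>/cgroup content into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only; the path itself may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((hierarchy_id, names, path))
    return entries


def in_init_scope(text):
    """Return True if any hierarchy places the process in an init.scope cgroup."""
    return any(
        path.endswith("/init.scope") for _, _, path in parse_cgroup_file(text)
    )


# Illustrative content only, not taken from the machine in this report.
SAMPLE = "4:cpu,cpuacct:/\n1:name=systemd:/init.scope\n"

if __name__ == "__main__":
    print(in_init_scope(SAMPLE))
    # On a live host one could instead pass open("/proc/1/cgroup").read().
```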

@mwringe
Contributor Author

mwringe commented Oct 3, 2016

I am now running the fedora docker 1.12 rpms, which should have this fix, and yet I still don't see any difference here with the cpu usage; it's still zero

@derekwaynecarr
Member

@sjenning -- ptal

@sjenning sjenning self-assigned this Oct 4, 2016
@sjenning
Contributor

sjenning commented Oct 4, 2016

i've been able to recreate. debugging now.

@sjenning
Contributor

sjenning commented Oct 4, 2016

could be related to this 9607748

@sjenning
Contributor

sjenning commented Oct 4, 2016

yes, reverting 9607748 fixes the issue

@sjenning
Contributor

sjenning commented Oct 4, 2016

Ok gathering information:

opencontainers/runc PR that allows for all cgroup mount points:
opencontainers/runc#1049

PR that vendors this into cAdvisor:
google/cadvisor#1476

PR that vendors new cAdvisor into Kubernetes:
kubernetes/kubernetes#33806

This isn't vendored into any version of projectatomic/docker right now.

Hack commit to get cAdvisor to work at all containerized on RHEL for openshift/origin:
9607748

IIUC, the fix for this issue is to vendor the fixed cAdvisor into origin and revert the hack commit.

@smarterclayton
Contributor

Yes

@sjenning
Contributor

sjenning commented Oct 4, 2016

Just out of morbid curiosity, I wanted to find out why cpuacct,cpu is inverted for RHEL7. Turns out that the list of cgroup subsystems is a statically defined enum in the kernel:
https://github.com/torvalds/linux/blob/master/include/linux/cgroup_subsys.h.

For v3.10 (RHEL7) that subsystem enum is converted into a linked list which inserts at the head, inverting the enum order.

In v3.15, the support for adding cgroup subsystems as modules was dropped and the code was greatly simplified, removing the need for the linked list and reversing the order for users of for_each_subsys

torvalds/linux@3ed80a62b

Upstream for_each_subsys
https://github.com/torvalds/linux/blob/master/kernel/cgroup.c#L557

v3.10 for_each_subsys
https://github.com/torvalds/linux/blob/v3.10/kernel/cgroup.c#L268

proc_cgroup_show uses for_each_subsys to create the output for /proc/pid/cgroup

So... mystery solved!
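
The inversion is easy to reproduce in a toy model. The sketch below is not kernel code: `SUBSYS_ENUM` is an illustrative subset of the enum (only the relative order of cpu before cpuacct matters here), and the function names are hypothetical. It shows that walking a head-inserted list yields `cpuacct,cpu` where walking the array yields `cpu,cpuacct`, and that comparing controller names as sets avoids depending on either order.

```python
# Illustrative subset of the subsystem enum; the real header has more entries.
SUBSYS_ENUM = ["cpuset", "cpu", "cpuacct", "memory"]


def iterate_array(enum_order):
    """Model of walking the subsystem array directly: enum order is preserved."""
    return list(enum_order)


def iterate_head_inserted(enum_order):
    """Model of a list built by inserting each subsystem at the head: order is reversed."""
    linked = []
    for name in enum_order:
        linked.insert(0, name)  # insert at head
    return linked


def co_mounted_label(iteration_order, mounted):
    """Join the co-mounted controllers in the order the iteration visits them."""
    return ",".join(name for name in iteration_order if name in mounted)


def same_controllers(label_a, label_b):
    """Order-independent comparison of two comma-separated controller lists."""
    return set(label_a.split(",")) == set(label_b.split(","))


if __name__ == "__main__":
    mounted = {"cpu", "cpuacct"}
    newer = co_mounted_label(iterate_array(SUBSYS_ENUM), mounted)
    older = co_mounted_label(iterate_head_inserted(SUBSYS_ENUM), mounted)
    print(newer)  # cpu,cpuacct
    print(older)  # cpuacct,cpu
    print(same_controllers(newer, older))  # True
```

Code that matches the controller list as a literal string will break on one kernel or the other; a set comparison like `same_controllers` sidesteps that.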

@smarterclayton
Contributor

Gets out knife. Stabs self. Chooses another profession.


@derekwaynecarr
Member

Linked lists should only support append!


@sjenning
Contributor

This is fixed by #11642

@mwringe
Contributor Author

mwringe commented Nov 1, 2016

Bugzilla opened for this: https://bugzilla.redhat.com/show_bug.cgi?id=1390502

@ncdc
Contributor

ncdc commented Nov 2, 2016

#11642 was superseded by #11709 which has merged, so this can be closed now

@ncdc ncdc closed this as completed Nov 2, 2016