Cannot read number of sockets correctly, number of sockets set to 0 (ARM64) #2743

Closed
petermetz opened this issue Nov 28, 2020 · 21 comments · Fixed by #2744

@petermetz

I know ARM is not officially supported (notice I'm not using the official image either), so no expectations here at all for an actual fix.

Running Ubuntu 20.04 LTS on a Raspberry Pi 4B 8 GB in case that helps.

$ uname -a
Linux 5.4.0-1022-raspi #25-Ubuntu SMP PREEMPT Thu Oct 15 13:31:49 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
$ cat docker-compose.yml

...
  cadvisor:
    image: zcube/cadvisor:v0.37.0
    ports:
    - target: 8080
      published: 8080
      mode: host
    volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:rw
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
...
$ docker logs cadvisor_1
W1128 17:43:05.535985       1 nvidia.go:61] NVIDIA GPU metrics will not be available: no NVIDIA devices found
W1128 17:43:05.555422       1 sysinfo.go:203] Nodes topology is not available, providing CPU topology
W1128 17:43:05.556342       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu0/online: open /sys/devices/system/cpu/cpu0/online: no such file or directory
W1128 17:43:05.556449       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu1/online: open /sys/devices/system/cpu/cpu1/online: no such file or directory
W1128 17:43:05.556522       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu2/online: open /sys/devices/system/cpu/cpu2/online: no such file or directory
W1128 17:43:05.556619       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu3/online: open /sys/devices/system/cpu/cpu3/online: no such file or directory
E1128 17:43:05.556835       1 info.go:114] Failed to get system UUID: open /etc/machine-id: no such file or directory
W1128 17:43:05.558397       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu0 online state, skipping
W1128 17:43:05.558482       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu1 online state, skipping
W1128 17:43:05.558534       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu2 online state, skipping
W1128 17:43:05.558593       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu3 online state, skipping
E1128 17:43:05.558614       1 machine.go:72] Cannot read number of physical cores correctly, number of cores set to 0
W1128 17:43:05.558894       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu0 online state, skipping
W1128 17:43:05.558948       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu1 online state, skipping
W1128 17:43:05.558992       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu2 online state, skipping
W1128 17:43:05.559045       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu3 online state, skipping
E1128 17:43:05.559064       1 machine.go:86] Cannot read number of sockets correctly, number of sockets set to 0
W1128 17:43:05.779355       1 manager.go:288] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
W1128 17:48:06.166638       1 sysinfo.go:203] Nodes topology is not available, providing CPU topology
W1128 17:48:06.167769       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu0/online: open /sys/devices/system/cpu/cpu0/online: no such file or directory
W1128 17:48:06.167986       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu1/online: open /sys/devices/system/cpu/cpu1/online: no such file or directory
W1128 17:48:06.168103       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu2/online: open /sys/devices/system/cpu/cpu2/online: no such file or directory
W1128 17:48:06.168197       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu3/online: open /sys/devices/system/cpu/cpu3/online: no such file or directory
E1128 17:48:06.168379       1 info.go:114] Failed to get system UUID: open /etc/machine-id: no such file or directory
W1128 17:48:06.169090       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu0 online state, skipping
W1128 17:48:06.169271       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu1 online state, skipping
W1128 17:48:06.169346       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu2 online state, skipping
W1128 17:48:06.169415       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu3 online state, skipping
E1128 17:48:06.169438       1 machine.go:72] Cannot read number of physical cores correctly, number of cores set to 0
W1128 17:48:06.169837       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu0 online state, skipping
W1128 17:48:06.169950       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu1 online state, skipping
W1128 17:48:06.170007       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu2 online state, skipping
W1128 17:48:06.170071       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu3 online state, skipping
E1128 17:48:06.170090       1 machine.go:86] Cannot read number of sockets correctly, number of sockets set to 0
W1128 17:53:06.165974       1 sysinfo.go:203] Nodes topology is not available, providing CPU topology
W1128 17:53:06.166788       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu0/online: open /sys/devices/system/cpu/cpu0/online: no such file or directory
W1128 17:53:06.166881       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu1/online: open /sys/devices/system/cpu/cpu1/online: no such file or directory
W1128 17:53:06.166959       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu2/online: open /sys/devices/system/cpu/cpu2/online: no such file or directory
W1128 17:53:06.167033       1 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu3/online: open /sys/devices/system/cpu/cpu3/online: no such file or directory
E1128 17:53:06.167165       1 info.go:114] Failed to get system UUID: open /etc/machine-id: no such file or directory
W1128 17:53:06.167613       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu0 online state, skipping
W1128 17:53:06.167677       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu1 online state, skipping
W1128 17:53:06.167724       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu2 online state, skipping
W1128 17:53:06.167778       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu3 online state, skipping
E1128 17:53:06.167799       1 machine.go:72] Cannot read number of physical cores correctly, number of cores set to 0
W1128 17:53:06.168038       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu0 online state, skipping
W1128 17:53:06.168090       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu1 online state, skipping
W1128 17:53:06.168133       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu2 online state, skipping
W1128 17:53:06.168185       1 machine.go:253] Cannot determine CPU /sys/bus/cpu/devices/cpu3 online state, skipping
E1128 17:53:06.168200       1 machine.go:86] Cannot read number of sockets correctly, number of sockets set to 0
@iwankgb
Collaborator

iwankgb commented Nov 29, 2020

This is really interesting. For some reason, on the RPi 4 there is no /sys/devices/system/cpu/cpu?/online file, but the /sys/devices/system/cpu/cpu?/hotplug/ directory is there.

@petermetz
Author

> This is really interesting. For some reason, on the RPi 4 there is no /sys/devices/system/cpu/cpu?/online file, but the /sys/devices/system/cpu/cpu?/hotplug/ directory is there.

Thanks @iwankgb for the potential fix. Happy to help out with testing it, just let me know!

@iwankgb
Collaborator

iwankgb commented Nov 30, 2020

There is a bit more that must be done: the online file is also read while looking for NUMA nodes and sockets.

@echo467

echo467 commented Dec 6, 2020

Have you fixed it? How?

@iwankgb
Collaborator

iwankgb commented Dec 6, 2020

@longyang10208 there is a link to the pending pull request above.

@limpep

limpep commented Dec 21, 2020

@iwankgb I built an image with your branch, but I still had the same issue. Here is the Dockerfile I used:

# Builder
FROM arm32v7/golang as builder

MAINTAINER Limpep

ENV CADVISOR_VERSION "assume_disabled_hotplug_in_no_online"

RUN apt-get update && apt-get install -y git dmsetup && apt-get clean

RUN git clone --branch ${CADVISOR_VERSION} https://github.com/iwankgb/cadvisor.git /go/src/github.com/google/cadvisor

WORKDIR /go/src/github.com/google/cadvisor

RUN make build

# Image for usage
FROM arm32v7/debian

MAINTAINER Limpep

COPY --from=builder /go/src/github.com/google/cadvisor/cadvisor /usr/bin/cadvisor

EXPOSE 8080
ENTRYPOINT ["/usr/bin/cadvisor", "-logtostderr"]

Linux raspberrypi 5.4.81-v8+ #1378 SMP PREEMPT Mon Dec 7 18:48:00 GMT 2020 aarch64 GNU/Linux

@iwankgb
Collaborator

iwankgb commented Dec 26, 2020

@limpep, could you provide cadvisor logs or output, please?

@limpep

limpep commented Dec 27, 2020

@iwankgb here are the output logs:

W1227 00:46:40.808255       1 sysinfo.go:203] Nodes topology is not available, providing CPU topology,
W1227 00:46:40.809269       1 sysfs_notarm64.go:40] unable to read /sys/devices/system/cpu/cpu0/online: open /sys/devices/system/cpu/cpu0/online: no such file or directory,
W1227 00:46:40.809550       1 sysfs_notarm64.go:40] unable to read /sys/devices/system/cpu/cpu1/online: open /sys/devices/system/cpu/cpu1/online: no such file or directory,
W1227 00:46:40.809781       1 sysfs_notarm64.go:40] unable to read /sys/devices/system/cpu/cpu2/online: open /sys/devices/system/cpu/cpu2/online: no such file or directory,
W1227 00:46:40.809991       1 sysfs_notarm64.go:40] unable to read /sys/devices/system/cpu/cpu3/online: open /sys/devices/system/cpu/cpu3/online: no such file or directory,
W1227 00:46:40.810802       1 machine_notarm64.go:41] Cannot determine CPU /sys/bus/cpu/devices/cpu0 online state, skipping,
W1227 00:46:40.811089       1 machine_notarm64.go:41] Cannot determine CPU /sys/bus/cpu/devices/cpu1 online state, skipping,
W1227 00:46:40.811327       1 machine_notarm64.go:41] Cannot determine CPU /sys/bus/cpu/devices/cpu2 online state, skipping,
W1227 00:46:40.811509       1 machine_notarm64.go:41] Cannot determine CPU /sys/bus/cpu/devices/cpu3 online state, skipping,
E1227 00:46:40.811631       1 machine.go:69] Cannot read number of physical cores correctly, number of cores set to 0,
W1227 00:46:40.812008       1 machine_notarm64.go:41] Cannot determine CPU /sys/bus/cpu/devices/cpu0 online state, skipping,
W1227 00:46:40.812260       1 machine_notarm64.go:41] Cannot determine CPU /sys/bus/cpu/devices/cpu1 online state, skipping,
W1227 00:46:40.812449       1 machine_notarm64.go:41] Cannot determine CPU /sys/bus/cpu/devices/cpu2 online state, skipping,
W1227 00:46:40.812630       1 machine_notarm64.go:41] Cannot determine CPU /sys/bus/cpu/devices/cpu3 online state, skipping,
E1227 00:46:40.812751       1 machine.go:83] Cannot read number of sockets correctly, number of sockets set to 0,
W1227 00:46:41.042129       1 manager.go:288] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory,

and here is the pull build log
https://pastebin.com/9uqfCqQE

@iwankgb
Collaborator

iwankgb commented Dec 27, 2020

Why did you decide to build for the arm32 architecture? The fix works on arm64 only. Instead of building for arm32, you can try to cross-build; I assume that you can use an arm64 build.
Unfortunately, I have no access to an arm32 system, so I am not able to test my fix on that architecture.
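
As a quick way to confirm which architecture a Go binary was actually built for, a check like the one below can be compiled in the same environment as the image. It is a sketch only: `isArm64Build` is a hypothetical helper, not something cAdvisor provides. The standard Go toolchain cross-compiles when `GOOS=linux GOARCH=arm64` are set in the environment of `go build`; whether cAdvisor's `make build` passes these through is not confirmed in this thread. The `sysfs_notarm64.go` and `machine_notarm64.go` file names in the logs above suggest that the tested binary was not an arm64 build.

```go
package main

import (
	"fmt"
	"runtime"
)

// isArm64Build is a hypothetical helper: it reports whether the given
// GOARCH value denotes a 64-bit ARM build.
func isArm64Build(goarch string) bool {
	return goarch == "arm64"
}

func main() {
	// runtime.GOOS and runtime.GOARCH describe the target the binary was
	// compiled for, not the kernel it happens to run on.
	fmt.Printf("built for %s/%s\n", runtime.GOOS, runtime.GOARCH)
	if !isArm64Build(runtime.GOARCH) {
		fmt.Println("not an arm64 build; an arm64-only fix would not be compiled in")
	}
}
```

An `arm32v7/golang` builder image produces 32-bit ARM binaries even when the host kernel reports aarch64, which would be consistent with the output shown above.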

@limpep

limpep commented Dec 27, 2020

@iwankgb To be honest, I am fairly new to the whole Docker world. I am not sure how to build for arm64.

@iwankgb
Collaborator

iwankgb commented Dec 29, 2020

#2770 is related - I came across it while fixing #2750 (which I also discovered on a Raspberry Pi).

@tablatronix

Same here, and I am also willing to test. I am using zcube/cadvisor, also on an RPi 4.

@fastlorenzo

I did a quick check in machine.go, and it might be related to this part of the code:

func GetPhysicalCores(procInfo []byte) int {
	numCores := getUniqueMatchesCount(string(procInfo), coreRegExp)
	if numCores == 0 {
		// read number of cores from /sys/bus/cpu/devices/cpu*/topology/core_id to deal with processors
		// for which 'core id' is not available in /proc/cpuinfo
		numCores = getUniqueCPUPropertyCount(cpuBusPath, sysFsCPUCoreID)
	}
	if numCores == 0 {
		klog.Errorf("Cannot read number of physical cores correctly, number of cores set to %d", numCores)
	}
	return numCores
}

numCores := getUniqueMatchesCount(string(procInfo), coreRegExp) is using the following regex to check the number of cores in /proc/cpuinfo: (?m)^core id\s*:\s*([0-9]+)$

However, my /proc/cpuinfo on Raspberry Pi 4B looks like this:

╰─$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal
╰─$ uname -a
Linux k8s-04 5.4.0-1026-raspi #29-Ubuntu SMP PREEMPT Mon Dec 14 17:01:16 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
╰─$ cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 1
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 2
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 3
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

Hardware        : BCM2835
Revision        : c03112
Serial          : 100000006a7a5b43
Model           : Raspberry Pi 4 Model B Rev 1.2

numCores will then show 0 and the following function will be called:

numCores = getUniqueCPUPropertyCount(cpuBusPath, sysFsCPUCoreID)

@iwankgb I'm guessing the main change in your PR is to "bypass" this cpu online check for Raspberry Pi?

cadvisor/machine/machine.go

Lines 250 to 255 in 19ba5a8

	onlinePath := filepath.Join(sysCPUPath, "online")
	onlineVal, err := ioutil.ReadFile(onlinePath)
	if err != nil {
		klog.Warningf("Cannot determine CPU %s online state, skipping", sysCPUPath)
		continue
	}

It would be great to understand as well why the Raspberry Pi doesn't have a /sys/devices/system/cpu/cpuX/online state in the first place.

My two cents; I hope this helps people reading this issue understand where the problem is.

@ryancurrah

@fastlorenzo same issue here, but I am not sure what the repercussions of this issue are.

@iwankgb
Collaborator

iwankgb commented Jan 23, 2021

@fastlorenzo:

> It would be great to understand as well why the Raspberry Pi doesn't have a /sys/devices/system/cpu/cpuX/online state in the first place.

CPU hotplug is disabled on RPi in general: raspberrypi/linux#843. You can take a look at #2744, where I am working on the fix.

@Letme

Letme commented Feb 11, 2021

Can I ask why there won't be a fix for 32-bit kernels, which is what official Raspbian images use?

@Letme

Letme commented Feb 11, 2021

> Can I ask why there won't be a fix for 32-bit kernels, which is what official Raspbian images use?

Sorry, found the answer - you don't have hardware to test the fix. I am willing to test it for you, or even provide you access to hardware if you wish.

@iwankgb
Collaborator

iwankgb commented Feb 12, 2021

@Letme please upload a snapshot of /sys/devices/system/cpu and the contents of /proc/cpuinfo.

@Letme

Letme commented Feb 12, 2021

RaspberryPi4:

$ ls /sys/devices/system/cpu/
cpu0  cpu1  cpu2  cpu3	cpufreq  isolated  kernel_max  modalias  offline  online  possible  power  present  uevent
$ cat /proc/cpuinfo 
processor	: 0
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 270.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 1
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 270.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 2
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 270.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 3
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 270.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

Hardware	: BCM2711
Revision	: c03111
Serial		: 100000003bc35188
Model		: Raspberry Pi 4 Model B Rev 1.1

RaspberryPi3B:

$ ls /sys/devices/system/cpu/
cpu0  cpu1  cpu2  cpu3	cpufreq  isolated  kernel_max  modalias  offline  online  possible  power  present  uevent
$ cat /proc/cpuinfo 
processor	: 0
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 76.80
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 1
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 76.80
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 2
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 76.80
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 3
model name	: ARMv7 Processor rev 4 (v7l)
BogoMIPS	: 76.80
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

Hardware	: BCM2835
Revision	: a52082
Serial		: 00000000761ed550
Model		: Raspberry Pi 3 Model B Rev 1.2

@iwankgb iwankgb self-assigned this Feb 12, 2021
bobbypage added a commit that referenced this issue Mar 12, 2021
bobbypage added a commit that referenced this issue Mar 12, 2021
@jc42jc

jc42jc commented Apr 29, 2021

Hello,
The issue is still present on Kubernetes / K3S v1.21.0+k3s1

"W0429 19:41:04.325885 4743 sysinfo.go:203] Nodes topology is not available, providing CPU topology"

@iwankgb
Collaborator

iwankgb commented May 7, 2021

@jc42jc this is expected behaviour. There is no NUMA on RPi.
