Netclass Collector Performs Slowly on Nodes with Heavy Workload #2477

Closed
raptorsun opened this issue Sep 22, 2022 · 9 comments

raptorsun (Contributor) commented Sep 22, 2022

The NetClass collector in Node Exporter performs slowly in some Kubernetes clusters serving heavy workloads (high CPU usage, frequent network configuration changes). In the worst cases, the NetClass collector blocks the process for several seconds, causing a timeout when Prometheus scrapes Node Exporter and losing metrics from the other collectors as well.

When Node Exporter slows down in these clusters, it spends most of its CPU time in the NetClass collector. A typical profile is shown below.
(CPU profile screenshot: profile1)

An strace of Node Exporter on overloaded worker nodes shows that the worst-performing syscall is read() on files under /sys/class/net/*/. Several read() calls looped on ERESTARTNOINTR for more than 5 seconds.

13:51:10.581 openat(AT_FDCWD, "/host/sys/class/net/lo/phys_port_name", O_RDONLY|O_CLOEXEC) = 58
13:51:10.598 futex(0xc0013ec948, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
13:51:10.742 fcntl(58, F_GETFL)         = 0x8000 (flags O_RDONLY|O_LARGEFILE)
13:51:10.742 fcntl(58, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
13:51:10.743 fcntl(58, F_GETFL)         = 0x8800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE)
13:51:10.743 fcntl(58, F_SETFL, O_RDONLY|O_LARGEFILE) = 0
13:51:10.744 read(58, 0xc001485710, 128) = ? ERESTARTNOINTR (To be restarted)
13:51:10.744 read(58, 0xc001485710, 128) = ? ERESTARTNOINTR (To be restarted)
….
13:51:11.644 read(58, 0xc001485710, 128) = ? ERESTARTNOINTR (To be restarted)
13:51:11.652 read(58, "0\n", 128)       = 2

From strace we found that read() returns ERESTARTNOINTR on the following files in /sys/class/net/*/:

  • threaded (most frequent)
  • netdev_group
  • type
  • addr_len
  • link_mode
  • testing
  • proto_down
  • speed
  • duplex
  • phys_port_name
  • gro_flush_timeout
  • broadcast

Host operating system: output of uname -a

4.18.0-372.26.1.el8_6.x86_64

node_exporter version: output of node_exporter --version

Node Exporter 1.1.2 + Golang 1.14
Node Exporter 1.3.1 + Golang 1.18

node_exporter command line flags

  • --web.listen-address=127.0.0.1:9100
  • --path.sysfs=/host/sys
  • --path.rootfs=/host/root
  • --no-collector.wifi
  • --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|run/k3s/containerd/.+|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
  • --collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15})$
  • --collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15})$
  • --collector.cpu.info
  • --collector.textfile.directory=/var/node_exporter/textfile
  • --no-collector.cpufreq

Are you running node_exporter in Docker?

No, it is running in a Kubernetes cluster.

What did you do that produced an error?

/

What did you expect to see?

/

What did you see instead?

/

raptorsun (Contributor, Author) commented Sep 22, 2022

I see two potential ways to improve the performance of the NetClass collector.

The first method is to cache the metrics that rarely change, such as addr_assign_type, addr_len, and dev_id. Non-carrier-related metrics are mostly stable, and changes to them usually come from planned configuration adjustments.
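
A minimal sketch of what such a cache could look like, assuming a simple TTL-based approach; the type names and the helper below are illustrative and not part of node_exporter:

```go
// Illustrative only: re-read rarely changing sysfs attributes at most once
// per TTL instead of on every scrape.
package main

import (
	"os"
	"path/filepath"
	"strings"
	"sync"
	"time"
)

type cachedAttrs struct {
	values  map[string]string // attribute name -> raw sysfs value
	fetched time.Time
}

type attrCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	byIface map[string]cachedAttrs
}

// get returns the cached attributes for an interface, re-reading sysfs only
// when the cached entry is older than the TTL.
func (c *attrCache) get(sysfs, iface string, attrs []string) (map[string]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.byIface[iface]; ok && time.Since(e.fetched) < c.ttl {
		return e.values, nil
	}
	vals := make(map[string]string, len(attrs))
	for _, a := range attrs {
		b, err := os.ReadFile(filepath.Join(sysfs, "class/net", iface, a))
		if err != nil {
			return nil, err
		}
		vals[a] = strings.TrimSpace(string(b))
	}
	c.byIface[iface] = cachedAttrs{values: vals, fetched: time.Now()}
	return vals, nil
}

func main() {
	c := &attrCache{ttl: 5 * time.Minute, byIface: map[string]cachedAttrs{}}
	// Rarely changing attributes mentioned above.
	_, _ = c.get("/sys", "lo", []string{"addr_assign_type", "addr_len", "dev_id"})
}
```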

The second method is to use netlink instead of sysfs, which issues far fewer syscalls and avoids waiting on the RTNL lock at unfortunate moments. Although an RTM_GETLINK request over netlink returns most of the metrics that sysfs provides, some hardware-related information is missing (a sketch of the netlink approach follows the lists below):

| sysfs file | netlink RTM_GETLINK reply field |
| --- | --- |
| addr_assign_type | not available |
| addr_len | IFLA_ADDRESS |
| address | IFLA_ADDRESS |
| broadcast | IFLA_BROADCAST |
| carrier | IFLA_CARRIER |
| carrier_changes | IFLA_CARRIER_CHANGES |
| carrier_up_count | IFLA_CARRIER_UP_COUNT |
| carrier_down_count | IFLA_CARRIER_DOWN_COUNT |
| dev_id | not available |
| dormant | IFLA_LINKMODE |
| duplex | not available |
| flags | header bytes[8:12] |
| ifalias | IFLA_IFALIAS |
| ifindex | header bytes[4:8] |
| iflink | IFLA_LINK |
| link_mode | IFLA_LINKMODE |
| mtu | IFLA_MTU |
| name_assign_type | not available |
| netdev_group | IFLA_GROUP |
| operstate | IFLA_OPERSTATE |
| phys_port_id | IFLA_PHYS_PORT_ID |
| phys_port_name | IFLA_PHYS_PORT_NAME |
| phys_switch_id | IFLA_PHYS_SWITCH_ID |
| speed | not available |
| tx_queue_len | IFLA_TXQLEN |
| type | IFLA_LINK |

The missing metrics are:

  • addr_assign_type
  • duplex
  • name_assign_type
  • speed
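
Not a final design, but a minimal sketch of the netlink approach; it assumes the github.com/jsimonetti/rtnetlink package, and the printed fields are just an illustrative subset:

```go
// Illustrative only: fetch link attributes with a single RTM_GETLINK dump
// instead of reading many small files under /sys/class/net/<iface>/.
package main

import (
	"fmt"
	"log"

	"github.com/jsimonetti/rtnetlink"
)

func main() {
	conn, err := rtnetlink.Dial(nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// One netlink dump returns an RTM_NEWLINK message per interface,
	// replacing the per-file sysfs reads shown in the strace above.
	links, err := conn.Link.List()
	if err != nil {
		log.Fatal(err)
	}
	for _, msg := range links {
		attrs := msg.Attributes
		if attrs == nil {
			continue
		}
		fmt.Printf("ifindex=%d name=%s mtu=%d address=%s operstate=%v\n",
			msg.Index, attrs.Name, attrs.MTU, attrs.Address, attrs.OperationalState)
	}
}
```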

SuperQ (Member) commented Sep 22, 2022

netlink seems like a good solution. We recently refactored the netdev collector to use netlink with good results.

Related question: what is your system configuration like?

  • How many CPUs on the node?
  • What CPU request/limits are on the node_exporter?
  • Are you configuring GOMAXPROCS?

For example, we have some 24xlarge nodes where we configure node_exporter with GOMAXPROCS=2 and a CPU request of 100m. This greatly improved the reliability of our node_exporter in our deployments.
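
For illustration, this is roughly what that could look like in a DaemonSet container spec; only GOMAXPROCS=2 and the 100m CPU request come from the comment above, the image tag and other details are assumptions:

```yaml
containers:
  - name: node-exporter
    image: quay.io/prometheus/node-exporter:v1.3.1  # assumed tag
    env:
      - name: GOMAXPROCS   # caps the number of OS threads running Go code
        value: "2"
    resources:
      requests:
        cpu: 100m          # small but guaranteed CPU share
```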

raptorsun (Contributor, Author) commented:

Hello @SuperQ, we have tested on two setups:

  • 4 CPU x 6 nodes
  • 96 CPU x 27 nodes

The resource request on the node_exporter DaemonSet is 8m CPU and 32MB memory; no limit is set.

I have tested GOMAXPROCS=2 and raising the CPU request to 100m for Node Exporter on the smaller Kubernetes cluster (4 CPU x 6 nodes), which uses OVN as its CNI. CPU usage and scrape time with these settings are higher than without them, as shown in the diagrams below:

  • GOMAXPROCS=2, Node Exporter CPU usage (screenshot: test_cpu)
  • GOMAXPROCS=2, scrape time (screenshot: test_scrapetime)
  • GOMAXPROCS unset, Node Exporter CPU usage (screenshot: ref_cpu)
  • GOMAXPROCS unset, scrape time (screenshot: ref_scrapetime)

raptorsun (Contributor, Author) commented:

Shall we add a new collector that uses netlink to gather the metrics currently collected by the netclass collector? Since RTM_GETLINK does not return all the metrics sysfs can provide, it may be safer to leave the original netclass collector untouched and add a faster collector with slightly fewer metrics.

discordianfish (Member) commented:

Hrmm, good question... But yeah, it feels like a new collector might make the most sense.

raptorsun (Contributor, Author) commented:

A pull request has been created to add a new collector that gathers most of the metrics the netclass collector does: #2492

It is also possible to merge the netclass collector into the netdev collector, because the response to RTM_GETLINK already contains these metrics (see the netdev collector code for details).

raptorsun (Contributor, Author) commented:

Some test results comparing the performance of the netclass collector using sysfs and netlink are posted on PR #2492.

Scrape time with the netlink implementation is much lower than with the sysfs implementation in most cases.

raptorsun (Contributor, Author) commented Nov 21, 2022

PR #2492 has been merged.
PR #2528 is in progress, incorporating the netlink implementation into the existing netclass collector.

raptorsun (Contributor, Author) commented:

All done, issue closed.
Thank you for the help @SuperQ @discordianfish :D
