Skip to content

Commit

Permalink
net/mlx5: Expose NIC temperature via hardware monitoring kernel API
Browse files Browse the repository at this point in the history
Expose NIC temperature by implementing hwmon kernel API, which turns
current thermal zone kernel API to redundant.

For each one of the supported and exposed thermal diode sensors, expose
the following attributes:
1) Input temperature.
2) Highest temperature.
3) Temperature label:
   Depends on the firmware capability, if firmware doesn't support
   sensors naming, the fallback naming convention would be: "sensorX",
   where X is the HW spec (MTMP register) sensor index.
4) Temperature critical max value:
   refers to the high threshold of Warning Event. Will be exposed as
   `tempY_crit` hwmon attribute (RO attribute). For example for
   ConnectX5 HCA's this temperature value will be 105 Celsius, 10
   degrees lower than the HW shutdown temperature).
5) Temperature reset history: resets highest temperature.

For example, for dualport ConnectX5 NIC with a single IC thermal diode
sensor will have 2 hwmon directories (one for each PCI function)
under "/sys/class/hwmon/hwmon[X,Y]".

Listing one of the directories above (hwmonX/Y) generates the
corresponding output below:

$ grep -H -d skip . /sys/class/hwmon/hwmon0/*

Output
=======================================================================
/sys/class/hwmon/hwmon0/name:mlx5
/sys/class/hwmon/hwmon0/temp1_crit:105000
/sys/class/hwmon/hwmon0/temp1_highest:48000
/sys/class/hwmon/hwmon0/temp1_input:46000
/sys/class/hwmon/hwmon0/temp1_label:asic
grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied

In addition, displaying the sensors data via lm_sensors generates the
corresponding output below:

$ sensors

Output
=======================================================================
mlx5-pci-0800
Adapter: PCI adapter
asic:         +46.0°C  (crit = +105.0°C, highest = +48.0°C)

mlx5-pci-0801
Adapter: PCI adapter
asic:         +46.0°C  (crit = +105.0°C, highest = +48.0°C)

CC: Jean Delvare <jdelvare@suse.com>
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  • Loading branch information
Adham Faris authored and kuba-moo committed Aug 9, 2023
1 parent 383a4de commit 1f507e8
Show file tree
Hide file tree
Showing 9 changed files with 463 additions and 141 deletions.
1 change: 1 addition & 0 deletions drivers/net/ethernet/mellanox/mlx5/core/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ config MLX5_CORE
depends on MLXFW || !MLXFW
depends on PTP_1588_CLOCK_OPTIONAL
depends on PCI_HYPERV_INTERFACE || !PCI_HYPERV_INTERFACE
depends on HWMON || !HWMON
help
Core driver for low level functionality of the ConnectX-4 and
Connect-IB cards by Mellanox Technologies.
Expand Down
2 changes: 1 addition & 1 deletion drivers/net/ethernet/mellanox/mlx5/core/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ endif
mlx5_core-$(CONFIG_MLX5_BRIDGE) += esw/bridge.o esw/bridge_mcast.o esw/bridge_debugfs.o \
en/rep/bridge.o

mlx5_core-$(CONFIG_THERMAL) += thermal.o
mlx5_core-$(CONFIG_HWMON) += hwmon.o
mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
mlx5_core-$(CONFIG_VXLAN) += lib/vxlan.o
mlx5_core-$(CONFIG_PTP_1588_CLOCK) += lib/clock.o
Expand Down
Loading

0 comments on commit 1f507e8

Please sign in to comment.