How to get FW syndrome when using DEVX
Yossi Itigin edited this page Jan 5, 2022 · 3 revisions
When UCX reports an error of the form `UCX ERROR mlx5dv_xxx(...) failed`,
in many cases the error was reported by the firmware (FW).
Every such error carries a syndrome code that precisely identifies its cause.
Currently, the only way to retrieve the syndrome code is to enable dynamic debug in the mlx5 driver, as follows:
- `echo 'func mlx5_cmd_check +p' | sudo tee /sys/kernel/debug/dynamic_debug/control` - enable dynamic debug
- `sudo dmesg -c > /dev/null` - clear dmesg
- Run the reproducer that yields the error
- `dmesg > dmesg.log` - capture dmesg; it should contain the syndrome code of the failed DEVX command
- `echo 'func mlx5_cmd_check -p' | sudo tee /sys/kernel/debug/dynamic_debug/control` - disable dynamic debug
- Upload dmesg.log to the GitHub issue for further analysis
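
The steps above can be wrapped in a small shell helper, shown here as a minimal sketch. The function names (`set_mlx5_cmd_debug`, `capture_fw_syndrome`, `show_syndrome_lines`) and the `DYNDBG_CONTROL` and `DMESG_LOG` variables are illustrative and not part of UCX or the mlx5 driver. The final `grep` assumes that the relevant kernel messages contain the word "syndrome"; if they do not on your kernel, inspect dmesg.log manually.

```shell
#!/bin/sh
# Sketch only: all names below are illustrative, not part of UCX or the driver.
DYNDBG_CONTROL="${DYNDBG_CONTROL:-/sys/kernel/debug/dynamic_debug/control}"
DMESG_LOG="${DMESG_LOG:-dmesg.log}"

# Turn mlx5_cmd_check debug messages on ("+p") or off ("-p").
set_mlx5_cmd_debug() {
    echo "func mlx5_cmd_check $1" | sudo tee "$DYNDBG_CONTROL" > /dev/null
}

# Run the reproducer (passed as arguments) with dynamic debug enabled,
# then save the kernel log and disable dynamic debug again.
capture_fw_syndrome() {
    set_mlx5_cmd_debug +p || return 1
    sudo dmesg -c > /dev/null      # clear the kernel ring buffer
    "$@"                           # the reproducer is expected to fail
    dmesg > "$DMESG_LOG"           # capture messages logged during the run
    set_mlx5_cmd_debug -p
}

# Print captured lines that mention a syndrome
# (assumption: the driver message contains the word "syndrome").
show_syndrome_lines() {
    grep -i 'syndrome' "$DMESG_LOG"
}
```

After sourcing the script, running `capture_fw_syndrome ./my_reproducer` (where `./my_reproducer` is a placeholder for the failing command) followed by `show_syndrome_lines` leaves dmesg.log ready to attach to the issue. Dynamic debug is switched off again even when the reproducer fails, so the extra kernel logging does not stay enabled.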