PHY init failed on Linux 6.8 #73

Open
cahz opened this issue Aug 16, 2024 · 10 comments

@cahz
Contributor

cahz commented Aug 16, 2024

With the latest develop version (which is required for Linux 6.8), I cannot get our TN9710P (with MV88X3310) to initialize.

Loading the module leads to the following output:

[  878.238757] tn40xx: Tehuti Network Driver from https://github.com/acooks/tn40xx-driver, linux-6.7.y-1
[  878.238761] tn40xx: Supported phys : MV88X3120 MV88X3310  QT2025 TLK10232 AQR105 MUSTANG 
[  878.238885] tn40xx 0000:02:00.0: srom 0x0 HWver 16 build 0 lane# 4 max_pl 0x2 mrrs 0x2
[  878.707481] tn40xx 0000:02:00.0: PHY init failed

I noticed that the check in tn.c:444 fails. Replacing the condition with !phy_id, it continues a bit further, but later fails:

[  878.347776] tn40xx 0000:02:00.0: PHY detected ID=2B09AA - MV88X3310 (A0) 10Gbps 10GBase-T
[  878.707473] MV88X3310 Initialization Error. Expected 0x000A, read 0xFFFF
@TerminalAddict

Same here: upgrading from Proxmox 7.4 to 8.2 causes the failure.

syslog:2024-08-19T16:03:40.094601+12:00 homeworld kernel: [  686.982348] tn40xx: Tehuti Network Driver from https://github.com/acooks/tn40xx-driver, linux-6.7.y-1
syslog:2024-08-19T16:03:40.094609+12:00 homeworld kernel: [  686.982350] tn40xx: Supported phys :    QT2025 TLK10232 AQR105 MUSTANG
syslog:2024-08-19T16:03:40.094609+12:00 homeworld kernel: [  686.982461] tn40xx 0000:01:00.0: srom 0x0 HWver 16 build 0 lane# 4 max_pl 0x1 mrrs 0x2
syslog:2024-08-19T16:03:40.095662+12:00 homeworld kernel: [  686.982598] tn40xx 0000:01:00.0: PHY init failed

@qume

qume commented Sep 9, 2024

Same here with Proxmox and a TN9310 card.

[    9.106643] tn40xx 0000:01:00.0: enabling device (0000 -> 0002)
[    9.106777] tn40xx 0000:01:00.0: srom 0x0 HWver 16 build 0 lane# 4 max_pl 0x1 mrrs 0x2
[    9.106918] tn40xx 0000:01:00.0: PHY init failed

@demonfoo

I tried booting Linux Mint 22 on a machine of mine with a StarTech ST10GSPEXNB NIC and had the same problem. After some troubleshooting, I found that changing line 444 of tn40.c to:

        if (phy_id == 0)

gets further, but something goes wrong in the bdx_mdio_set_speed() function when it is called during bdx_phy_init():

[12712.353749] tn40xx 0000:06:00.0: srom 0x0 HWver 16 build 0 lane# 4 max_pl 0x0 mrrs 0x2
[12712.353968] bdx_mdio_set_speed(): test 1; mdio_cfg is 00003ec0
[12712.353970] bdx_mdio_set_speed(): test 2; mdio_cfg is 00003ec8
[12712.455736] bdx_phy_init(): test 1, phy_type = 0x00000004
[12712.455743] tn40xx 0000:06:00.0: PHY detected ID=2B09AB - MV88X3310 (A1) 10Gbps 10GBase-T
[12712.455752] bdx_mdio_set_speed(): test 1; mdio_cfg is 00003ec8
[12712.455753] bdx_mdio_set_speed(): test 2; mdio_cfg is 00000a48
[12712.815491] MV88X3310 Initialization Error. Expected 0x000A, read 0xFFFF
[12712.815499] bdx_phy_init(): test 2
[12712.815501] tn40xx 0000:06:00.0: PHY init failed

Unfortunately I'm not sure what values it's expecting, but the NIC doesn't like what it's getting.

@acooks
Owner

acooks commented Sep 30, 2024

[12712.455743] tn40xx 0000:06:00.0: PHY detected ID=2B09AB - MV88X3310 (A1) 10Gbps 10GBase-T

Is there something unclear in the Readme about Marvell PHYs not being supportable? Or perhaps it isn't obvious to people when they have a Marvell PHY?

@demonfoo

@acooks What? Prior to the change I made, it didn't provide any useful error. I have the firmware image, and based on nm output the firmware image for it is present in the .ko file. Are you saying those just don't work anymore at all? Or what?

@acooks
Owner

acooks commented Sep 30, 2024

I'm saying that the output you posted shows that you have an MV88X3310 phy, and those Marvell PHYs cannot be supported in this driver due to licensing issues, as I have already explained several times. Clearly the problem is in my explanation.

@robanderson

I'm having the same issue here on Proxmox 8.2 running Linux 6.8.12-2-pve. Unfortunately, I don't have a paid support subscription to ask Proxmox to sort this out, as I'm just a home user trying to learn AI.

03:00.0 Ethernet controller [0200]: Tehuti Networks Ltd. TN9310 10GbE SFP+ Ethernet Adapter [1fc9:4022]
	Subsystem: Edimax Computer Co. 10 Gigabit Ethernet SFP+ PCI Express Adapter [1432:8103]
	Flags: fast devsel, IRQ 16, NUMA node 0, IOMMU group 52
	Memory at 3800ffe00000 (64-bit, prefetchable) [size=64K]
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Power Management version 3
	Capabilities: [80] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Kernel modules: tn40xx

I have checked and the vendor and device ID do not (as yet) appear on the list of problem cards.

root@T7920-1:~# dmesg | grep tn40
[   29.907554] tn40xx: module verification failed: signature and/or required key missing - tainting kernel
[   29.909497] tn40xx: Tehuti Network Driver from https://github.com/acooks/tn40xx-driver, linux-6.7.y-1
[   29.909500] tn40xx: Supported phys :    QT2025 TLK10232 AQR105 MUSTANG 
[   29.917261] tn40xx 0000:03:00.0: srom 0x0 HWver 16 build 0 lane# 4 max_pl 0x0 mrrs 0x2
[   29.917458] tn40xx 0000:03:00.0: PHY init failed

Uname

root@T7920-1:~# uname -r
6.8.12-2-pve

ChatGPT seems to think there have been some changes to the NAPI functionality, but AI can hallucinate. So this might or might not be helpful.

Changes in the NAPI Interface in Kernel 6.8
napi_complete Replaced with napi_complete_done:
In kernel 6.8, the napi_complete function has been replaced by napi_complete_done.
The new function napi_complete_done requires an additional parameter: the number of packets processed (work_done).
This change is intended to improve the NAPI polling mechanism by providing more accurate information about the amount of work completed.
Possible Changes in the bdx_poll Function Signature:
The napi_poll function signature may have changed, although in most kernels, it remains:
int (*poll)(struct napi_struct *napi, int budget);
If the signature has changed in your kernel version, you will need to adjust it accordingly.

@DatPat

DatPat commented Oct 17, 2024

I'm saying that the output you posted shows that you have an MV88X3310 phy, and those Marvell PHYs cannot be supported in this driver due to licensing issues, as I have already explained several times. Clearly the problem is in my explanation.

My understanding was that these NICs could be supported if the appropriate firmware was provided prior to the compilation process. Am I wrong about this?

@DatPat

DatPat commented Oct 17, 2024

I run an STLab N-480 and have the following issue: bdx_mdio_scan_phy_id finds the PHY ID 2b09ab on port 0, which looks like a valid value to me; however, port 0 does not appear to be accepted as a valid port.

phy_id = bdx_mdio_scan_phy_id(priv); /* set phy_mdio_port */

if (!priv->phy_mdio_port){
	dev_err(&priv->pdev->dev, "No PHY detected on MDIO bus.");
	return PHY_TYPE_NA;	/* No PHY detected on MDIO bus. */
}

There is an explicit check on the port being 0 so I am confused as to how to proceed as I know nothing about the hardware.

	i = bdx_mdio_look_for_phy(priv,*port_t);
	if (i >= 0)  // PHY  found

The original code uses a signed index and treats port (i) == 0 as valid. I could really use some help here.
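
The difference between the two checks quoted above can be shown in isolation. This is only a minimal userspace sketch, not the driver's code: `scan_for_phy`, `detected_zero_as_missing`, `detected_signed`, and `PHY_NOT_FOUND` are hypothetical names, and the bus is simulated with an array in which 0 stands for "no answer at this address". It assumes MDIO addresses run from 0 to 31, so 0 is a legitimate address.

```c
#include <assert.h>
#include <stddef.h>

#define MDIO_PORTS    32    /* MDIO addresses run from 0 to 31 */
#define PHY_NOT_FOUND (-1)  /* hypothetical sentinel, outside the valid range */

/* Hypothetical scan over a simulated bus: ids[port] == 0 means nothing
 * answered at that address. Returns the first answering port, or
 * PHY_NOT_FOUND if no address answered. */
int scan_for_phy(const unsigned int ids[MDIO_PORTS])
{
    assert(ids != NULL);
    for (int port = 0; port < MDIO_PORTS; port++) {
        if (ids[port] != 0)
            return port;
    }
    return PHY_NOT_FOUND;
}

/* Check in the style of `if (!priv->phy_mdio_port)`:
 * a stored port of 0 is read as "no PHY". Returns 1 if a PHY is assumed. */
int detected_zero_as_missing(unsigned int stored_port)
{
    return stored_port != 0;
}

/* Check in the style of `if (i >= 0)`:
 * only the negative sentinel is read as "no PHY". Returns 1 if found. */
int detected_signed(int scan_result)
{
    return scan_result >= 0;
}
```

With a simulated PHY answering at address 0, `scan_for_phy` returns 0; `detected_zero_as_missing(0)` then reports no PHY, while `detected_signed(0)` reports a PHY. Whether address 0 is actually valid for a given card is a hardware question this sketch cannot answer.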

[ 3322.357350] tn40xx: Driver unloaded
[ 3329.290170] tn40xx: Tehuti Network Driver from https://github.com/acooks/tn40xx-driver, linux-6.7.y-1
[ 3329.290173] tn40xx: Supported phys : MV88X3310 QT2025 TLK10232 AQR105 MUSTANG
[ 3329.290302] tn40xx 0000:67:00.0: srom 0x0 HWver 16 build 0 lane# 2 max_pl 0x0 mrrs 0x2
[ 3329.398797] tn40xx 0000:67:00.0: phy_id 2b09ab
[ 3329.398802] tn40xx 0000:67:00.0: priv->phy_mdio_port 0
[ 3329.398804] tn40xx 0000:67:00.0: PHY detected ID=2B09AB - MV88X3310 (A1) 10Gbps 10GBase-T
[ 3329.758550] MV88X3310 Initialization port detected 0
[ 3332.974543] MV88X3310 initdata applied
[ 3332.974639] MV88X3310 I/D version is 0.3.4.0
[ 3333.159878] tn40xx 0000:67:00.0 eth0: fw 0xe
[ 3333.159890] tn40xx 0000:67:00.0 eth0: Port A
[ 3333.159920] tn40xx 0000:67:00.0: 1 1fc9:4027:1fc9:3015
[ 3333.159955] tn40xx: detected 1 cards, 1 loaded
[ 3333.161427] tn40xx 0000:67:00.0 enp103s0: renamed from eth0
pat@pat:~/Code/driver$ uname -r
6.8.0-45-generic

Now it works: connectivity, data, everything.

I don't understand how port 0 can be valid on my card when this is clearly a cause for error in the driver as is.

Here is some information about my card:

67:00.0 Ethernet controller [0200]: Tehuti Networks Ltd. TN9710P 10GBase-T/NBASE-T Ethernet Adapter [1fc9:4027]
Subsystem: Tehuti Networks Ltd. Ethernet Adapter [1fc9:3015]
Flags: bus master, fast devsel, latency 0, IRQ 204, IOMMU group 17
Memory at fc65300000 (64-bit, prefetchable) [size=64K]
Capabilities:
Kernel driver in use: tn40xx
Kernel modules: tn40xx

@DatPat

DatPat commented Oct 18, 2024

With the latest develop version (which is required for Linux 6.8), I cannot get our TN9710P (with MV88X3310) to initialize.

Loading the module leads to the following output:

[  878.238757] tn40xx: Tehuti Network Driver from https://github.com/acooks/tn40xx-driver, linux-6.7.y-1
[  878.238761] tn40xx: Supported phys : MV88X3120 MV88X3310  QT2025 TLK10232 AQR105 MUSTANG 
[  878.238885] tn40xx 0000:02:00.0: srom 0x0 HWver 16 build 0 lane# 4 max_pl 0x2 mrrs 0x2
[  878.707481] tn40xx 0000:02:00.0: PHY init failed

I noticed that the check in tn.c:444 fails. Replacing the condition with !phy_id, it continues a bit further, but later fails:

[  878.347776] tn40xx 0000:02:00.0: PHY detected ID=2B09AA - MV88X3310 (A0) 10Gbps 10GBase-T
[  878.707473] MV88X3310 Initialization Error. Expected 0x000A, read 0xFFFF

This is the exact issue I had. To fix it, you need to set port to 0 at the top of MV88X3310_mdio_reset. When I did that, it started working for me.
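
One possible reading of the "Expected 0x000A, read 0xFFFF" message, offered as an assumption rather than a confirmed diagnosis: an MDIO read from an address where no PHY answers typically returns all ones (0xFFFF), so a read aimed at the wrong port would fail in exactly this way. The sketch below simulates that in userspace. `sim_bus`, `sim_mdio_read`, and `sim_phy_init` are hypothetical names and do not correspond to functions in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MDIO_NO_RESPONSE 0xFFFFu /* typical read-back when no PHY answers */
#define EXPECTED_VALUE   0x000Au /* value the log message says was expected */

/* Hypothetical simulated bus with a single PHY at one address. */
struct sim_bus {
    int phy_port; /* the only address that answers */
};

/* Simulated register read: the PHY's address returns the expected value,
 * every other address returns all ones. */
uint16_t sim_mdio_read(const struct sim_bus *bus, int port)
{
    assert(bus != NULL);
    return (port == bus->phy_port) ? EXPECTED_VALUE : MDIO_NO_RESPONSE;
}

/* Simulated init check: returns 0 if the read-back matches, -1 otherwise. */
int sim_phy_init(const struct sim_bus *bus, int port)
{
    return (sim_mdio_read(bus, port) == EXPECTED_VALUE) ? 0 : -1;
}
```

In this model, a bus whose PHY sits at address 0 passes `sim_phy_init(&bus, 0)` and fails for any other port, which would be consistent with the fix described above. It does not prove that the real card behaves this way.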
