[pmon] pcied keeps flooding WARNING and ERROR logs #8401

Closed
bingwang-ms opened this issue Aug 10, 2021 · 3 comments

Comments

@bingwang-ms
Contributor

Description

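pcied keeps flooding the log with WARNING and ERR messages on the Mellanox platform, even though all PCIe devices appear to be working and all interfaces are up. The device reported as Not Found (the Intel I210) is listed at 09:00.0 in the show platform pcieinfo output below.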

Aug  9 14:39:24.804995 str-msn4600c-acs-02 WARNING pmon#pcied: PCIe Device: Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03) Not Found
Aug  9 14:39:24.814616 str-msn4600c-acs-02 ERR pmon#pcied: PCIe device status check : FAILED
Aug  9 14:40:21.389986 str-msn4600c-acs-02 NOTICE acms#root: Waiting for bootstrap cert
Aug  9 14:40:24.893317 str-msn4600c-acs-02 WARNING pmon#pcied: PCIe Device: Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03) Not Found
Aug  9 14:40:24.900130 str-msn4600c-acs-02 ERR pmon#pcied: PCIe device status check : FAILED
Aug  9 14:41:21.396143 str-msn4600c-acs-02 NOTICE acms#root: Waiting for bootstrap cert
Aug  9 14:41:24.988844 str-msn4600c-acs-02 WARNING pmon#pcied: PCIe Device: Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03) Not Found
Aug  9 14:41:24.996751 str-msn4600c-acs-02 ERR pmon#pcied: PCIe device status check : FAILED
Aug  9 14:42:21.402339 str-msn4600c-acs-02 NOTICE acms#root: Waiting for bootstrap cert
Aug  9 14:42:25.076590 str-msn4600c-acs-02 WARNING pmon#pcied: PCIe Device: Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03) Not Found
Aug  9 14:42:25.084371 str-msn4600c-acs-02 ERR pmon#pcied: PCIe device status check : FAILED
Aug  9 14:43:21.408393 str-msn4600c-acs-02 NOTICE acms#root: Waiting for bootstrap cert
Aug  9 14:43:25.165666 str-msn4600c-acs-02 WARNING pmon#pcied: PCIe Device: Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03) Not Found
Aug  9 14:43:25.174626 str-msn4600c-acs-02 ERR pmon#pcied: PCIe device status check : FAILED
root@str-msn4600c-acs-02:/var/log# show platform pcieinfo 
==============================Display PCIe Device===============================
bus:dev.fn 00:00.0 - dev_id=0x6f00, Host bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 (rev 03)
bus:dev.fn 00:01.0 - dev_id=0x6f02, PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 (rev 03)
bus:dev.fn 00:01.1 - dev_id=0x6f03, PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 (rev 03)
bus:dev.fn 00:02.0 - dev_id=0x6f04, PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 03)
bus:dev.fn 00:02.2 - dev_id=0x6f06, PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 03)
bus:dev.fn 00:03.0 - dev_id=0x6f08, PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 03)
bus:dev.fn 00:03.1 - dev_id=0x6f09, PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 03)
bus:dev.fn 00:03.2 - dev_id=0x6f0a, PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 03)
bus:dev.fn 00:05.0 - dev_id=0x6f28, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Map/VTd_Misc/System Management (rev 03)
bus:dev.fn 00:05.1 - dev_id=0x6f29, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO Hot Plug (rev 03)
bus:dev.fn 00:05.2 - dev_id=0x6f2a, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO RAS/Control Status/Global Errors (rev 03)
bus:dev.fn 00:05.4 - dev_id=0x6f2c, PIC: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D I/O APIC (rev 03)
bus:dev.fn 00:14.0 - dev_id=0x8c31, USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
bus:dev.fn 00:1c.0 - dev_id=0x8c10, PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
bus:dev.fn 00:1c.7 - dev_id=0x8c1e, PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #8 (rev d5)
bus:dev.fn 00:1d.0 - dev_id=0x8c26, USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
bus:dev.fn 00:1f.0 - dev_id=0x8c54, ISA bridge: Intel Corporation C224 Series Chipset Family Server Standard SKU LPC Controller (rev 05)
bus:dev.fn 00:1f.2 - dev_id=0x8c02, SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
bus:dev.fn 00:1f.3 - dev_id=0x8c22, SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
bus:dev.fn 03:00.0 - dev_id=0x6f50, System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 0
bus:dev.fn 03:00.1 - dev_id=0x6f51, System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 1
bus:dev.fn 03:00.2 - dev_id=0x6f52, System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 2
bus:dev.fn 03:00.3 - dev_id=0x6f53, System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 3
bus:dev.fn 07:00.0 - dev_id=0xcf70, Ethernet controller: Mellanox Technologies Spectrum-3
bus:dev.fn 09:00.0 - dev_id=0x1533, Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
bus:dev.fn ff:0b.0 - dev_id=0x6f81, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 (rev 03)
bus:dev.fn ff:0b.1 - dev_id=0x6f36, Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 (rev 03)
bus:dev.fn ff:0b.2 - dev_id=0x6f37, Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 (rev 03)
bus:dev.fn ff:0b.3 - dev_id=0x6f76, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link Debug (rev 03)
bus:dev.fn ff:0c.0 - dev_id=0x6fe0, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 03)
bus:dev.fn ff:0c.1 - dev_id=0x6fe1, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 03)
bus:dev.fn ff:0c.2 - dev_id=0x6fe2, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 03)
bus:dev.fn ff:0c.3 - dev_id=0x6fe3, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 03)
bus:dev.fn ff:0f.0 - dev_id=0x6ff8, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 03)
bus:dev.fn ff:0f.4 - dev_id=0x6ffc, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 03)
bus:dev.fn ff:0f.5 - dev_id=0x6ffd, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 03)
bus:dev.fn ff:0f.6 - dev_id=0x6ffe, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 03)
bus:dev.fn ff:10.0 - dev_id=0x6f1d, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R2PCIe Agent (rev 03)
bus:dev.fn ff:10.1 - dev_id=0x6f34, Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R2PCIe Agent (rev 03)
bus:dev.fn ff:10.5 - dev_id=0x6f1e, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox (rev 03)
bus:dev.fn ff:10.6 - dev_id=0x6f7d, Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox (rev 03)
bus:dev.fn ff:10.7 - dev_id=0x6f1f, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox (rev 03)
bus:dev.fn ff:12.0 - dev_id=0x6fa0, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 (rev 03)
bus:dev.fn ff:12.1 - dev_id=0x6f30, Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 (rev 03)
bus:dev.fn ff:13.0 - dev_id=0x6fa8, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS (rev 03)
bus:dev.fn ff:13.1 - dev_id=0x6f71, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS (rev 03)
bus:dev.fn ff:13.2 - dev_id=0x6faa, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 03)
bus:dev.fn ff:13.3 - dev_id=0x6fab, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 03)
bus:dev.fn ff:13.4 - dev_id=0x6fac, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 03)
bus:dev.fn ff:13.5 - dev_id=0x6fad, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 03)
bus:dev.fn ff:13.6 - dev_id=0x6fae, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Broadcast (rev 03)
bus:dev.fn ff:13.7 - dev_id=0x6faf, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Global Broadcast (rev 03)
bus:dev.fn ff:14.0 - dev_id=0x6fb0, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 0 Thermal Control (rev 03)
bus:dev.fn ff:14.1 - dev_id=0x6fb1, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 1 Thermal Control (rev 03)
bus:dev.fn ff:14.2 - dev_id=0x6fb2, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 0 Error (rev 03)
bus:dev.fn ff:14.3 - dev_id=0x6fb3, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 1 Error (rev 03)
bus:dev.fn ff:14.4 - dev_id=0x6fbc, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Interface (rev 03)
bus:dev.fn ff:14.5 - dev_id=0x6fbd, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Interface (rev 03)
bus:dev.fn ff:14.6 - dev_id=0x6fbe, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Interface (rev 03)
bus:dev.fn ff:14.7 - dev_id=0x6fbf, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Interface (rev 03)
bus:dev.fn ff:15.0 - dev_id=0x6fb4, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 2 Thermal Control (rev 03)
bus:dev.fn ff:15.1 - dev_id=0x6fb5, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 3 Thermal Control (rev 03)
bus:dev.fn ff:15.2 - dev_id=0x6fb6, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 2 Error (rev 03)
bus:dev.fn ff:15.3 - dev_id=0x6fb7, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel 3 Error (rev 03)
bus:dev.fn ff:1e.0 - dev_id=0x6f98, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 03)
bus:dev.fn ff:1e.1 - dev_id=0x6f99, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 03)
bus:dev.fn ff:1e.2 - dev_id=0x6f9a, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 03)
bus:dev.fn ff:1e.3 - dev_id=0x6fc0, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 03)
bus:dev.fn ff:1e.4 - dev_id=0x6f9c, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 03)
bus:dev.fn ff:1f.0 - dev_id=0x6f88, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 03)
bus:dev.fn ff:1f.2 - dev_id=0x6f8a, System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 03)

Steps to reproduce the issue:

  1. The WARNING and ERR messages are logged every minute
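
The once-a-minute cadence can be checked from the timestamps in the excerpt above. The following is a minimal sketch, not part of pcied: the function name `pcied_failure_intervals` and the `year` parameter are illustrative assumptions (syslog lines carry no year), and the sample lines are copied from this issue.

```python
import re
from datetime import datetime

# Sample lines copied from the syslog excerpt in this issue.
LOG_LINES = [
    "Aug  9 14:39:24.814616 str-msn4600c-acs-02 ERR pmon#pcied: PCIe device status check : FAILED",
    "Aug  9 14:40:21.389986 str-msn4600c-acs-02 NOTICE acms#root: Waiting for bootstrap cert",
    "Aug  9 14:40:24.900130 str-msn4600c-acs-02 ERR pmon#pcied: PCIe device status check : FAILED",
    "Aug  9 14:41:24.996751 str-msn4600c-acs-02 ERR pmon#pcied: PCIe device status check : FAILED",
]

# Timestamp, hostname, then the ERR severity and the "pmon#pcied" tag.
PCIED_ERR = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d+) \S+ ERR pmon#pcied:")


def pcied_failure_intervals(lines, year=2021):
    """Return the seconds between consecutive pcied ERR lines."""
    stamps = []
    for line in lines:
        match = PCIED_ERR.match(line)
        if match:
            # Collapse the padded day ("Aug  9") and supply the missing year.
            stamp = " ".join(match.group(1).split())
            stamps.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(pcied_failure_intervals(LOG_LINES))
```

Each interval in the sample comes out at roughly 60 seconds, which matches the description of one failed check per minute.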

Describe the results you received:

Describe the results you expected:

Output of show version:

SONiC Software Version: SONiC.20201231.11
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: 772c9bb4f3
Build date: Thu Jul 29 12:52:42 UTC 2021
Built by: AzDevOps@sonic-int-build-workers-0002E6

Platform: x86_64-mlnx_msn4600c-r0
HwSKU: ACS-MSN4600C
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2023X22076
Uptime: 00:55:12 up 10:40,  1 user,  load average: 1.66, 1.50, 1.13

Docker images:
REPOSITORY                 TAG                 IMAGE ID            SIZE
docker-syncd-mlnx          20201231.11         e338a5362178        951MB
docker-syncd-mlnx          latest              e338a5362178        951MB
docker-teamd               20201231.11         a4bf2dcadeae        412MB
docker-teamd               latest              a4bf2dcadeae        412MB
docker-router-advertiser   20201231.11         64727df8d988        401MB
docker-router-advertiser   latest              64727df8d988        401MB
docker-platform-monitor    20201231.11         5a196a51d970        722MB
docker-platform-monitor    latest              5a196a51d970        722MB
docker-snmp                20201231.11         a621560e933b        443MB
docker-snmp                latest              a621560e933b        443MB
docker-dhcp-relay          20201231.11         94fece86e389        408MB
docker-dhcp-relay          latest              94fece86e389        408MB
docker-database            20201231.11         537e76ccb179        401MB
docker-database            latest              537e76ccb179        401MB
docker-lldp                20201231.11         131db39b940f        441MB
docker-lldp                latest              131db39b940f        441MB
docker-orchagent           20201231.11         8e95a9ac267c        430MB
docker-orchagent           latest              8e95a9ac267c        430MB
docker-sonic-telemetry     20201231.11         3b6da40a382e        491MB
docker-sonic-telemetry     latest              3b6da40a382e        491MB
docker-mux                 20201231.11         fe41982b0a54        454MB
docker-mux                 latest              fe41982b0a54        454MB
docker-fpm-frr             20201231.11         1012d69d16ae        430MB
docker-fpm-frr             latest              1012d69d16ae        430MB
docker-sonic-restapi       20201231.11         6f4a9cc73192        352MB
docker-sonic-restapi       latest              6f4a9cc73192        352MB
docker-acms                20201231.11         886e5c996c45        197MB
docker-acms                latest              886e5c996c45        197MB
k8s.gcr.io/pause           3.4.1               0f8457a4c2ec        683kB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@bingwang-ms changed the title from "[pmon]" to "[pmon] pcied keeps flooding WARNING and ERROR logs" on Aug 10, 2021
@bingwang-ms
Contributor Author

@sujinmkang Are you working on this issue?

@sujinmkang
Collaborator

@bingwang-ms This error happens because the MLNX4600 needs a different Mellanox pcie.yaml file depending on the BIOS version on the device. All the required changes are already in, including the latest change (#8309) from MLNX, which was included in the 202012 branch 5 days ago.
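
One plausible way such a mismatch produces a "Not Found" warning for a device that is physically present is sketched below. This is an illustration only and not the actual pcied code: it assumes the expected-device list gives a bus, device, and function for each entry and that the check looks each entry up by that address. The helper names and the `08:00.0` address in the usage example are hypothetical; the `09:00.0` and `07:00.0` lines are taken from the `show platform pcieinfo` output in this issue.

```python
import re

# Matches lines in the form printed by `show platform pcieinfo` in this issue, e.g.
# "bus:dev.fn 09:00.0 - dev_id=0x1533, Ethernet controller: ..."
PCIEINFO_LINE = re.compile(
    r"^bus:dev\.fn (?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])"
    r" - dev_id=0x(?P<id>[0-9a-f]{4}), (?P<name>.+)$"
)


def parse_pcieinfo(text):
    """Map (bus, dev, fn) to (dev_id, name) for every line that parses."""
    found = {}
    for line in text.splitlines():
        match = PCIEINFO_LINE.match(line.strip())
        if match:
            key = (match["bus"], match["dev"], match["fn"])
            found[key] = (match["id"], match["name"])
    return found


def missing_devices(expected, found):
    """Return the expected entries that are absent at their listed address."""
    return [entry for entry in expected
            if (entry["bus"], entry["dev"], entry["fn"]) not in found]


if __name__ == "__main__":
    detected = parse_pcieinfo(
        "bus:dev.fn 07:00.0 - dev_id=0xcf70, Ethernet controller: Mellanox Technologies Spectrum-3\n"
        "bus:dev.fn 09:00.0 - dev_id=0x1533, Ethernet controller: Intel Corporation I210 "
        "Gigabit Network Connection (rev 03)\n"
    )
    # Hypothetical expected list: the I210 is listed at 08:00.0 instead of 09:00.0.
    expected = [
        {"bus": "07", "dev": "00", "fn": "0", "name": "Spectrum-3"},
        {"bus": "08", "dev": "00", "fn": "0", "name": "I210"},
    ]
    for entry in missing_devices(expected, detected):
        print(f"{entry['name']} not found at {entry['bus']}:{entry['dev']}.{entry['fn']}")
```

Under these assumptions the I210 is flagged even though it is enumerated, because the lookup is by address rather than by device name.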

@bingwang-ms
Contributor Author

Closed in #8309
