Observed CPU lock up after reload in AS7716-32X #14725

Closed
MahendranPackiam opened this issue Apr 19, 2023 · 2 comments
Labels
ACCTON, Triaged

Comments

@MahendranPackiam

Description
Observed CPU lock up after reload in AS7716-32X

Steps to reproduce the issue:

  1. config reload

Describe the results you received:
The following traces were seen on the console after the reload:

timed out waiting for input: auto-logout64-accton_as7716_32x-r0/Accton-AS7716-32X$
[ 1813.383693] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1813.390313] rcu: 2-...0: (0 ticks this GP) idle=bb6/1/0x4000000000000000 softirq=74802/74802 fqs=2328
[ 1813.401806] NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
[ 1876.403691] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1876.410310] rcu: 2-...0: (0 ticks this GP) idle=bb6/1/0x4000000000000000 softirq=74802/74802 fqs=9356
[ 1876.421824] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
[ 1876.421880] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
[ 1939.423688] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1939.430306] rcu: 2-...0: (0 ticks this GP) idle=bb6/1/0x4000000000000000 softirq=74802/74802 fqs=16859
[ 2002.443684] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2002.450303] rcu: 2-...0: (0 ticks this GP) idle=bb6/1/0x4000000000000000 softirq=74802/74802 fqs=24387
[ 2065.463684] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2065.470304] rcu: 2-...0: (0 ticks this GP) idle=bb6/1/0x4000000000000000 softirq=74802/74802 fqs=31905
[ 2128.483655] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2128.490276] rcu: 2-...0: (0 ticks this GP) idle=bb6/1/0x4000000000000000 softirq=74802/74802 fqs=39375
[ 2128.501899] NMI watchdog: Watchdog detected hard LOCKUP on cpu 7
[ 2184.231704] INFO: task khugepaged:67 blocked for more than 311 seconds.
[ 2184.239105] Tainted: G OE 5.10.0-12-2-amd64 #1 Debian 5.10.103-1
[ 2184.247774] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2184.256533] task:khugepaged state:D stack: 0 pid: 67 ppid: 2 flags:0x00004000
[ 2184.265876] Call Trace:
[ 2184.268615] __schedule+0x282/0x870
[ 2184.272519] ? usleep_range+0x80/0x80
[ 2184.276616] schedule+0x46/0xb0
[ 2184.280129] schedule_timeout+0xff/0x140
[ 2184.284517] ? __prepare_to_swait+0x4b/0x70
[ 2184.289195] __wait_for_common+0xae/0x160
[ 2184.293683] flush_work+0x5c/0x80
[ 2184.297391] ? init_pwq+0xc0/0xc0
[ 2184.301102] lru_add_drain_all+0x155/0x1b0
[ 2184.305691] khugepaged+0x70/0x24f0
[ 2184.309588] ? add_wait_queue_exclusive+0x70/0x70
[ 2184.314850] ? collapse_pte_mapped_thp+0x3e0/0x3e0
[ 2184.320216] kthread+0x11b/0x140
[ 2184.323816] ? __kthread_bind_mask+0x60/0x60
[ 2184.328585] ret_from_fork+0x22/0x30
[ 2184.332647] Kernel panic - not syncing: hung_task: blocked tasks
[ 2184.339358] CPU: 5 PID: 62 Comm: khungtaskd Tainted: G OE 5.10.0-12-2-amd64 #1 Debian 5.10.103-1
[ 2184.350535] Hardware name: Accton AS7716-32X/AS7716-32X, BIOS 5.11 10/26/2016
[ 2184.358505] Call Trace:
[ 2184.361236] dump_stack+0x6b/0x83
[ 2184.364936] panic+0x101/0x2d7
[ 2184.368345] watchdog.cold+0xc/0xbc
[ 2184.372240] ? hungtask_pm_notify+0x40/0x40
[ 2184.376910] kthread+0x11b/0x140
[ 2184.380510] ? __kthread_bind_mask+0x60/0x60
[ 2184.385269] ret_from_fork+0x22/0x30
[ 2185.427605] Shutting down cpus with NMI
[ 2185.431889] Kernel Offset: 0x16800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2185.443943] Rebooting in 10 seconds..
[ 2195.369356] ACPI MEMORY or I/O RESET_REG.
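
For anyone triaging a similar console capture, the trace above can be summarized with a short script. This is only a minimal sketch in Python, not part of SONiC or any Accton tooling: the function name `summarize_lockups` is hypothetical, and the patterns assume the log lines look exactly like the ones quoted in this report.

```python
import re

# Patterns assume the message format quoted in the trace above.
LOCKUP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+NMI watchdog: Watchdog detected hard LOCKUP on cpu (\d+)"
)
STALL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+rcu: INFO: rcu_sched detected stalls"
)


def summarize_lockups(log_text):
    """Return (sorted CPUs that reported a hard lockup, RCU stall timestamps)."""
    cpus = set()
    stalls = []
    for line in log_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            cpus.add(int(match.group(2)))
            continue
        match = STALL_RE.search(line)
        if match:
            stalls.append(float(match.group(1)))
    return sorted(cpus), stalls


# A few lines copied from the trace in this report.
sample = """\
[ 1813.383693] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1813.401806] NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
[ 1876.403691] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1876.421824] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
[ 1876.421880] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
"""

cpus, stalls = summarize_lockups(sample)
print("CPUs with hard lockup:", cpus)
print("RCU stall timestamps:", stalls)
```

Run against the full capture, this lists which CPUs the NMI watchdog flagged and when each RCU stall report was printed, which makes it easier to compare captures taken before and after a change.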

Describe the results you expected:
No CPU lockup should be seen after the reload.

Additional information you deem important:

Output of show version:
SONIC_VERSION=202205.0-dirty-20230323.115915

Attach debug file sudo generate_dump:

@yxieca transferred this issue from sonic-net/sonic-mgmt Apr 19, 2023
@neethajohn added the Triaged and ACCTON labels Apr 26, 2023
@neethajohn
Contributor

@jostar-yang please help take a look

@MahendranPackiam
Author

The reported issue is no longer seen after upgrading the CPLD on the AS7716-32X. We can close this issue. Thanks.
