[BUG][CML] Kernel NULL pointer dereference in module reload stress test #1386
Comments
A similar issue exists on topic/sof-dev + master: invalid sof component type for removal
[ 131.298905] sof-audio-pci 0000:00:1f.3: ipc tx succeeded: 0x40020000: GLB_PM_MSG: CTX_RESTORE
I am afraid you will have to bisect and provide more information on when this last worked, if ever, and more details on where the issue happens. The logs that you provided are just not detailed enough.
When doing the module unload/reload stress test (PA off), it sometimes gets stuck at
Environment
Dmesg
Added the test script here: sof_bootloop.txt
@dengyangchao did you check the dmesg? It looks like a really bad error, not just some delay in removing the module.
[ 96.512107] RIP: 0010:snd_ctl_remove+0xf8/0x110 [snd]
Just ran it on a Dell CML laptop.
@plbossart I ran the module unload/reload 100 times on CML Chrome today, and the modules were removed and inserted successfully. When doing the module unload/reload I also observed this message on other platforms: ICL RVP HDA, CFL-S RVP HDA, GLK Chrome. I don't know what the message means. Is it really an error? If yes, I'd like to open another bug to track it.
Dmesg
Yes, I saw this "RIP: 0010:snd_ctl_remove+0xf8/0x110 [snd]" error on WHL w/ HDA as well. Added #1424 to track this.
Can't reproduce, so closing. The related issue is tracked in #1424.
Describe the bug
"BUG: kernel NULL pointer dereference" is seen when running the module reload stress test. It sometimes happens when removing snd_soc_max98357a, and sometimes when removing snd_soc_rt5682. The dmesg log may not always be captured, because the removal sometimes just blocks and dmesg shows no error.
To Reproduce
sof_bootloop.sh
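The attached sof_bootloop.sh is not reproduced in this report. As a rough sketch only, a reload stress loop of this kind might look like the following. The function names `reload_loop` and `run_cmd`, the module order, and the use of `modprobe -r` / `modprobe` are assumptions for illustration, not the contents of the attached script.

```python
import subprocess
from typing import Callable, Sequence


def run_cmd(argv: Sequence[str]) -> int:
    """Run a command and return its exit status (modprobe needs root)."""
    return subprocess.run(list(argv), check=False).returncode


def reload_loop(modules: Sequence[str], iterations: int,
                run: Callable[[Sequence[str]], int] = run_cmd) -> int:
    """Unload and reload `modules` `iterations` times.

    Returns the 1-based iteration at which a command first failed,
    or 0 if every iteration passed. A removal that hangs in the
    kernel would block this loop; no timeout is applied here.
    """
    for i in range(1, iterations + 1):
        # Remove in the given order (hypothetical: top-level driver first).
        for mod in modules:
            if run(["modprobe", "-r", mod]) != 0:
                return i
        # Reload in the reverse order.
        for mod in reversed(modules):
            if run(["modprobe", mod]) != 0:
                return i
    return 0
```

A call such as `reload_loop(["snd_sof_pci", "snd_soc_rt5682", "snd_soc_max98357a"], 100)` would approximate the 100-iteration run mentioned in the comments. The module names are taken from this report, but their order here is a guess.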
Environment
Kernel: topic/sof-dev : 5d67c4b
Firmware: cml-008-drop-stable: sof-cnl.ri provided by Poland team
Topology: file: self compiled sof-cml-rt5682-max98357a.tplg from cml-008-drop-stable
platform: CML Chrome
Frequency
Very high probability within the first ten removals.
Kernel NULL pointer dereference in dmesg
[ 260.432776] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 260.432780] #PF: supervisor write access in kernel mode
[ 260.432782] #PF: error_code(0x0002) - not-present page
[ 260.432783] PGD 0 P4D 0
[ 260.432786] Oops: 0002 [#3] SMP NOPTI
[ 260.432789] CPU: 5 PID: 3191 Comm: rmmod Tainted: G D 5.4.0-rc3+ #30
[ 260.432791] Hardware name: Google Hatch/Hatch, BIOS 07/05/2019
[ 260.432796] RIP: 0010:mutex_lock+0x14/0x30
[ 260.432799] Code: 1f 80 00 00 00 00 be 02 00 00 00 e9 36 fb ff ff 66 0f 1f 44 00 00 53 48 89 fb e8 97 e8 ff ff 65 48 8b 14 25 00 5d 01 00 31 c0 48 0f b1 13 75 02 5b c3 48 89 df 5b eb cd 0f 1f 00 66 2e 0f 1f
[ 260.432800] RSP: 0018:ffffaedac254fde8 EFLAGS: 00010246
[ 260.432802] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000000
[ 260.432804] RDX: ffff9fa312d3d400 RSI: 0000000000000000 RDI: 0000000000000008
[ 260.432805] RBP: 0000000000000000 R08: ffff9fa314c00000 R09: ffff9fa314c00058
[ 260.432806] R10: 0000000000000000 R11: ffffffffbd443bd8 R12: ffffffffc0b29000
[ 260.432807] R13: ffffffffc0b29000 R14: 0000000000000000 R15: 0000000000000000
[ 260.432809] FS: 00007fa8d3216540(0000) GS:ffff9fa316340000(0000) knlGS:0000000000000000
[ 260.432811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 260.432812] CR2: 0000000000000008 CR3: 0000000202840005 CR4: 00000000003606e0
[ 260.432813] Call Trace:
[ 260.432820] snd_sof_ipc_free+0x15/0x30 [snd_sof]
[ 260.432824] snd_sof_device_remove+0x29/0x90 [snd_sof]
[ 260.432827] sof_pci_remove+0x10/0x30 [snd_sof_pci]
[ 260.432830] pci_device_remove+0x36/0xb0
[ 260.432834] device_release_driver_internal+0xe0/0x1c0
[ 260.432837] driver_detach+0x3a/0x80
[ 260.432839] bus_remove_driver+0x53/0xd0
[ 260.432843] pci_unregister_driver+0x1d/0x90
[ 260.432847] __x64_sys_delete_module+0x155/0x240
[ 260.432850] do_syscall_64+0x43/0x120
[ 260.432853] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 260.432856] RIP: 0033:0x7fa8d2d371b7
[ 260.432858] Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
[ 260.432859] RSP: 002b:00007fff33ffc9e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 260.432862] RAX: ffffffffffffffda RBX: 00007fff33ffca48 RCX: 00007fa8d2d371b7
[ 260.432863] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055e3176417d8
[ 260.432864] RBP: 000055e317641770 R08: 00007fff33ffb961 R09: 0000000000000000
[ 260.432865] R10: 00007fa8d2db3cc0 R11: 0000000000000206 R12: 00007fff33ffcc10
[ 260.432867] R13: 00007fff33ffe792 R14: 000055e317641260 R15: 000055e317641770
[ 260.432869] Modules linked in: snd_sof_pci(-) snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_sof_acpi snd_sof_intel_byt snd_soc_acpi_intel_match snd_sof_intel_bdw snd_sof_intel_ipc snd_sof snd_sof_xtensa_dsp snd_soc_acpi snd_soc_max98090 snd_soc_max98357a snd_soc_wm8804_i2c snd_soc_wm8804 snd_soc_pcm512x_i2c snd_soc_pcm512x snd_soc_rt5682 snd_soc_rt5677_spi snd_soc_rt5670 snd_soc_rt5651 snd_soc_rt5645 snd_soc_rt5640 snd_soc_rl6231 snd_soc_rt298 snd_soc_rt286 snd_soc_rl6347a snd_soc_da7219 snd_soc_da7213 snd_soc_core snd_pcm asix usbnet snd_intel_nhlt snd_seq_midi snd_seq_midi_event snd_rawmidi i915 x86_pkg_temp_thermal snd_seq intel_powerclamp snd_seq_device snd_timer iwlmvm i2c_algo_bit drm_kms_helper elan_i2c syscopyarea sysfillrect sysimgblt snd fb_sys_fops int3403_thermal drm int3400_thermal soundcore processor_thermal_device int340x_thermal_zone acpi_thermal_rel intel_soc_dts_iosf intel_lpss_pci iwlwifi mei_me
[ 260.432887] intel_lpss mfd_core mei efivarfs sdhci_pci xhci_pci cqhci sdhci xhci_hcd [last unloaded: snd_pcm]
[ 260.432893] CR2: 0000000000000008
[ 260.432896] ---[ end trace 08ee176aed8dc45e ]---
[ 260.432900] RIP: 0010:trace_event_raw_event_hw_interval_param+0x150/0x1a0 [snd_pcm]
[ 260.432903] Code: 01 89 50 44 41 0f b6 54 24 08 d0 ea 83 e2 01 89 50 48 41 0f b6 54 24 08 c0 ea 02 83 e2 01 89 50 4c 41 0f b6 54 24 08 c0 ea 03 <83> e2 01 89 50 50 e8 a5 fe ab fb e9 08 ff ff ff 31 d2 31 f6 e8 b7
[ 260.432904] RSP: 0018:ffffaedac0627e98 EFLAGS: 00010286
[ 260.432906] RAX: ffffffffc0867060 RBX: ffff9fa312af3898 RCX: ffff9fa3162e72e0
[ 260.432907] RDX: ffff9fa3162e72e0 RSI: 00000000000000c0 RDI: ffff9fa312af3898
[ 260.432909] RBP: ffff9fa3162e72c0 R08: 0000746e65696369 R09: 8080808080808080
[ 260.432910] R10: 0000000000000018 R11: fefefefefefefeff R12: ffff9fa3162eb500
[ 260.432911] R13: 0000000000000000 R14: ffff9fa3149f20c0 R15: 0ffff9fa3162eb50
[ 260.432913] FS: 00007fa8d3216540(0000) GS:ffff9fa316340000(0000) knlGS:0000000000000000
[ 260.432914] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 260.432916] CR2: 0000000000000008 CR3: 0000000202840005 CR4: 00000000003606e0
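Because the report notes that the oops is not always captured, a stress script may want to scan dmesg output after each iteration. The sketch below is one possible way to pull the faulting address, the kernel RIP symbol, and the call trace out of a log like the one above. The function name `parse_oops` and the returned dictionary layout are hypothetical, and the regular expressions are based only on the line formats visible in this log.

```python
import re
from typing import Dict, List

# Leading "[ 260.432776] " style timestamp.
_TS = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Stack frame such as "snd_sof_ipc_free+0x15/0x30 [snd_sof]".
_FRAME = re.compile(r"^(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?$")


def parse_oops(dmesg: str) -> List[Dict]:
    """Return one dict per NULL-pointer oops found in `dmesg` text."""
    reports: List[Dict] = []
    current = None
    in_trace = False
    for raw in dmesg.splitlines():
        line = _TS.sub("", raw.strip())
        if line.startswith("BUG: kernel NULL pointer dereference"):
            current = {"address": line.rsplit(" ", 1)[-1],
                       "rip": None, "trace": []}
            reports.append(current)
            in_trace = False
        elif current is None:
            continue
        elif line.startswith("RIP: 0010:") and current["rip"] is None:
            # Kernel-mode RIP; keep only the symbol name.
            current["rip"] = line[len("RIP: 0010:"):].split("+")[0]
        elif line.startswith("Call Trace:"):
            in_trace = True
        elif in_trace:
            m = _FRAME.match(line)
            if m:
                current["trace"].append(m.group(1))
            else:
                # First non-frame line ends the call trace.
                in_trace = False
    return reports
```

Run against the log in this report, the first frame of the trace is `snd_sof_ipc_free` and the faulting RIP is `mutex_lock`, which is the detail a bisect or follow-up report would most likely need.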