Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG][TGL][Chrome] DSP Panic on opening Internal Mic after plugging/unplugging a headset #4648

Closed
lancedkhou opened this issue Aug 18, 2021 · 11 comments · Fixed by #4696
Closed
Assignees
Labels
bug Something isn't working as expected DSP panic DSP panic observed Fix confirmed Fix for the issue has been found and verified IGO NC Observed in IGO NC integration TGL Applies to Tiger Lake

Comments

@lancedkhou
Copy link

Describe the bug
When IGO was enabled and PRs to fix heap usage issue applied.
During recording, there was DSP panic found when plugging/unplugging a headset

To Reproduce

  1. Flash firmware with PR Tgl 013 drop stable hot fix #4575, back-porting DMIC fixes to Tgl 013  #4612 and dw-dma: fix the lli corruption issue #4640 + IGO and disable Google assistant in Settings.
  2. Enable “Enable Input Noise Cancellation UI” flag then restart UI
  3. Join Google meet (output: internal speaker, input: internal mic)
  4. Plug in 3.5mm jack headset (output: headset speaker, input: headset mic)
  5. Switch input to internal mic by UI (output: headset speaker, input: internal mic)
  6. Unplug 3.5mm jack headset (output: headset speaker, input: headset mic)
  7. Repeat step 4~6 few times to check whether input capture is still alive

Reproduction Rate
1/10

Expected behavior
The recording is always valid.

Impact
Blocker

Environment
tgl-13 branch with IGO patches & IGO binary
Issue still reproduced with PR #4575, #4612 and #4640

Screenshots or console output
2021-08-17T11:49:06.230044Z ERR kernel: [ 900.437070] sof-audio-pci 0000:00:1f.3: error : DSP panic!
2021-08-17T11:49:06.230068Z ERR kernel: [ 900.437093] sof-audio-pci 0000:00:1f.3: status: fw entered - code 00000005
2021-08-17T11:49:06.230069Z ERR kernel: [ 900.437203] sof-audio-pci 0000:00:1f.3: error: runtime exception
2021-08-17T11:49:06.230070Z ERR kernel: [ 900.437207] sof-audio-pci 0000:00:1f.3: error: trace point 00004000
2021-08-17T11:49:06.230072Z ERR kernel: [ 900.437210] sof-audio-pci 0000:00:1f.3: error: panic at :0
2021-08-17T11:49:06.230073Z ERR kernel: [ 900.437214] sof-audio-pci 0000:00:1f.3: error: DSP Firmware Oops
2021-08-17T11:49:06.230084Z ERR kernel: [ 900.437217] sof-audio-pci 0000:00:1f.3: error: Exception Cause: LoadProhibitedCause, A load referenced a page mapped with an attribute that does not permit loads
2021-08-17T11:49:06.230087Z ERR kernel: [ 900.437222] sof-audio-pci 0000:00:1f.3: EXCCAUSE 0x0000001c EXCVADDR 0xc0000038 PS 0x00060d25 SAR 0x00000000
2021-08-17T11:49:06.230088Z ERR kernel: [ 900.437225] sof-audio-pci 0000:00:1f.3: EPC1 0xbe02e437 EPC2 0xbe02d7e6 EPC3 0x00000000 EPC4 0x00000000
2021-08-17T11:49:06.230090Z ERR kernel: [ 900.437229] sof-audio-pci 0000:00:1f.3: EPC5 0x00000000 EPC6 0x00000000 EPC7 0x00000000 DEPC 0x00000000
2021-08-17T11:49:06.230091Z ERR kernel: [ 900.437233] sof-audio-pci 0000:00:1f.3: EPS2 0x00060720 EPS3 0x00000000 EPS4 0x00000000 EPS5 0x00000000
2021-08-17T11:49:06.230093Z ERR kernel: [ 900.437236] sof-audio-pci 0000:00:1f.3: EPS6 0x00000000 EPS7 0x00000000 INTENABL 0x00000000 INTERRU 0x00000222

4575+4612+4640-DSP-panic-messages.txt

@lancedkhou lancedkhou added the bug Something isn't working as expected label Aug 18, 2021
@sathya-nujella sathya-nujella added the TGL Applies to Tiger Lake label Aug 18, 2021
@mengdonglin mengdonglin added DSP panic DSP panic observed IGO NC Observed in IGO NC integration Fix confirmed Fix for the issue has been found and verified labels Aug 19, 2021
@bkokoszx bkokoszx self-assigned this Aug 19, 2021
@abonislawski
Copy link
Member

@lancedkhou could please include also linux crash/coredump file? (with fw elf)
@lgirdwood can we make this a standard for dsp panic cases? It would be much easier to run coredump-reader and immediately see backtrace for dsp panic rather than things like "EPC1 0xbe02e437" in logs

@lancedkhou
Copy link
Author

@sathya-nujella
Copy link
Contributor

sathya-nujella commented Aug 20, 2021

Hi,
Here is the script to reproduce the issue via command line:

  1. Connect headset to DUT

  2. run the below script and monitor "dmesg -w | grep sof " .. we can see "DSP panic" prints

==
#!/bin/bash
stop ui

for i in {1..50}
do
cras_test_client --playback_f /dev/zero --select_output 6:0 --duration_seconds 10 &
cras_test_client --capture_f /tmp/a.wav --select_input 9:0 --duration_seconds 10
sleep 5

cras_test_client --playback_f /dev/zero --select_output 7:0 --duration_seconds 30 &
cras_test_client --capture_f /tmp/a.wav --select_input 10:0 --duration_seconds 10
cras_test_client --capture_f /tmp/a.wav --select_input 9:0 --duration_seconds 15
sleep 10

done

===
PS / Note:
Node ID's in above script are from here:

localhost ~ # cras_test_client
..
Output Devices:
Output Nodes:
Stable Id ID Vol UI Plugged L/R swapped Time Hotword Type MaxCha Name
(62f96cde) 7:0 72 0.000000 yes no 1629499743 HEADPHONE 2 Headphone
(72a57953) 6:0 100 0.000000 yes no 1629499742 INTERNAL_SPEAKER 2*Speaker

Input Devices:
Input Nodes:
Stable Id ID Gain UI Plugged L/R swapped Time Hotword Type MaxCha Name
(61aab6ba) 10:0 20 1.000000 yes no 1629499743 MIC X Mic
(0039c920) 9:0 26 1.000000 yes no 1629499742 INTERNAL_MIC X*Internal Mic

@keyonjie
Copy link
Contributor

@lancedkhou could please include also linux crash/coredump file? (with fw elf)
@lgirdwood can we make this a standard for dsp panic cases? It would be much easier to run coredump-reader and immediately see backtrace for dsp panic rather than things like "EPC1 0xbe02e437" in logs

@abonislawski can you share how to use the coredump tool? e.g. in the documentation https://thesofproject.github.io/latest/developer_guides/debugability/coredump-reader/index.html?highlight=coredump,

-i INFILE
path to sys dump bin

what is the "sys dump bin" here?

@abonislawski
Copy link
Member

@keyonjie simply by ./sof-coredump-to-gdb.sh elf dump_file

what is the "sys dump bin" here?

Linux core dump file with stack, regs dump (see panic.c)

@sathya-nujella
Copy link
Contributor

I reproduced "DSP panic", but exception file size is zero.

localhost /sys/kernel/debug/sof # ls -ls
total 0
0 -r--r--r--. 1 root root 0 Aug 23 22:38 exception

Let me cross check the CONFIG as per this comment: #3965 (comment)

@sathya-nujella
Copy link
Contributor

Hi @bkokoszx - FYI.

Thanks to @yongzhi1 for pointing out this CONFIG change.
I am now able to get sof/exception after enabling the below kernel config:
CONFIG_SND_SOC_SOF_DEBUG=y
CONFIG_SND_SOC_SOF_DEBUG_ENABLE_FIRMWARE_TRACE=y
CONFIG_SND_SOC_SOF_DEVELOPER_SUPPORT=y
CONFIG_SND_SOC_SOF_DEBUG_ENABLE_DEBUGFS_CACHE=y

"DSP panic" is reproduced and attached exception file.
cat /sys/kernel/debug/sof/exception > ~/oops.bin
oops.zip

@paulstelian97
Copy link
Collaborator

I reproduced "DSP panic", but exception file size is zero.

localhost /sys/kernel/debug/sof # ls -ls
total 0
0 -r--r--r--. 1 root root 0 Aug 23 22:38 exception

Do not trust stat here, it will report 0 for files that aren't actually 0-sized.

@lgirdwood
Copy link
Member

@lancedkhou could please include also linux crash/coredump file? (with fw elf)
@lgirdwood can we make this a standard for dsp panic cases? It would be much easier to run coredump-reader and immediately see backtrace for dsp panic rather than things like "EPC1 0xbe02e437" in logs

@abonislawski can you share how to use the coredump tool? e.g. in the documentation https://thesofproject.github.io/latest/developer_guides/debugability/coredump-reader/index.html?highlight=coredump,

-i INFILE
path to sys dump bin

what is the "sys dump bin" here?

I think the goal should be to update the core dump script to parse for the dmesg for the FW oops.

@abonislawski
Copy link
Member

abonislawski commented Aug 30, 2021

Detailed description what happened in this case:

tgl_issue4648

  1. Two interesting buffers from (added) FW logs, both with the same buffer->addr 0xbe17ae80 and different buffer->size (0x180 and 0x6000)
    They are not allocated at the same time so all is ok.

  2. Regs dump from host_get_copy_bytes_normal() during normal operation:
    a) local_buffer addr 0xbe162e40
    b) local_buffer size 0x6000
    c) dma_buffer addr 0xbe17ae80
    d) dma_buffer size 0x6000
    e) calculated copy_bytes 0x3000
    f) latest allocated size for buffer 0xbe17ae80 0x6000

So we have two buffers with size 0x6000, copy_bytes 0x3000, all is ok.

  1. Regs dump from host_get_copy_bytes_normal() during normal operation:
    a) local_buffer addr 0xbe156740
    b) local_buffer size 0x0300
    c) dma_buffer addr 0xbe17ae80
    d) dma_buffer size 0x0180
    e) calculated copy_bytes 0x00cc
    f) latest allocated size for buffer 0xbe17ae80 0x0180

Similar data with second case when buffer 0xbe17ae80 size is 0x180, everything works as expected.

  1. Panic case.
    a) local_buffer addr 0xbe162e40
    b) local_buffer size 0x6000
    c) dma_buffer addr 0xbe17ae80
    d) dma_buffer size 0x0180 !!!
    e) calculated copy_bytes 0x3000 !!!
    f) latest allocated size for buffer 0xbe17ae80 0x6000 !!!

From the buffer pairs a) and c) we can easily see that this should be case from point 2. when we have two buffers with size 0x6000 and calculated copy_bytes 0x3000 but something is wrong and in buffer description data size is 0x180, just like in the previous allocation.
Also in f) we can see that the latest allocation size for buffer 0xbe17ae80 was 0x6000 so we have cache/multicore issue here.

So the bad things starts here and FW will end with dsp panic in several steps, it could be easily stopped if the copy_bytes was limited to smaller buffer (even if fw only thinks he is smaller).
Next steps: dma_buffer_copy_to() -> audio_stream_produce(), here we will update buffer->w_ptr in audio_stream_wrap(buffer, (char *)buffer->w_ptr + bytes) with simple check:

if (ptr >= buffer->end_addr)
		ptr = (char *)buffer->addr +
			((char *)ptr - (char *)buffer->end_addr);

this code is not prepared for case where copy_bytes is bigger than buffer size so in result it will set buffer->w_ptr > end_addr, this is a second place where we could still avoid panic with simple check if the new ptr > end_addr.

In the next iteration fw will crash in dma_buffer_copy_to() -> audio_stream_writeback() because of faulty buffer->w_ptr (head_size 0xffffd200).

From our tests allocating buffer description struct (with w_ptr, size etc.) in SOF_MEM_ZONE_RUNTIME_SHARED should be enough but please make more tests @sathya-nujella (PR #4696 )
It could be also avoided with some writeback/invalidate calls but probably the best way is to just alloc it as SHARED

@lgirdwood
Copy link
Member

@abonislawski good update and info here. Any validation news on whether #4696 will fix this issue ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected DSP panic DSP panic observed Fix confirmed Fix for the issue has been found and verified IGO NC Observed in IGO NC integration TGL Applies to Tiger Lake
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants