[BUG] ZEPHYR FATAL ERROR 0: CPU exception on CPU 0 on TGL-IPC4-SDW #7324

Closed
keqiaozhang opened this issue Mar 23, 2023 · 8 comments
Labels
bug Something isn't working as expected Intel Linux Daily tests This issue can be found in internal Linux daily tests IPC4 Issues observed with IPC4 (same IPC as Windows) P2 Critical bugs or normal features TGL Applies to Tiger Lake Zephyr Issues only observed with Zephyr integrated

@keqiaozhang
Collaborator

keqiaozhang commented Mar 23, 2023

Describe the bug
Observed this fatal error in the daily test. It happened on TGLU-RVP-IPC4-SDW while running multiple-pipeline-playback-50. The reproduction rate is about 20%.

Daily test run 22506?model=TGLU_RVP_SDW_IPC4ZPH&testcase=multiple-pipeline-playback-50
dmesg

[ 2847.147303] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0x13050004|0x0: GLB_SET_PIPELINE_STATE
[ 2847.148926] kernel: snd_sof_intel_hda_common:hda_dai_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: cmd=1 dai iDisp3 Pin direction 0
[ 2847.148937] kernel: snd_sof:sof_ipc4_set_pipeline_state: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc4 set pipeline 7 state 3
[ 2847.654695] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc timed out for 0x13050004|0x0
[ 2847.654703] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: preventing DSP entering D3 state to preserve context
[ 2847.654705] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 2847.654741] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: hda irq intsts 0x00000000 intlctl 0xc0000280 rirb 00
[ 2847.654743] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: dsp irq ppsts 0x00000000 adspis 0x00000000
[ 2847.654790] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: Host IPC initiator: 0x93050004|0x0|0x0, target: 0x0|0x0|0x0, ctl: 0x3
[ 2847.654793] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 2847.654795] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 2847.654796] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: IPC timeout
[ 2847.654798] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[ 2847.654814] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
[ 2847.654870] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: extended rom status:  0x5 0x0 0x0 0x0 0x0 0x0 0x0 0x1
[ 2847.654872] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------
[ 2847.654909] kernel: sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at soc_dai_trigger on iDisp1 Pin: -110
[ 2847.654916] kernel:  HDMI1: ASoC: error at dpcm_be_dai_trigger on HDMI1: -110
[ 2847.654920] kernel:  HDMI1: ASoC: trigger FE cmd: 1 failed: -110
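
The header word of the timed-out IPC can be unpacked to see which pipeline the request targeted. A minimal sketch, assuming the IPC4 primary word carries the message type in bits 24-28, the pipeline ID in bits 16-23 and the requested state in bits 0-15, and that bit 31 of the initiator register is a busy flag. This layout is my reading of the SOF IPC4 header, not something stated in this report, and `decode_set_pipeline_state` is a hypothetical helper name:

```python
def decode_set_pipeline_state(primary: int) -> dict:
    """Split an IPC4 primary header word into fields (assumed bit layout)."""
    return {
        "busy": (primary >> 31) & 0x1,          # assumed: request still pending
        "type": (primary >> 24) & 0x1F,         # assumed: global message type
        "pipeline_id": (primary >> 16) & 0xFF,  # assumed: target pipeline
        "state": primary & 0xFFFF,              # assumed: requested state
    }


# 0x13050004 is the tx header, 0x93050004 the initiator value in the IPC dump.
for word in (0x13050004, 0x93050004):
    print(hex(word), decode_set_pipeline_state(word))
```

Under that assumed layout both words decode to the same type, pipeline 5 and state 4, and differ only in bit 31. That lines up with the `pipe:5 ... pipe trigger` line that follows `rx : 0x13050004|0x0` in the mtrace below.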

mtrace

[   28.117468] <inf> ipc: comp:7 0x40007 dai_config() dai type = 3 index = 2 dd 0x9e0b96c0
[   28.117490] <inf> pipe: comp:7 0x40007 connect buffer 0 as source
[   28.117898] <inf> ipc: rx	: 0x45060004|0x70004
[   28.117940] <inf> ipc: buffer new size 0x300 id 6.0 flags 0x0
[   28.117970] <inf> pipe: comp:6 0x40006 connect buffer 0 as sink
[   28.117981] <inf> pipe: comp:7 0x40007 connect buffer 0 as source
[   28.123098] <inf> ipc: rx	: 0x13030003|0x0
[   28.123748] <inf> ipc: rx	: 0x13030004|0x0
[   28.123801] <inf> dai_comp: comp:3 0x40003 dai_playback_params() dest_dev = 0 stream_id = 0 src_width = 4 dest_width = 4
[   28.123876] <inf> pipe: pipe:3 0x0 pipe trigger cmd 7
[   28.123890] <inf> ll_schedule: task add 0xbe0ba580 0x20180U priority 0 flags 0x0
[   28.124310] <wrn> dai_comp: comp:3 0x40003 dai_copy(): nothing to copy
[   28.124920] <inf> ipc: rx	: 0x13020003|0x0
[   28.124956] <wrn> ipc: ipc_pipeline_complete(): no scheduling component specified, use comp 262146
[   28.125320] <wrn> dai_comp: comp:3 0x40003 dai_copy(): nothing to copy
[   28.127508] <inf> ipc: rx	: 0x13050004|0x0
[   28.127585] <inf> dai_comp: comp:5 0x40005 dai_playback_params() dest_dev = 0 stream_id = 0 src_width = 4 dest_width = 4
[   28.127660] <inf> pipe: pipe:5 0x0 pipe trigger cmd 7
[   28.127673] <inf> ll_schedule: task add 0xbe0bb580 0x20180U priority 0 flags 0x0
[   28.128386] <err> os:  ** FATAL EXCEPTION
[   28.128405] <err> os:  ** CPU 0 EXCCAUSE 13 (load/store PIF data error)
[   28.128413] <err> os:  **  PC 0xbe0258f3 VADDR (nil)
[   28.128448] <err> os:  **  PS 0x60f20
[   28.128461] <err> os:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:15 CALLINC:2)
[   28.128471] <err> os:  **  A0 0xbe0442ef  SP 0xbe098410  A2 0xbe0b71c0  A3 (nil)
[   28.128481] <err> os:  **  A4 (nil)  A5 0xbe0b5380  A6 0xc0  A7 0xbe0b5380
[   28.128491] <err> os:  **  A8 0x180  A9 0x300 A10 0x300 A11 0xbe0b5680
[   28.128501] <err> os:  ** A12 0xc0 A13 0x8 A14 0x1 A15 0xbe098600
[   28.128511] <err> os:  ** LBEG 0xbe056369 LEND 0xbe05636f LCOUNT (nil)
[   28.128520] <err> os:  ** SAR 0x20

Backtrace:0xbe0258f0:0xbe098410 0xbe0442ec:0xbe098450 0xbe044384:0xbe098470 0xbe025b03:0xbe0984c0 0xbe027221:0xbe0984e0 0xbe027277:0xbe098520 0xbe026d77:0xbe0985a0 0xbe020eeb:0xbe0985f0 0xbe027ead:0xbe098630 0xbe013ac2:0xbe098670

[   28.128638] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[   28.128646] <err> os: Current thread: 0x9e0abb78 (unknown)
[   28.130678] <err> os: Halting systemTerminated
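
For anyone post-processing this dump, the sketch below splits the `Backtrace:` line into PC/SP pairs and decodes the `PS` value into the fields printed in the log. It assumes the usual Xtensa PS layout (INTLEVEL in bits 0-3, EXCM bit 4, UM bit 5, RING bits 6-7, OWB bits 8-11, CALLINC bits 16-17, WOE bit 18); `parse_backtrace` and `decode_ps` are hypothetical helper names, not part of any SOF or Zephyr tool:

```python
import re

# Backtrace line copied from the mtrace output above.
BACKTRACE = ("Backtrace:0xbe0258f0:0xbe098410 0xbe0442ec:0xbe098450 "
             "0xbe044384:0xbe098470 0xbe025b03:0xbe0984c0 "
             "0xbe027221:0xbe0984e0 0xbe027277:0xbe098520 "
             "0xbe026d77:0xbe0985a0 0xbe020eeb:0xbe0985f0 "
             "0xbe027ead:0xbe098630 0xbe013ac2:0xbe098670")


def parse_backtrace(line: str) -> list[tuple[int, int]]:
    """Return (pc, sp) pairs from a 'Backtrace:pc:sp pc:sp ...' line."""
    body = line.split("Backtrace:", 1)[1]
    pairs = re.findall(r"(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)", body)
    return [(int(pc, 16), int(sp, 16)) for pc, sp in pairs]


def decode_ps(ps: int) -> dict:
    """Decode an Xtensa PS register value (assumed standard field layout)."""
    return {
        "INTLEVEL": ps & 0xF,
        "EXCM": (ps >> 4) & 0x1,
        "UM": (ps >> 5) & 0x1,
        "RING": (ps >> 6) & 0x3,
        "OWB": (ps >> 8) & 0xF,
        "CALLINC": (ps >> 16) & 0x3,
        "WOE": (ps >> 18) & 0x1,
    }


for pc, sp in parse_backtrace(BACKTRACE):
    print(f"pc={pc:#010x} sp={sp:#010x}")
print(decode_ps(0x60F20))
```

The decoded PS fields can be checked against the `INTLEVEL:0 EXCM: 0 UM:1 ...` line in the log. The PC values are the ones that would be resolved against a matching `zephyr.elf` with an addr2line-style tool from the same toolchain; the exact tool name is not given in this report.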

To Reproduce
~/sof-test/test-case/multiple-pipeline.sh -f p -l 50

Reproduction Rate
20%.
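
With an intermittent failure like this, the number of clean runs needed before a fix can be trusted follows from the reproduction rate. A rough sketch, assuming each test run fails independently with probability 0.20 (an idealization; the function names are hypothetical):

```python
import math


def p_at_least_one_failure(rate: float, runs: int) -> float:
    """Probability of seeing the bug at least once in `runs` independent runs."""
    return 1.0 - (1.0 - rate) ** runs


def runs_for_confidence(rate: float, confidence: float) -> int:
    """Smallest number of runs that would hit the bug with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


print(runs_for_confidence(0.20, 0.95))
print(round(p_at_least_one_failure(0.20, 14), 3))
```

Under that assumption, about 14 consecutive passing runs are needed before a still-present 20% bug would have shown up with 95% probability.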

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
    Kernel Commit: 7abb3259b5f0
    SOF Commit: ac910714f40a
    Zephyr Commit: zephyr-v3.3.0-1481-ge40859f78712
  2. Name of the topology file
    • Topology: avs-tplg/cavs-sdw.tplg
  3. Name of the platform(s) on which the bug is observed.
    • Platform: TGLU-IPC4-SDW

dmesg.txt
mtrace.txt

@keqiaozhang keqiaozhang added bug Something isn't working as expected Zephyr Issues only observed with Zephyr integrated TGL Applies to Tiger Lake IPC4 Issues observed with IPC4 (same IPC as Windows) labels Mar 23, 2023
@lyakh
Collaborator

lyakh commented Mar 23, 2023

@keqiaozhang is the respective firmware ELF file still available for this bug? If yes - would it be possible to attach it to this bug report?

@marc-hb
Collaborator

marc-hb commented Mar 23, 2023

I'm afraid the firmware ELF file is not kept yet. However, I found the corresponding build log (jenkins sof_generic_build/22704/console) and I could reproduce the same zephyr.lst checksum, so you should be able to rebuild it yourself:

sofbld12$ cd sof
git checkout ac910714f40a
west update

./scripts/xtensa-build-zephyr.py -j100 -p -i IPC4 -d tgl
14:20:26 |-- sof 
14:20:26 |   |-- tgl 
14:20:26 |   |   +-- sof-tgl.ldc 	sha256=beb6f497ce59fbd199589fd8ec6c7a8ae548dcde4b9b6d0bedcfa450c3e69db0
14:20:26 |-- sof-info 
14:20:26 |   |-- tgl 
14:20:26 |   |   |-- boot.mod.gz 	sha256=e135d32f7e08dd774b76ff40679e669ee34229ac42349231f7c8864079dba052
14:20:26 |   |   |-- config.gz 
14:20:26 |   |   |-- generated_configs.c.gz 	sha256=6c500ef7fb8786b1ce079f795ab94241125e629b8be891b9b1a1984a6b7a0138
14:20:26 |   |   |-- main.mod.gz 
14:20:26 |   |   |-- sof_versions.h 	sha256=182e4390f3c0f0120cca5e509a1a84aad7e399b05bcc1bb7a74dc819d2e37642
14:20:26 |   |   |-- stripped-main.elf.gz 	sha256=996f048bdc70d6027531e7ad1877af81707e96e1510209c0de5f993ab2e87990
14:20:26 |   |   |-- stripped-zephyr.elf.gz 	sha256=0c14e9ed140d997f3d1d6448a3cfc922331b2174b64c714e234fc32089b1d074
14:20:26 |   |   |-- zephyr.elf.gz 
14:20:26 |   |   |-- zephyr.lst.gz 	sha256=737355bf1d017344df484b4bbac02893c814887c09361605dd04a26a80f40296
14:20:26 |   |   |-- zephyr.map.gz 
14:20:26 |   |   +-- zephyr_version.h 	sha256=0332406ca0968edc99eab65a985656483da53a549125bd5adeed428803cb8d24

Make sure your west topdir path is long enough because of xt-xcc bug #7114: mine was 46 characters long, and anything above 27 characters should do.

Despite the same versions and zephyr.lst checksum I still had stripped- differences, will look into that...
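
Comparing a local rebuild against the checksums in the listing above can be scripted. A minimal sketch, assuming the listing is saved as text and that the files sit directly under one directory; `parse_manifest`, `sha256_of` and `mismatches` are hypothetical helpers, and the listing does not say whether each checksum covers the compressed `.gz` or the uncompressed file:

```python
import hashlib
import re
from pathlib import Path


def parse_manifest(text: str) -> dict[str, str]:
    """Map file name -> expected sha256 from 'name <ws> sha256=<hex>' lines."""
    expected = {}
    for line in text.splitlines():
        m = re.search(r"(\S+)\s+sha256=([0-9a-f]+)", line)
        if m:  # lines without a checksum (e.g. config.gz) are skipped
            expected[m.group(1)] = m.group(2)
    return expected


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large ELF files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatches(expected: dict[str, str], root: Path) -> list[str]:
    """Names whose local file is missing or hashes differently."""
    return [name for name, digest in expected.items()
            if not (root / name).is_file() or sha256_of(root / name) != digest]


sample = """\
14:20:26 |   |   |-- config.gz
14:20:26 |   |   |-- zephyr.lst.gz \tsha256=737355bf1d017344df484b4bbac02893c814887c09361605dd04a26a80f40296
"""
print(parse_manifest(sample))
```

Calling `mismatches(parse_manifest(listing), Path(build_dir))` would then list the files that differ, which is how a difference limited to the `stripped-` files would show up.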

@marc-hb
Collaborator

marc-hb commented Mar 24, 2023

> Despite the same versions and zephyr.lst checksum I still had stripped- differences, will look into that...

Found it: since CONFIG_ASSERT, even the stripped ELF is not reproducible anymore :-(

@kfrydryx kfrydryx added the P2 Critical bugs or normal features label Mar 28, 2023
@lgirdwood lgirdwood added this to the v2.6 milestone Mar 29, 2023
@lgirdwood
Member

@keqiaozhang is this still being reported, or can we close it?

@marc-hb marc-hb added the Intel Linux Daily tests This issue can be found in internal Linux daily tests label May 11, 2023
@marc-hb
Collaborator

marc-hb commented May 11, 2023

Yes, this is still spotted semi-regularly in daily test results.

@marc-hb
Collaborator

marc-hb commented May 25, 2023

@kv2019i any chance this could have been a variant manifestation of heap corruption #7191 that you just submitted a fix for?

@keqiaozhang do you remember seeing this more recently?

@keqiaozhang
Collaborator Author

> @keqiaozhang do you remember seeing this more recently?

I haven't observed this issue in CI lately; the most common issue in CI now is #7191 and some variants.
#7660 was merged yesterday and no Zephyr panics happened today.

@keqiaozhang
Collaborator Author

This issue can no longer be reproduced since #7660 was merged. Closing this bug.
