Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] audio process died after suspend-resume with audio #3999

Closed
miRoox opened this issue Nov 8, 2022 · 13 comments
Closed

[BUG] audio process died after suspend-resume with audio #3999

miRoox opened this issue Nov 8, 2022 · 13 comments
Labels
ADL Applies to Alder Lake platform bug Something isn't working Intel Daily tests This issue can be found in Intel internal daily tests P1 Blocker bugs or important features SDW Applies to SoundWire bus for codec connection suspend resume Issues related to suspend resume (e.g. rtcwake) TGL Applies to Tiger Lake platform

Comments

@miRoox
Copy link

miRoox commented Nov 8, 2022

Describe the bug
CI observed the capture process prematurely died after suspend-resume with capture on TGLU_VOLT_SDW_ZEPHYR.

To Reproduce

TPLG=/lib/firmware/intel/sof-tplg/sof-tgl-sdw-max98373-rt5682.tplg MODEL=TGLU_VOLT_SDW_ZEPHYR ~/sof-test/test-case/check-suspend-resume-with-audio.sh -l 5 -m capture

Reproduction Rate
twice in CI, hard to reproduce manually.

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
  2. Name of the topology file
    • Topology: sof-tgl-sdw-max98373-rt5682.tplg
  3. Name of the platform(s) on which the bug is observed.
    • Platform: TGLU_VOLT_SDW_ZEPHYR

Screenshots or console output

2022-11-06 23:51:20 UTC Sub-Test: [REMOTE_COMMAND] Run the command: rtcwake -m mem -s 5
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Sun Nov  6 23:51:26 2022
2022-11-06 23:51:26 UTC Sub-Test: [REMOTE_COMMAND] sleep for 5
2022-11-06 23:51:31 UTC Sub-Test: [REMOTE_INFO] Check for the kernel log status
declare -- cmd="journalctl_cmd --since=@1667778679"
2022-11-06 23:51:32 UTC Sub-Test: [REMOTE_INFO] Check for wakeup_count and suspend_stats

/sys/power/suspend_stats/fail:0
/sys/power/suspend_stats/failed_freeze:0
/sys/power/suspend_stats/failed_prepare:0
/sys/power/suspend_stats/failed_resume:0
/sys/power/suspend_stats/failed_resume_early:0
/sys/power/suspend_stats/failed_resume_noirq:0
/sys/power/suspend_stats/failed_suspend:0
/sys/power/suspend_stats/failed_suspend_late:0
/sys/power/suspend_stats/failed_suspend_noirq:0
/sys/power/suspend_stats/last_failed_dev:
/sys/power/suspend_stats/last_failed_errno:0
/sys/power/suspend_stats/last_failed_step:
/sys/power/suspend_stats/success:115

name		active_count	event_count	wakeup_count	expire_count	active_since	total_time	max_time	last_change	prevent_suspend_time
--- /home/ubuntu/sof-test/logs/check-suspend-resume-with-audio/2022-11-06-23:48:23-21536/initial_wakeup_sources.txt	2022-11-06 23:50:32.113555022 +0000
+++ /sys/kernel/debug/wakeup_sources	2022-11-06 21:29:35.016000000 +0000
@@ -14,7 +14,7 @@
 0000:00:14.0	0		0		0		0		0		0		0		0		0
 0000:00:0d.0	0		0		0		0		0		0		0		0		0
 alarmtimer.0.auto	0		0		0		0		0		0		0		0		0
-00:03       	117		220		0		0		0		6		0		7866726		0
+00:03       	124		230		0		0		0		7		0		7904405		0
 serio0      	0		0		0		0		0		0		0		0		0
 PNP0C0A:00  	0		0		0		0		0		0		0		0		0
 AC          	1		1		0		0		0		0		0		774		0

2022-11-06 23:51:32 UTC Sub-Test: [REMOTE_WARNING] /sys/power/wakeup_count is 125, expected 124

declare -- cmd="journalctl_cmd --since=@1667778679"
2022-11-06 23:51:32 UTC Sub-Test: [REMOTE_INFO] Test Result: PASS!
declare -- cmd="journalctl_cmd --since=@1667778628"
process: 126897, status: not found
2022-11-06 23:51:33 UTC [REMOTE_ERROR] process status is abnormal
2022-11-06 23:51:33 UTC [REMOTE_INFO] dump ps for aplay & arecord
ubuntu    127438  125737  0 23:51 ?        00:00:00 grep -E aplay|arecord
2022-11-06 23:51:33 UTC [REMOTE_INFO] dump ps for child process
UID          PID    PPID  C STIME TTY          TIME CMD
ubuntu    125804  125737  0 23:48 ?        00:00:00 /bin/bash /home/ubuntu/sof-test/test-case/check-suspend-resume-with-audio.sh -l 5 -m capture
ubuntu    127440  125737  0 23:51 ?        00:00:00 ps --ppid 125737 -f
2022-11-06 23:51:33 UTC [REMOTE_ERROR] Starting func_exit_handler(), exit status=1, FUNCNAME stack:
2022-11-06 23:51:33 UTC [REMOTE_ERROR]  main()  @  /home/ubuntu/sof-test/test-case/check-suspend-resume-with-audio.sh

console_log.txt
dmesg.txt
slogger.txt

@miRoox miRoox added bug Something isn't working TGL Applies to Tiger Lake platform suspend resume Issues related to suspend resume (e.g. rtcwake) labels Nov 8, 2022
@plbossart
Copy link
Member

IPC3 + Zephyr issue, this is going to be difficult to debug given the focus on IPC4, even more so if it's not reproducible manually.

@juimonen
Copy link

We seem to have similar issue with ipc4+zephyr+native_drivers in tgl. Like dw dma would not start after wake up. This can be regression in sof or zephyr. @miRoox is this something that has been working before with this setup?

@keqiaozhang keqiaozhang added the P2 Critical bugs or normal features label Nov 14, 2022
@keqiaozhang
Copy link
Collaborator

Observed a similar issue on ADLP_RVP_SDW_ZEPHYR when testing suspend-resume with playback. Test ID:17914.
Console output:

2022-11-20 23:47:51 UTC Sub-Test: [REMOTE_COMMAND] Run the command: rtcwake -m mem -s 5
rtcwake: wakeup from "mem" using /dev/rtc0 at Sun Nov 20 23:47:57 2022
2022-11-20 23:47:58 UTC Sub-Test: [REMOTE_COMMAND] sleep for 5
aplay: pcm_write:2127: write error: Input/output error
2022-11-20 23:48:03 UTC Sub-Test: [REMOTE_INFO] Check for the kernel log status
declare -- cmd="journalctl_cmd --since=@1668988070"
2022-11-20 23:48:04 UTC Sub-Test: [REMOTE_INFO] Check for wakeup_count and suspend_stats
declare -- cmd="journalctl_cmd --since=@1668988070"
2022-11-20 23:48:04 UTC Sub-Test: [REMOTE_INFO] Test Result: PASS!
declare -- cmd="journalctl_cmd --since=@1668988020"
process: 130430, status: not found
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/125/gvfs
      Output information may be incomplete.
2022-11-20 23:48:05 UTC [REMOTE_ERROR] process status is abnormal
2022-11-20 23:48:05 UTC [REMOTE_INFO] dump ps for aplay & arecord
ubuntu    131108  129667  0 23:48 ?        00:00:00 grep -E aplay|arecord
2022-11-20 23:48:05 UTC [REMOTE_INFO] dump ps for child process
UID          PID    PPID  C STIME TTY          TIME CMD
ubuntu    129733  129667  0 23:45 ?        00:00:00 /bin/bash /home/ubuntu/sof-test/test-case/check-suspend-resume-with-audio.sh -l 5 -m playback
ubuntu    131110  129667  0 23:48 ?        00:00:00 ps --ppid 129667 -f
2022-11-20 23:48:05 UTC [REMOTE_ERROR] Starting func_exit_handler(), exit status=1, FUNCNAME stack:
2022-11-20 23:48:05 UTC [REMOTE_ERROR]  main()  @  /home/ubuntu/sof-test/test-case/check-suspend-resume-with-audio.sh

@keqiaozhang keqiaozhang changed the title [BUG] capture process died after suspend-resume with capture on TGLU_VOLT_SDW_ZEPHYR [BUG] audio process died after suspend-resume with audio Nov 24, 2022
@keqiaozhang keqiaozhang added the won't fix This will not be worked on atm (e.g. a bug closed for lack of user request, hardware etc) label Dec 19, 2022
@keqiaozhang
Copy link
Collaborator

We have aligned to stop testing IPC3 on sof-main branch.

@keqiaozhang
Copy link
Collaborator

keqiaozhang commented Dec 22, 2022

@fredoh9
Copy link
Collaborator

fredoh9 commented Jan 25, 2023

found same issue on ADLP_BRYA_SDW from daily test with stable-v2.2

planresultdetail/20167?model=ADLP_BRYA_SDW&testcase=check-suspend-resume-with-playback-5

@fredoh9 fredoh9 added the ADL Applies to Alder Lake platform label Jan 25, 2023
@fredoh9
Copy link
Collaborator

fredoh9 commented Feb 4, 2023

Now this is 100% failure on ADLP_BRYA_SDW with stable-v2.2

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 6, 2023

Good news: the best way to find and eventually fix race conditions is to make them "worst" = reproducible first.

EDIT: labels updated based on more recent and much wider reproductions.

@marc-hb marc-hb added Intel Daily tests This issue can be found in Intel internal daily tests P1 Blocker bugs or important features and removed won't fix This will not be worked on atm (e.g. a bug closed for lack of user request, hardware etc) P2 Critical bugs or normal features labels Feb 9, 2023
@plbossart
Copy link
Member

plbossart commented Feb 22, 2023

@bardliao noted that this issue happens on ADLP_BRYA_SDW with GCC 10.3.0 used
https://sof-ci.ostc.intel.com/#/result/planresultdetail/20167?model=ADLP_BRYA_SDW&testcase=verify-kernel-boot-log

@ujfalusi @marc-hb @fredoh9 another problem with this compiler, similar to #4203

Not an isolated incident really. What the compiler does to suspend-resume is not clear to me.

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 23, 2023

Pretty much every race condition I ever looked at is affected by compiler changes.

@bardliao
Copy link
Collaborator

Pretty much every race condition I ever looked at is affected by compiler changes.

Yes, it could be a race condition.

@plbossart
Copy link
Member

@fredoh9 do we still see this issue with ADLP_BRYA_SDW? I must admit not tracking this issue or forgetting about it.

@plbossart plbossart added the SDW Applies to SoundWire bus for codec connection label Apr 17, 2023
@plbossart
Copy link
Member

closing since it's almost been a year without any updates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ADL Applies to Alder Lake platform bug Something isn't working Intel Daily tests This issue can be found in Intel internal daily tests P1 Blocker bugs or important features SDW Applies to SoundWire bus for codec connection suspend resume Issues related to suspend resume (e.g. rtcwake) TGL Applies to Tiger Lake platform
Projects
None yet
Development

No branches or pull requests

7 participants