-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASoC: SOF: Intel: remove misleading error trace from IRQ thread #1087
ASoC: SOF: Intel: remove misleading error trace from IRQ thread #1087
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looksgood. Thanks!
@kv2019i could you please rebase? |
Downgrade "nothing to do in IRQ thread" message from error to a debug message in the IPC interrupt handler thread. The spurious wake-up can happen if a HDA stream interrupt is raised while the IPC interrupt thread is running. IPC functionality is not impacted by this condition, so debug is a more appropriate trace level. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
6df57f5
to
7ce29e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need one more approval
@keyonjie as explained in the description this happens because the HDA interrupt is raised while we're in the IPC thread and we have not cleared that interrupt just yet. I'm not sure we can avoid this situation. |
@ranj063 if "while we're in the IPC thread", that means the HDA_DSP_ADSPIS_IPC is set, and then HDA_DSP_REG_HIPCIE_DONE or HDA_DSP_REG_HIPCT_BUSY should be 1, how can the HDA stream interrupt impact the IPC registers? |
@keyonjie wrote
This is related to how we chose to fix the IPC stress test timeouts. We had multiple race conditions where a hardware interrupt was raised while CPU was executing the threaded handler, and we missed IPC processing in some corner case. To combat this, the hard handler now wakes the thread whenever the the status bits are set when the hard irq handler is called. In this case, following happens:
I think this is a valid case with our current stack. @lyakh is working on some improvements (#1042), but I don't think these change the behaviour. |
The happening of HDA stream hirq will only lead to INTSTS set(and interrupt generated), but ADSPIS won't be set? So ipc_irq_handler() will be called, but "if (irq_status & HDA_DSP_ADSPIS_IPC)" will be false and it will return with IRQ_NONE.
|
@keyonjie ADSPIS is a logical OR of all upstream DSP status bits. As long as the IPC user thread has not completed handling all the events, ADSPIS will remain set. So if IPC user thread is interrupted by a HDA stream hda irq, the ADSPIS will be still set. |
Oh, really? That's bad, that means our check here "if (irq_status & HDA_DSP_ADSPIS_IPC)" is not the sufficient condition to claim that it is a new IPC interrupt. Can we do some change like this to fix it?
We don't want the thread be scheduled multiple times for the same IPC IRQ. |
@keyonjie wrote:
This is a very infrequent event, so the performance penalty is small. We did discuss this when ironing out fixes for the IPC timeouts and the false wake-ups of the thread seemed the more robust solution.
... I believe this opens up the risks for races again. If some DSP status bit is set towards end of IPC irq thread, this bit will not be handled by the thread anymore but ADSPIC still remains disabled (hirq happens before user thread reenables ADSPIC). With your proposed change, we would not wake up the thread in this case and we would miss the IPC leading to IPC timeout. Plus, if we merge PR1042, then this logic fails to work as we stop disabling the ADSPIC. |
@kv2019i so it sounds like we have to choose one from two bottle of poisons. With solution what @lyakh and you provided, sounds like we will schedule more irq thread than it actually should be, but it can help to avoid IPC missing/timeout issues. Some concern about it remained to me:
|
@keyonjie Well, to be fair, that decision was made already and merged. All this patch does is to adjusts traces so they match with current driver behaviour. But yeah, we still have many PRs open to improve the irq handling, so the discussion is valid.
Ack, but only in the rare case that HDA irq is hit between IPC hirq and threaded handlers. And the patch to implement this was from @ranj063 btw. I had a different proposal to use irq masking (PR1011), but that was dropped (and would not change the behaviour we now discuss).
Yes, @lyakh checked this from kernel irq code.
We've been testing the current solution pretty extensively now and to our best knowledge it is solid. The kernel irq framework guarantees that multiple instances of the threaded handler are not run at the same time. This can be observed when triggering the scenario of this bug... the threaded handler is only rerun when the previous run completed (and thus we get the "nothing to do" message). And if no DSP status bits are set, the thread does not nothing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, Thanks for detail explained, @kv2019i , it looks good to me now.
card->owner is a required property and since commit 81033c6 ("ALSA: core: Warn on empty module") a warning is issued if it is empty. Fix lack of it. This fixes following warning observed on RaspberryPi 3B board with ARM 32bit kernel and multi_v7_defconfig: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 210 at sound/core/init.c:207 snd_card_new+0x378/0x398 [snd] Modules linked in: vc4(+) snd_soc_core ac97_bus snd_pcm_dmaengine bluetooth snd_pcm snd_timer crc32_arm_ce raspberrypi_hwmon snd soundcore ecdh_generic ecc bcm2835_thermal phy_generic CPU: 1 PID: 210 Comm: systemd-udevd Not tainted 5.8.0-rc1-00027-g81033c6b584b #1087 Hardware name: BCM2835 [<c03113c0>] (unwind_backtrace) from [<c030bcb4>] (show_stack+0x10/0x14) [<c030bcb4>] (show_stack) from [<c071cef8>] (dump_stack+0xd4/0xe8) [<c071cef8>] (dump_stack) from [<c0345bfc>] (__warn+0xdc/0xf4) [<c0345bfc>] (__warn) from [<c0345cc4>] (warn_slowpath_fmt+0xb0/0xb8) [<c0345cc4>] (warn_slowpath_fmt) from [<bf02ff74>] (snd_card_new+0x378/0x398 [snd]) [<bf02ff74>] (snd_card_new [snd]) from [<bf11f0b4>] (snd_soc_bind_card+0x280/0x99c [snd_soc_core]) [<bf11f0b4>] (snd_soc_bind_card [snd_soc_core]) from [<bf12f000>] (devm_snd_soc_register_card+0x34/0x6c [snd_soc_core]) [<bf12f000>] (devm_snd_soc_register_card [snd_soc_core]) from [<bf165654>] (vc4_hdmi_bind+0x43c/0x5f4 [vc4]) [<bf165654>] (vc4_hdmi_bind [vc4]) from [<c09d660c>] (component_bind_all+0xec/0x24c) [<c09d660c>] (component_bind_all) from [<bf15c44c>] (vc4_drm_bind+0xd4/0x174 [vc4]) [<bf15c44c>] (vc4_drm_bind [vc4]) from [<c09d6ac0>] (try_to_bring_up_master+0x160/0x1b0) [<c09d6ac0>] (try_to_bring_up_master) from [<c09d6f38>] (component_master_add_with_match+0xd0/0x104) [<c09d6f38>] (component_master_add_with_match) from [<bf15c588>] (vc4_platform_drm_probe+0x9c/0xbc [vc4]) [<bf15c588>] (vc4_platform_drm_probe [vc4]) from [<c09df740>] (platform_drv_probe+0x6c/0xa4) [<c09df740>] (platform_drv_probe) from [<c09dd6f0>] (really_probe+0x210/0x350) [<c09dd6f0>] (really_probe) from [<c09dd940>] (driver_probe_device+0x5c/0xb4) [<c09dd940>] (driver_probe_device) from [<c09ddb38>] (device_driver_attach+0x58/0x60) [<c09ddb38>] (device_driver_attach) from [<c09ddbc0>] (__driver_attach+0x80/0xbc) [<c09ddbc0>] (__driver_attach) from [<c09db820>] (bus_for_each_dev+0x68/0xb4) [<c09db820>] (bus_for_each_dev) from [<c09dc9f8>] (bus_add_driver+0x130/0x1e8) [<c09dc9f8>] (bus_add_driver) from [<c09de648>] (driver_register+0x78/0x110) [<c09de648>] (driver_register) from [<c0302038>] (do_one_initcall+0x50/0x220) [<c0302038>] (do_one_initcall) from [<c03db544>] (do_init_module+0x60/0x210) [<c03db544>] (do_init_module) from [<c03da4f8>] (load_module+0x1e34/0x2338) [<c03da4f8>] (load_module) from [<c03dac00>] (sys_finit_module+0xac/0xbc) [<c03dac00>] (sys_finit_module) from [<c03000c0>] (ret_fast_syscall+0x0/0x54) Exception stack(0xeded9fa8 to 0xeded9ff0) ... ---[ end trace 6414689569c2bc08 ]--- Fixes: bb7d785 ("drm/vc4: Add HDMI audio support") Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200701073949.28941-1-m.szyprowski@samsung.com
[ Upstream commit ec653df ] card->owner is a required property and since commit 81033c6 ("ALSA: core: Warn on empty module") a warning is issued if it is empty. Fix lack of it. This fixes following warning observed on RaspberryPi 3B board with ARM 32bit kernel and multi_v7_defconfig: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 210 at sound/core/init.c:207 snd_card_new+0x378/0x398 [snd] Modules linked in: vc4(+) snd_soc_core ac97_bus snd_pcm_dmaengine bluetooth snd_pcm snd_timer crc32_arm_ce raspberrypi_hwmon snd soundcore ecdh_generic ecc bcm2835_thermal phy_generic CPU: 1 PID: 210 Comm: systemd-udevd Not tainted 5.8.0-rc1-00027-g81033c6b584b thesofproject#1087 Hardware name: BCM2835 [<c03113c0>] (unwind_backtrace) from [<c030bcb4>] (show_stack+0x10/0x14) [<c030bcb4>] (show_stack) from [<c071cef8>] (dump_stack+0xd4/0xe8) [<c071cef8>] (dump_stack) from [<c0345bfc>] (__warn+0xdc/0xf4) [<c0345bfc>] (__warn) from [<c0345cc4>] (warn_slowpath_fmt+0xb0/0xb8) [<c0345cc4>] (warn_slowpath_fmt) from [<bf02ff74>] (snd_card_new+0x378/0x398 [snd]) [<bf02ff74>] (snd_card_new [snd]) from [<bf11f0b4>] (snd_soc_bind_card+0x280/0x99c [snd_soc_core]) [<bf11f0b4>] (snd_soc_bind_card [snd_soc_core]) from [<bf12f000>] (devm_snd_soc_register_card+0x34/0x6c [snd_soc_core]) [<bf12f000>] (devm_snd_soc_register_card [snd_soc_core]) from [<bf165654>] (vc4_hdmi_bind+0x43c/0x5f4 [vc4]) [<bf165654>] (vc4_hdmi_bind [vc4]) from [<c09d660c>] (component_bind_all+0xec/0x24c) [<c09d660c>] (component_bind_all) from [<bf15c44c>] (vc4_drm_bind+0xd4/0x174 [vc4]) [<bf15c44c>] (vc4_drm_bind [vc4]) from [<c09d6ac0>] (try_to_bring_up_master+0x160/0x1b0) [<c09d6ac0>] (try_to_bring_up_master) from [<c09d6f38>] (component_master_add_with_match+0xd0/0x104) [<c09d6f38>] (component_master_add_with_match) from [<bf15c588>] (vc4_platform_drm_probe+0x9c/0xbc [vc4]) [<bf15c588>] (vc4_platform_drm_probe [vc4]) from [<c09df740>] (platform_drv_probe+0x6c/0xa4) [<c09df740>] (platform_drv_probe) from [<c09dd6f0>] (really_probe+0x210/0x350) [<c09dd6f0>] (really_probe) from [<c09dd940>] (driver_probe_device+0x5c/0xb4) [<c09dd940>] (driver_probe_device) from [<c09ddb38>] (device_driver_attach+0x58/0x60) [<c09ddb38>] (device_driver_attach) from [<c09ddbc0>] (__driver_attach+0x80/0xbc) [<c09ddbc0>] (__driver_attach) from [<c09db820>] (bus_for_each_dev+0x68/0xb4) [<c09db820>] (bus_for_each_dev) from [<c09dc9f8>] (bus_add_driver+0x130/0x1e8) [<c09dc9f8>] (bus_add_driver) from [<c09de648>] (driver_register+0x78/0x110) [<c09de648>] (driver_register) from [<c0302038>] (do_one_initcall+0x50/0x220) [<c0302038>] (do_one_initcall) from [<c03db544>] (do_init_module+0x60/0x210) [<c03db544>] (do_init_module) from [<c03da4f8>] (load_module+0x1e34/0x2338) [<c03da4f8>] (load_module) from [<c03dac00>] (sys_finit_module+0xac/0xbc) [<c03dac00>] (sys_finit_module) from [<c03000c0>] (ret_fast_syscall+0x0/0x54) Exception stack(0xeded9fa8 to 0xeded9ff0) ... ---[ end trace 6414689569c2bc08 ]--- Fixes: bb7d785 ("drm/vc4: Add HDMI audio support") Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200701073949.28941-1-m.szyprowski@samsung.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes #1047
Downgrade "nothing to do in IRQ thread" message from error to a debug
message in the IPC interrupt handler thread.
The spurious wake-up can happen if a HDA stream interrupt is
raised while the IPC interrupt thread is running. IPC functionality
is not impacted by this condition, so debug is a more appropriate
trace level.