ASoC: SOF: Intel: remove misleading error trace from IRQ thread #1087

kv2019i · 2019-07-09T15:39:18Z

Downgrade "nothing to do in IRQ thread" message from error to a debug
message in the IPC interrupt handler thread.

The spurious wake-up can happen if a HDA stream interrupt is
raised while the IPC interrupt thread is running. IPC functionality
is not impacted by this condition, so debug is a more appropriate
trace level.

ranj063

looksgood. Thanks!

ranj063 · 2019-07-10T22:54:26Z

@kv2019i could you please rebase?

Downgrade "nothing to do in IRQ thread" message from error to a debug message in the IPC interrupt handler thread. The spurious wake-up can happen if a HDA stream interrupt is raised while the IPC interrupt thread is running. IPC functionality is not impacted by this condition, so debug is a more appropriate trace level. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>

kv2019i · 2019-07-11T07:04:55Z

@kv2019i could you please rebase?

ack @ranj063 , done

ranj063

Need one more approval

ranj063 · 2019-07-12T04:19:32Z

@lyakh @keyonjie @juimonen could you please review this one?

keyonjie · 2019-07-12T04:30:31Z

@ranj063 @kv2019i I am still curious about:

if it is not an IPC IRQ, why the ipc_irq_thread() is called, is it better to fix by not calling ipc_irq_thread() in this case?
if it is an IPC IRQ, actually this "nothing to do in IRQ thread" should never happen?

ranj063 · 2019-07-12T04:41:54Z

@keyonjie as explained in the description this happens because the HDA interrupt is raised while we're in the IPC thread and we have not cleared that interrupt just yet. I'm not sure we can avoid this situation.

keyonjie · 2019-07-12T05:30:29Z

@ranj063 if "while we're in the IPC thread", that means the HDA_DSP_ADSPIS_IPC is set, and then HDA_DSP_REG_HIPCIE_DONE or HDA_DSP_REG_HIPCT_BUSY should be 1, how can the HDA stream interrupt impact the IPC registers?

kv2019i · 2019-07-12T06:54:09Z

@keyonjie wrote

@ranj063 if "while we're in the IPC thread", that means the HDA_DSP_ADSPIS_IPC is set, and then
HDA_DSP_REG_HIPCIE_DONE or HDA_DSP_REG_HIPCT_BUSY should be 1, how can the HDA stream
interrupt impact the IPC registers?

This is related to how we chose to fix the IPC stress test timeouts. We had multiple race conditions where a hardware interrupt was raised while CPU was executing the threaded handler, and we missed IPC processing in some corner case. To combat this, the hard handler now wakes the thread whenever the the status bits are set when the hard irq handler is called.

In this case, following happens:

when the IPC event happen, irq is raised and both ADSPIS and and HIPC* bits are set
IPC thread is scheduled
ADSPIC is disabled
CPU returns from hirq handler
IPC irq thread starts, run1
HDA stream hirq happens
ADSPIS is still set (ADSPIC=0 only control upstream propagation of ADSPIC.IPC, the status bit can still be set), so IPC thread is scheduled again (*)
IPC irq thread resume after hirq processing
all IPC messages are processed, HIPC* set to zero
IPC irq thread completes, run1
another IPC thread run was scheduled in previous step (*)
no work as HIPC* bits are all zeroes
IPC irq thread exits with no work done (the error trace is emitted)

I think this is a valid case with our current stack. @lyakh is working on some improvements (#1042), but I don't think these change the behaviour.

keyonjie · 2019-07-12T07:37:52Z

IPC irq thread starts, run1

HDA stream hirq happens

ADSPIS is still set (ADSPIC=0 only control upstream propagation of ADSPIC.IPC, the status bit can still be set), so IPC thread is scheduled again (*)

The happening of HDA stream hirq will only lead to INTSTS set(and interrupt generated), but ADSPIS won't be set?

So ipc_irq_handler() will be called, but "if (irq_status & HDA_DSP_ADSPIS_IPC)" will be false and it will return with IRQ_NONE.

IPC irq thread resume after hirq processing

all IPC messages are processed, HIPC* set to zero

IPC irq thread completes, run1

kv2019i · 2019-07-12T07:58:47Z

ADSPIS is still set (ADSPIC=0 only control upstream propagation of ADSPIC.IPC, the status bit can still be set), so IPC thread is scheduled again (*)

The happening of HDA stream hirq will only lead to INTSTS set(and interrupt generated), but ADSPIS won't be set?

@keyonjie ADSPIS is a logical OR of all upstream DSP status bits. As long as the IPC user thread has not completed handling all the events, ADSPIS will remain set. So if IPC user thread is interrupted by a HDA stream hda irq, the ADSPIS will be still set.

keyonjie · 2019-07-12T08:20:43Z

ADSPIS is still set (ADSPIC=0 only control upstream propagation of ADSPIC.IPC, the status bit can still be set), so IPC thread is scheduled again (*)

The happening of HDA stream hirq will only lead to INTSTS set(and interrupt generated), but ADSPIS won't be set?

@keyonjie ADSPIS is a logical of all downstream DSP status bits. As long as the IPC user thread has not completed handling all the events, ADSPIS will remain set. So if IPC user thread is interrupted by a HDA stream hda irq, the ADSPIS will be still set.

Oh, really? That's bad, that means our check here "if (irq_status & HDA_DSP_ADSPIS_IPC)" is not the sufficient condition to claim that it is a new IPC interrupt.

Can we do some change like this to fix it?

--	if (irq_status & HDA_DSP_ADSPIS_IPC) {
++	if ((irq_status & HDA_DSP_ADSPIS_IPC) &&
++	   (snd_sof_dsp_read(sdev, HDA_DSP_BAR, HDA_DSP_REG_ADSPIC) &
++	    HDA_DSP_ADSPIC_IPC)) {

We don't want the thread be scheduled multiple times for the same IPC IRQ.

kv2019i · 2019-07-12T08:40:35Z

@keyonjie wrote:

Oh, really? That's bad, that means our check here "if (irq_status & HDA_DSP_ADSPIS_IPC)" is
not the sufficient condition to claim that it is a new IPC interrupt.

This is a very infrequent event, so the performance penalty is small. We did discuss this when ironing out fixes for the IPC timeouts and the false wake-ups of the thread seemed the more robust solution.

Can we do some change like this to fix it?

--	if (irq_status & HDA_DSP_ADSPIS_IPC) {
++	if ((irq_status & HDA_DSP_ADSPIS_IPC) &&
++	   (snd_sof_dsp_read(sdev, HDA_DSP_BAR, HDA_DSP_REG_ADSPIC) &
++	    HDA_DSP_ADSPIC_IPC)) {

... I believe this opens up the risks for races again. If some DSP status bit is set towards end of IPC irq thread, this bit will not be handled by the thread anymore but ADSPIC still remains disabled (hirq happens before user thread reenables ADSPIC). With your proposed change, we would not wake up the thread in this case and we would miss the IPC leading to IPC timeout.

Plus, if we merge PR1042, then this logic fails to work as we stop disabling the ADSPIC.

keyonjie · 2019-07-12T09:21:54Z

@kv2019i so it sounds like we have to choose one from two bottle of poisons.

With solution what @lyakh and you provided, sounds like we will schedule more irq thread than it actually should be, but it can help to avoid IPC missing/timeout issues.

Some concern about it remained to me:

Is the IRQ thread execution sequence guaranteed? e.g. the instance 1 is preempted and then instance 2 is scheduled, can we make sure instance 2 will be started after instance 1 is finished?
Do we have any other side effect with this thread scheduling more than it actual needed? We are seeing "nothing to do in IRQ thread" logs from time to time, that's fine, we can ignore them, but are they safe enough with multi-entrance? e.g. is it possible that the same IRQ will be handled twice?

kv2019i · 2019-07-12T11:01:15Z

@kv2019i so it sounds like we have to choose one from two bottle of poisons.

@keyonjie Well, to be fair, that decision was made already and merged. All this patch does is to adjusts traces so they match with current driver behaviour. But yeah, we still have many PRs open to improve the irq handling, so the discussion is valid.

With solution what @lyakh and you provided, sounds like we will schedule more irq thread than
it actually should be, but it can help to avoid IPC missing/timeout issues.

Ack, but only in the rare case that HDA irq is hit between IPC hirq and threaded handlers. And the patch to implement this was from @ranj063 btw. I had a different proposal to use irq masking (PR1011), but that was dropped (and would not change the behaviour we now discuss).

Some concern about it remained to me:

Is the IRQ thread execution sequence guaranteed? e.g. the instance 1 is preempted and then instance 2 is scheduled, can we make sure instance 2 will be started after instance 1 is finished?

Yes, @lyakh checked this from kernel irq code.

Do we have any other side effect with this thread scheduling more than it actual needed?
We are seeing "nothing to do in IRQ thread" logs from time to time, that's fine, we can
ignore them, but are they safe enough with multi-entrance? e.g. is it possible that the same
IRQ will be handled twice?

We've been testing the current solution pretty extensively now and to our best knowledge it is solid. The kernel irq framework guarantees that multiple instances of the threaded handler are not run at the same time. This can be observed when triggering the scenario of this bug... the threaded handler is only rerun when the previous run completed (and thus we get the "nothing to do" message). And if no DSP status bits are set, the thread does not nothing.

keyonjie

OK, Thanks for detail explained, @kv2019i , it looks good to me now.

card->owner is a required property and since commit 81033c6 ("ALSA: core: Warn on empty module") a warning is issued if it is empty. Fix lack of it. This fixes following warning observed on RaspberryPi 3B board with ARM 32bit kernel and multi_v7_defconfig: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 210 at sound/core/init.c:207 snd_card_new+0x378/0x398 [snd] Modules linked in: vc4(+) snd_soc_core ac97_bus snd_pcm_dmaengine bluetooth snd_pcm snd_timer crc32_arm_ce raspberrypi_hwmon snd soundcore ecdh_generic ecc bcm2835_thermal phy_generic CPU: 1 PID: 210 Comm: systemd-udevd Not tainted 5.8.0-rc1-00027-g81033c6b584b #1087 Hardware name: BCM2835 [<c03113c0>] (unwind_backtrace) from [<c030bcb4>] (show_stack+0x10/0x14) [<c030bcb4>] (show_stack) from [<c071cef8>] (dump_stack+0xd4/0xe8) [<c071cef8>] (dump_stack) from [<c0345bfc>] (__warn+0xdc/0xf4) [<c0345bfc>] (__warn) from [<c0345cc4>] (warn_slowpath_fmt+0xb0/0xb8) [<c0345cc4>] (warn_slowpath_fmt) from [<bf02ff74>] (snd_card_new+0x378/0x398 [snd]) [<bf02ff74>] (snd_card_new [snd]) from [<bf11f0b4>] (snd_soc_bind_card+0x280/0x99c [snd_soc_core]) [<bf11f0b4>] (snd_soc_bind_card [snd_soc_core]) from [<bf12f000>] (devm_snd_soc_register_card+0x34/0x6c [snd_soc_core]) [<bf12f000>] (devm_snd_soc_register_card [snd_soc_core]) from [<bf165654>] (vc4_hdmi_bind+0x43c/0x5f4 [vc4]) [<bf165654>] (vc4_hdmi_bind [vc4]) from [<c09d660c>] (component_bind_all+0xec/0x24c) [<c09d660c>] (component_bind_all) from [<bf15c44c>] (vc4_drm_bind+0xd4/0x174 [vc4]) [<bf15c44c>] (vc4_drm_bind [vc4]) from [<c09d6ac0>] (try_to_bring_up_master+0x160/0x1b0) [<c09d6ac0>] (try_to_bring_up_master) from [<c09d6f38>] (component_master_add_with_match+0xd0/0x104) [<c09d6f38>] (component_master_add_with_match) from [<bf15c588>] (vc4_platform_drm_probe+0x9c/0xbc [vc4]) [<bf15c588>] (vc4_platform_drm_probe [vc4]) from [<c09df740>] (platform_drv_probe+0x6c/0xa4) [<c09df740>] (platform_drv_probe) from [<c09dd6f0>] (really_probe+0x210/0x350) [<c09dd6f0>] (really_probe) from [<c09dd940>] (driver_probe_device+0x5c/0xb4) [<c09dd940>] (driver_probe_device) from [<c09ddb38>] (device_driver_attach+0x58/0x60) [<c09ddb38>] (device_driver_attach) from [<c09ddbc0>] (__driver_attach+0x80/0xbc) [<c09ddbc0>] (__driver_attach) from [<c09db820>] (bus_for_each_dev+0x68/0xb4) [<c09db820>] (bus_for_each_dev) from [<c09dc9f8>] (bus_add_driver+0x130/0x1e8) [<c09dc9f8>] (bus_add_driver) from [<c09de648>] (driver_register+0x78/0x110) [<c09de648>] (driver_register) from [<c0302038>] (do_one_initcall+0x50/0x220) [<c0302038>] (do_one_initcall) from [<c03db544>] (do_init_module+0x60/0x210) [<c03db544>] (do_init_module) from [<c03da4f8>] (load_module+0x1e34/0x2338) [<c03da4f8>] (load_module) from [<c03dac00>] (sys_finit_module+0xac/0xbc) [<c03dac00>] (sys_finit_module) from [<c03000c0>] (ret_fast_syscall+0x0/0x54) Exception stack(0xeded9fa8 to 0xeded9ff0) ... ---[ end trace 6414689569c2bc08 ]--- Fixes: bb7d785 ("drm/vc4: Add HDMI audio support") Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200701073949.28941-1-m.szyprowski@samsung.com

[ Upstream commit ec653df ] card->owner is a required property and since commit 81033c6 ("ALSA: core: Warn on empty module") a warning is issued if it is empty. Fix lack of it. This fixes following warning observed on RaspberryPi 3B board with ARM 32bit kernel and multi_v7_defconfig: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 210 at sound/core/init.c:207 snd_card_new+0x378/0x398 [snd] Modules linked in: vc4(+) snd_soc_core ac97_bus snd_pcm_dmaengine bluetooth snd_pcm snd_timer crc32_arm_ce raspberrypi_hwmon snd soundcore ecdh_generic ecc bcm2835_thermal phy_generic CPU: 1 PID: 210 Comm: systemd-udevd Not tainted 5.8.0-rc1-00027-g81033c6b584b thesofproject#1087 Hardware name: BCM2835 [<c03113c0>] (unwind_backtrace) from [<c030bcb4>] (show_stack+0x10/0x14) [<c030bcb4>] (show_stack) from [<c071cef8>] (dump_stack+0xd4/0xe8) [<c071cef8>] (dump_stack) from [<c0345bfc>] (__warn+0xdc/0xf4) [<c0345bfc>] (__warn) from [<c0345cc4>] (warn_slowpath_fmt+0xb0/0xb8) [<c0345cc4>] (warn_slowpath_fmt) from [<bf02ff74>] (snd_card_new+0x378/0x398 [snd]) [<bf02ff74>] (snd_card_new [snd]) from [<bf11f0b4>] (snd_soc_bind_card+0x280/0x99c [snd_soc_core]) [<bf11f0b4>] (snd_soc_bind_card [snd_soc_core]) from [<bf12f000>] (devm_snd_soc_register_card+0x34/0x6c [snd_soc_core]) [<bf12f000>] (devm_snd_soc_register_card [snd_soc_core]) from [<bf165654>] (vc4_hdmi_bind+0x43c/0x5f4 [vc4]) [<bf165654>] (vc4_hdmi_bind [vc4]) from [<c09d660c>] (component_bind_all+0xec/0x24c) [<c09d660c>] (component_bind_all) from [<bf15c44c>] (vc4_drm_bind+0xd4/0x174 [vc4]) [<bf15c44c>] (vc4_drm_bind [vc4]) from [<c09d6ac0>] (try_to_bring_up_master+0x160/0x1b0) [<c09d6ac0>] (try_to_bring_up_master) from [<c09d6f38>] (component_master_add_with_match+0xd0/0x104) [<c09d6f38>] (component_master_add_with_match) from [<bf15c588>] (vc4_platform_drm_probe+0x9c/0xbc [vc4]) [<bf15c588>] (vc4_platform_drm_probe [vc4]) from [<c09df740>] (platform_drv_probe+0x6c/0xa4) [<c09df740>] (platform_drv_probe) from [<c09dd6f0>] (really_probe+0x210/0x350) [<c09dd6f0>] (really_probe) from [<c09dd940>] (driver_probe_device+0x5c/0xb4) [<c09dd940>] (driver_probe_device) from [<c09ddb38>] (device_driver_attach+0x58/0x60) [<c09ddb38>] (device_driver_attach) from [<c09ddbc0>] (__driver_attach+0x80/0xbc) [<c09ddbc0>] (__driver_attach) from [<c09db820>] (bus_for_each_dev+0x68/0xb4) [<c09db820>] (bus_for_each_dev) from [<c09dc9f8>] (bus_add_driver+0x130/0x1e8) [<c09dc9f8>] (bus_add_driver) from [<c09de648>] (driver_register+0x78/0x110) [<c09de648>] (driver_register) from [<c0302038>] (do_one_initcall+0x50/0x220) [<c0302038>] (do_one_initcall) from [<c03db544>] (do_init_module+0x60/0x210) [<c03db544>] (do_init_module) from [<c03da4f8>] (load_module+0x1e34/0x2338) [<c03da4f8>] (load_module) from [<c03dac00>] (sys_finit_module+0xac/0xbc) [<c03dac00>] (sys_finit_module) from [<c03000c0>] (ret_fast_syscall+0x0/0x54) Exception stack(0xeded9fa8 to 0xeded9ff0) ... ---[ end trace 6414689569c2bc08 ]--- Fixes: bb7d785 ("drm/vc4: Add HDMI audio support") Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20200701073949.28941-1-m.szyprowski@samsung.com Signed-off-by: Sasha Levin <sashal@kernel.org>

kv2019i requested review from juimonen, plbossart, lyakh, ranj063, keyonjie and xiulipan July 9, 2019 15:39

kv2019i requested a review from lgirdwood as a code owner July 9, 2019 15:39

ranj063 previously approved these changes Jul 9, 2019

View reviewed changes

lyakh previously approved these changes Jul 10, 2019

View reviewed changes

kv2019i dismissed stale reviews from lyakh and ranj063 via 7ce29e7 July 11, 2019 07:04

kv2019i force-pushed the fix/1047-extra-ipc-wakeups branch from 6df57f5 to 7ce29e7 Compare July 11, 2019 07:04

kv2019i requested review from lyakh and ranj063 July 11, 2019 07:04

ranj063 approved these changes Jul 11, 2019

View reviewed changes

keyonjie approved these changes Jul 15, 2019

View reviewed changes

ranj063 merged commit 04b9bfe into thesofproject:topic/sof-dev Jul 15, 2019

marc-hb mentioned this pull request Oct 4, 2023

[BUG] delayed mtrace collection #4618

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ASoC: SOF: Intel: remove misleading error trace from IRQ thread #1087

ASoC: SOF: Intel: remove misleading error trace from IRQ thread #1087

kv2019i commented Jul 9, 2019

ranj063 left a comment

ranj063 commented Jul 10, 2019

kv2019i commented Jul 11, 2019

ranj063 left a comment

ranj063 commented Jul 12, 2019

keyonjie commented Jul 12, 2019

ranj063 commented Jul 12, 2019

keyonjie commented Jul 12, 2019

kv2019i commented Jul 12, 2019

keyonjie commented Jul 12, 2019 •

edited

Loading

kv2019i commented Jul 12, 2019 •

edited

Loading

keyonjie commented Jul 12, 2019 •

edited

Loading

kv2019i commented Jul 12, 2019

keyonjie commented Jul 12, 2019

kv2019i commented Jul 12, 2019 •

edited

Loading

keyonjie left a comment

ASoC: SOF: Intel: remove misleading error trace from IRQ thread #1087

ASoC: SOF: Intel: remove misleading error trace from IRQ thread #1087

Conversation

kv2019i commented Jul 9, 2019

ranj063 left a comment

Choose a reason for hiding this comment

ranj063 commented Jul 10, 2019

kv2019i commented Jul 11, 2019

ranj063 left a comment

Choose a reason for hiding this comment

ranj063 commented Jul 12, 2019

keyonjie commented Jul 12, 2019

ranj063 commented Jul 12, 2019

keyonjie commented Jul 12, 2019

kv2019i commented Jul 12, 2019

keyonjie commented Jul 12, 2019 • edited Loading

kv2019i commented Jul 12, 2019 • edited Loading

keyonjie commented Jul 12, 2019 • edited Loading

kv2019i commented Jul 12, 2019

keyonjie commented Jul 12, 2019

kv2019i commented Jul 12, 2019 • edited Loading

keyonjie left a comment

Choose a reason for hiding this comment

keyonjie commented Jul 12, 2019 •

edited

Loading

kv2019i commented Jul 12, 2019 •

edited

Loading

keyonjie commented Jul 12, 2019 •

edited

Loading

kv2019i commented Jul 12, 2019 •

edited

Loading