Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC: SOF: Intel: remove misleading error trace from IRQ thread #1087

Merged

Conversation

kv2019i
Copy link
Collaborator

@kv2019i kv2019i commented Jul 9, 2019

Fixes #1047

Downgrade "nothing to do in IRQ thread" message from error to a debug
message in the IPC interrupt handler thread.

The spurious wake-up can happen if a HDA stream interrupt is
raised while the IPC interrupt thread is running. IPC functionality
is not impacted by this condition, so debug is a more appropriate
trace level.

ranj063
ranj063 previously approved these changes Jul 9, 2019
Copy link
Collaborator

@ranj063 ranj063 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looksgood. Thanks!

lyakh
lyakh previously approved these changes Jul 10, 2019
@ranj063
Copy link
Collaborator

ranj063 commented Jul 10, 2019

@kv2019i could you please rebase?

Downgrade "nothing to do in IRQ thread" message from error to a debug
message in the IPC interrupt handler thread.

The spurious wake-up can happen if a HDA stream interrupt is
raised while the IPC interrupt thread is running. IPC functionality
is not impacted by this condition, so debug is a more appropriate
trace level.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@kv2019i kv2019i dismissed stale reviews from lyakh and ranj063 via 7ce29e7 July 11, 2019 07:04
@kv2019i kv2019i force-pushed the fix/1047-extra-ipc-wakeups branch from 6df57f5 to 7ce29e7 Compare July 11, 2019 07:04
@kv2019i kv2019i requested review from lyakh and ranj063 July 11, 2019 07:04
@kv2019i
Copy link
Collaborator Author

kv2019i commented Jul 11, 2019

@kv2019i could you please rebase?

ack @ranj063 , done

Copy link
Collaborator

@ranj063 ranj063 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need one more approval

@ranj063
Copy link
Collaborator

ranj063 commented Jul 12, 2019

@lyakh @keyonjie @juimonen could you please review this one?

@keyonjie
Copy link

@ranj063 @kv2019i I am still curious about:

  1. if it is not an IPC IRQ, why the ipc_irq_thread() is called, is it better to fix by not calling ipc_irq_thread() in this case?
  2. if it is an IPC IRQ, actually this "nothing to do in IRQ thread" should never happen?

@ranj063
Copy link
Collaborator

ranj063 commented Jul 12, 2019

@keyonjie as explained in the description this happens because the HDA interrupt is raised while we're in the IPC thread and we have not cleared that interrupt just yet. I'm not sure we can avoid this situation.

@keyonjie
Copy link

@ranj063 if "while we're in the IPC thread", that means the HDA_DSP_ADSPIS_IPC is set, and then HDA_DSP_REG_HIPCIE_DONE or HDA_DSP_REG_HIPCT_BUSY should be 1, how can the HDA stream interrupt impact the IPC registers?

@kv2019i
Copy link
Collaborator Author

kv2019i commented Jul 12, 2019

@keyonjie wrote

@ranj063 if "while we're in the IPC thread", that means the HDA_DSP_ADSPIS_IPC is set, and then
HDA_DSP_REG_HIPCIE_DONE or HDA_DSP_REG_HIPCT_BUSY should be 1, how can the HDA stream
interrupt impact the IPC registers?

This is related to how we chose to fix the IPC stress test timeouts. We had multiple race conditions where a hardware interrupt was raised while CPU was executing the threaded handler, and we missed IPC processing in some corner case. To combat this, the hard handler now wakes the thread whenever the the status bits are set when the hard irq handler is called.

In this case, following happens:

  • when the IPC event happen, irq is raised and both ADSPIS and and HIPC* bits are set
  • IPC thread is scheduled
  • ADSPIC is disabled
  • CPU returns from hirq handler
  • IPC irq thread starts, run1
  • HDA stream hirq happens
  • ADSPIS is still set (ADSPIC=0 only control upstream propagation of ADSPIC.IPC, the status bit can still be set), so IPC thread is scheduled again (*)
  • IPC irq thread resume after hirq processing
  • all IPC messages are processed, HIPC* set to zero
  • IPC irq thread completes, run1
  • another IPC thread run was scheduled in previous step (*)
  • no work as HIPC* bits are all zeroes
  • IPC irq thread exits with no work done (the error trace is emitted)

I think this is a valid case with our current stack. @lyakh is working on some improvements (#1042), but I don't think these change the behaviour.

@keyonjie
Copy link

keyonjie commented Jul 12, 2019

  • IPC irq thread starts, run1
  • HDA stream hirq happens
  • ADSPIS is still set (ADSPIC=0 only control upstream propagation of ADSPIC.IPC, the status bit can still be set), so IPC thread is scheduled again (*)

The happening of HDA stream hirq will only lead to INTSTS set(and interrupt generated), but ADSPIS won't be set?

So ipc_irq_handler() will be called, but "if (irq_status & HDA_DSP_ADSPIS_IPC)" will be false and it will return with IRQ_NONE.

  • IPC irq thread resume after hirq processing
  • all IPC messages are processed, HIPC* set to zero
  • IPC irq thread completes, run1

@kv2019i
Copy link
Collaborator Author

kv2019i commented Jul 12, 2019

  • ADSPIS is still set (ADSPIC=0 only control upstream propagation of ADSPIC.IPC, the status bit can still be set), so IPC thread is scheduled again (*)

The happening of HDA stream hirq will only lead to INTSTS set(and interrupt generated), but ADSPIS won't be set?

@keyonjie ADSPIS is a logical OR of all upstream DSP status bits. As long as the IPC user thread has not completed handling all the events, ADSPIS will remain set. So if IPC user thread is interrupted by a HDA stream hda irq, the ADSPIS will be still set.

@keyonjie
Copy link

keyonjie commented Jul 12, 2019

  • ADSPIS is still set (ADSPIC=0 only control upstream propagation of ADSPIC.IPC, the status bit can still be set), so IPC thread is scheduled again (*)

The happening of HDA stream hirq will only lead to INTSTS set(and interrupt generated), but ADSPIS won't be set?

@keyonjie ADSPIS is a logical of all downstream DSP status bits. As long as the IPC user thread has not completed handling all the events, ADSPIS will remain set. So if IPC user thread is interrupted by a HDA stream hda irq, the ADSPIS will be still set.

Oh, really? That's bad, that means our check here "if (irq_status & HDA_DSP_ADSPIS_IPC)" is not the sufficient condition to claim that it is a new IPC interrupt.

Can we do some change like this to fix it?

--	if (irq_status & HDA_DSP_ADSPIS_IPC) {
++	if ((irq_status & HDA_DSP_ADSPIS_IPC) &&
++	   (snd_sof_dsp_read(sdev, HDA_DSP_BAR, HDA_DSP_REG_ADSPIC) &
++	    HDA_DSP_ADSPIC_IPC)) {

We don't want the thread be scheduled multiple times for the same IPC IRQ.

@kv2019i
Copy link
Collaborator Author

kv2019i commented Jul 12, 2019

@keyonjie wrote:

Oh, really? That's bad, that means our check here "if (irq_status & HDA_DSP_ADSPIS_IPC)" is
not the sufficient condition to claim that it is a new IPC interrupt.

This is a very infrequent event, so the performance penalty is small. We did discuss this when ironing out fixes for the IPC timeouts and the false wake-ups of the thread seemed the more robust solution.

Can we do some change like this to fix it?

--	if (irq_status & HDA_DSP_ADSPIS_IPC) {
++	if ((irq_status & HDA_DSP_ADSPIS_IPC) &&
++	   (snd_sof_dsp_read(sdev, HDA_DSP_BAR, HDA_DSP_REG_ADSPIC) &
++	    HDA_DSP_ADSPIC_IPC)) {

... I believe this opens up the risks for races again. If some DSP status bit is set towards end of IPC irq thread, this bit will not be handled by the thread anymore but ADSPIC still remains disabled (hirq happens before user thread reenables ADSPIC). With your proposed change, we would not wake up the thread in this case and we would miss the IPC leading to IPC timeout.

Plus, if we merge PR1042, then this logic fails to work as we stop disabling the ADSPIC.

@keyonjie
Copy link

@kv2019i so it sounds like we have to choose one from two bottle of poisons.

With solution what @lyakh and you provided, sounds like we will schedule more irq thread than it actually should be, but it can help to avoid IPC missing/timeout issues.

Some concern about it remained to me:

  1. Is the IRQ thread execution sequence guaranteed? e.g. the instance 1 is preempted and then instance 2 is scheduled, can we make sure instance 2 will be started after instance 1 is finished?

  2. Do we have any other side effect with this thread scheduling more than it actual needed? We are seeing "nothing to do in IRQ thread" logs from time to time, that's fine, we can ignore them, but are they safe enough with multi-entrance? e.g. is it possible that the same IRQ will be handled twice?

@kv2019i
Copy link
Collaborator Author

kv2019i commented Jul 12, 2019

@kv2019i so it sounds like we have to choose one from two bottle of poisons.

@keyonjie Well, to be fair, that decision was made already and merged. All this patch does is to adjusts traces so they match with current driver behaviour. But yeah, we still have many PRs open to improve the irq handling, so the discussion is valid.

With solution what @lyakh and you provided, sounds like we will schedule more irq thread than
it actually should be, but it can help to avoid IPC missing/timeout issues.

Ack, but only in the rare case that HDA irq is hit between IPC hirq and threaded handlers. And the patch to implement this was from @ranj063 btw. I had a different proposal to use irq masking (PR1011), but that was dropped (and would not change the behaviour we now discuss).

Some concern about it remained to me:

  1. Is the IRQ thread execution sequence guaranteed? e.g. the instance 1 is preempted and then instance 2 is scheduled, can we make sure instance 2 will be started after instance 1 is finished?

Yes, @lyakh checked this from kernel irq code.

  1. Do we have any other side effect with this thread scheduling more than it actual needed?
    We are seeing "nothing to do in IRQ thread" logs from time to time, that's fine, we can
    ignore them, but are they safe enough with multi-entrance? e.g. is it possible that the same
    IRQ will be handled twice?

We've been testing the current solution pretty extensively now and to our best knowledge it is solid. The kernel irq framework guarantees that multiple instances of the threaded handler are not run at the same time. This can be observed when triggering the scenario of this bug... the threaded handler is only rerun when the previous run completed (and thus we get the "nothing to do" message). And if no DSP status bits are set, the thread does not nothing.

Copy link

@keyonjie keyonjie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, Thanks for detail explained, @kv2019i , it looks good to me now.

@ranj063 ranj063 merged commit 04b9bfe into thesofproject:topic/sof-dev Jul 15, 2019
plbossart pushed a commit that referenced this pull request Sep 28, 2020
card->owner is a required property and since commit 81033c6 ("ALSA:
core: Warn on empty module") a warning is issued if it is empty. Fix lack
of it. This fixes following warning observed on RaspberryPi 3B board
with ARM 32bit kernel and multi_v7_defconfig:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 210 at sound/core/init.c:207 snd_card_new+0x378/0x398 [snd]
Modules linked in: vc4(+) snd_soc_core ac97_bus snd_pcm_dmaengine bluetooth snd_pcm snd_timer crc32_arm_ce raspberrypi_hwmon snd soundcore ecdh_generic ecc bcm2835_thermal phy_generic
CPU: 1 PID: 210 Comm: systemd-udevd Not tainted 5.8.0-rc1-00027-g81033c6b584b #1087
Hardware name: BCM2835
[<c03113c0>] (unwind_backtrace) from [<c030bcb4>] (show_stack+0x10/0x14)
[<c030bcb4>] (show_stack) from [<c071cef8>] (dump_stack+0xd4/0xe8)
[<c071cef8>] (dump_stack) from [<c0345bfc>] (__warn+0xdc/0xf4)
[<c0345bfc>] (__warn) from [<c0345cc4>] (warn_slowpath_fmt+0xb0/0xb8)
[<c0345cc4>] (warn_slowpath_fmt) from [<bf02ff74>] (snd_card_new+0x378/0x398 [snd])
[<bf02ff74>] (snd_card_new [snd]) from [<bf11f0b4>] (snd_soc_bind_card+0x280/0x99c [snd_soc_core])
[<bf11f0b4>] (snd_soc_bind_card [snd_soc_core]) from [<bf12f000>] (devm_snd_soc_register_card+0x34/0x6c [snd_soc_core])
[<bf12f000>] (devm_snd_soc_register_card [snd_soc_core]) from [<bf165654>] (vc4_hdmi_bind+0x43c/0x5f4 [vc4])
[<bf165654>] (vc4_hdmi_bind [vc4]) from [<c09d660c>] (component_bind_all+0xec/0x24c)
[<c09d660c>] (component_bind_all) from [<bf15c44c>] (vc4_drm_bind+0xd4/0x174 [vc4])
[<bf15c44c>] (vc4_drm_bind [vc4]) from [<c09d6ac0>] (try_to_bring_up_master+0x160/0x1b0)
[<c09d6ac0>] (try_to_bring_up_master) from [<c09d6f38>] (component_master_add_with_match+0xd0/0x104)
[<c09d6f38>] (component_master_add_with_match) from [<bf15c588>] (vc4_platform_drm_probe+0x9c/0xbc [vc4])
[<bf15c588>] (vc4_platform_drm_probe [vc4]) from [<c09df740>] (platform_drv_probe+0x6c/0xa4)
[<c09df740>] (platform_drv_probe) from [<c09dd6f0>] (really_probe+0x210/0x350)
[<c09dd6f0>] (really_probe) from [<c09dd940>] (driver_probe_device+0x5c/0xb4)
[<c09dd940>] (driver_probe_device) from [<c09ddb38>] (device_driver_attach+0x58/0x60)
[<c09ddb38>] (device_driver_attach) from [<c09ddbc0>] (__driver_attach+0x80/0xbc)
[<c09ddbc0>] (__driver_attach) from [<c09db820>] (bus_for_each_dev+0x68/0xb4)
[<c09db820>] (bus_for_each_dev) from [<c09dc9f8>] (bus_add_driver+0x130/0x1e8)
[<c09dc9f8>] (bus_add_driver) from [<c09de648>] (driver_register+0x78/0x110)
[<c09de648>] (driver_register) from [<c0302038>] (do_one_initcall+0x50/0x220)
[<c0302038>] (do_one_initcall) from [<c03db544>] (do_init_module+0x60/0x210)
[<c03db544>] (do_init_module) from [<c03da4f8>] (load_module+0x1e34/0x2338)
[<c03da4f8>] (load_module) from [<c03dac00>] (sys_finit_module+0xac/0xbc)
[<c03dac00>] (sys_finit_module) from [<c03000c0>] (ret_fast_syscall+0x0/0x54)
Exception stack(0xeded9fa8 to 0xeded9ff0)
...
---[ end trace 6414689569c2bc08 ]---

Fixes: bb7d785 ("drm/vc4: Add HDMI audio support")
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200701073949.28941-1-m.szyprowski@samsung.com
cujomalainey pushed a commit to cujomalainey/linux that referenced this pull request Oct 26, 2020
[ Upstream commit ec653df ]

card->owner is a required property and since commit 81033c6 ("ALSA:
core: Warn on empty module") a warning is issued if it is empty. Fix lack
of it. This fixes following warning observed on RaspberryPi 3B board
with ARM 32bit kernel and multi_v7_defconfig:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 210 at sound/core/init.c:207 snd_card_new+0x378/0x398 [snd]
Modules linked in: vc4(+) snd_soc_core ac97_bus snd_pcm_dmaengine bluetooth snd_pcm snd_timer crc32_arm_ce raspberrypi_hwmon snd soundcore ecdh_generic ecc bcm2835_thermal phy_generic
CPU: 1 PID: 210 Comm: systemd-udevd Not tainted 5.8.0-rc1-00027-g81033c6b584b thesofproject#1087
Hardware name: BCM2835
[<c03113c0>] (unwind_backtrace) from [<c030bcb4>] (show_stack+0x10/0x14)
[<c030bcb4>] (show_stack) from [<c071cef8>] (dump_stack+0xd4/0xe8)
[<c071cef8>] (dump_stack) from [<c0345bfc>] (__warn+0xdc/0xf4)
[<c0345bfc>] (__warn) from [<c0345cc4>] (warn_slowpath_fmt+0xb0/0xb8)
[<c0345cc4>] (warn_slowpath_fmt) from [<bf02ff74>] (snd_card_new+0x378/0x398 [snd])
[<bf02ff74>] (snd_card_new [snd]) from [<bf11f0b4>] (snd_soc_bind_card+0x280/0x99c [snd_soc_core])
[<bf11f0b4>] (snd_soc_bind_card [snd_soc_core]) from [<bf12f000>] (devm_snd_soc_register_card+0x34/0x6c [snd_soc_core])
[<bf12f000>] (devm_snd_soc_register_card [snd_soc_core]) from [<bf165654>] (vc4_hdmi_bind+0x43c/0x5f4 [vc4])
[<bf165654>] (vc4_hdmi_bind [vc4]) from [<c09d660c>] (component_bind_all+0xec/0x24c)
[<c09d660c>] (component_bind_all) from [<bf15c44c>] (vc4_drm_bind+0xd4/0x174 [vc4])
[<bf15c44c>] (vc4_drm_bind [vc4]) from [<c09d6ac0>] (try_to_bring_up_master+0x160/0x1b0)
[<c09d6ac0>] (try_to_bring_up_master) from [<c09d6f38>] (component_master_add_with_match+0xd0/0x104)
[<c09d6f38>] (component_master_add_with_match) from [<bf15c588>] (vc4_platform_drm_probe+0x9c/0xbc [vc4])
[<bf15c588>] (vc4_platform_drm_probe [vc4]) from [<c09df740>] (platform_drv_probe+0x6c/0xa4)
[<c09df740>] (platform_drv_probe) from [<c09dd6f0>] (really_probe+0x210/0x350)
[<c09dd6f0>] (really_probe) from [<c09dd940>] (driver_probe_device+0x5c/0xb4)
[<c09dd940>] (driver_probe_device) from [<c09ddb38>] (device_driver_attach+0x58/0x60)
[<c09ddb38>] (device_driver_attach) from [<c09ddbc0>] (__driver_attach+0x80/0xbc)
[<c09ddbc0>] (__driver_attach) from [<c09db820>] (bus_for_each_dev+0x68/0xb4)
[<c09db820>] (bus_for_each_dev) from [<c09dc9f8>] (bus_add_driver+0x130/0x1e8)
[<c09dc9f8>] (bus_add_driver) from [<c09de648>] (driver_register+0x78/0x110)
[<c09de648>] (driver_register) from [<c0302038>] (do_one_initcall+0x50/0x220)
[<c0302038>] (do_one_initcall) from [<c03db544>] (do_init_module+0x60/0x210)
[<c03db544>] (do_init_module) from [<c03da4f8>] (load_module+0x1e34/0x2338)
[<c03da4f8>] (load_module) from [<c03dac00>] (sys_finit_module+0xac/0xbc)
[<c03dac00>] (sys_finit_module) from [<c03000c0>] (ret_fast_syscall+0x0/0x54)
Exception stack(0xeded9fa8 to 0xeded9ff0)
...
---[ end trace 6414689569c2bc08 ]---

Fixes: bb7d785 ("drm/vc4: Add HDMI audio support")
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200701073949.28941-1-m.szyprowski@samsung.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug][CML] 'error: nothing to do in IRQ thread' is shown in dmesg
4 participants