Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Pause/Release Xrun for HDMI pipelines #1277

Merged
merged 2 commits into from
Oct 21, 2019

Conversation

ranj063
Copy link
Collaborator

@ranj063 ranj063 commented Oct 3, 2019

Currently, SND_SOC_DPCM_TRIGGER_PRE/POST determines the order in which FE DAI and BE DAI are triggered. In the case of SND_SOC_DPCM_TRIGGER_PRE, the FE DAI is triggered before
the BE DAI and in the case of SND_SOC_DPCM_TRIGGER_POST, the BE DAI is triggered before the FE DAI.

With the SOF driver, during playback, the FW expects the BE DAI to be triggered before the FE DAI during the START trigger. This can be addressed by setting the trigger order for the FE dai
link to SND_SOC_DPCM_TRIGGER_POST. But during the STOP trigger, retaining the same order of triggering the BE DAI before the FE DAI results in a DSP panic.

The issue can be fixed by mirroring the trigger order of FE and BE DAI's during the START and STOP trigger. So, with the trigger order set to SND_SOC_DPCM_TRIGGER_PRE, the FE DAI will be
trigger first during START and the BE DAI will be triggered first during STOP. Conversely, with the trigger order set to SND_SOC_DPCM_TRIGGER_POST, the BE DAI will be triggered first during START and the FE DAI will be triggered first during the STOP command.

Fixes #1160

@ranj063 ranj063 changed the title Fix Pause/Release Xrun for HDMI pipelines [RFC] Fix Pause/Release Xrun for HDMI pipelines Oct 3, 2019
@ranj063 ranj063 requested a review from kv2019i October 3, 2019 20:03
@ranj063 ranj063 force-pushed the fix/pause_xrun branch 6 times, most recently from f741c2c to 8f1434c Compare October 4, 2019 00:58
@tlauda
Copy link

tlauda commented Oct 4, 2019

@ranj063 Fixed the panic in case some weird sequence will happen in the future: thesofproject/sof#1902.

@ranj063
Copy link
Collaborator Author

ranj063 commented Oct 4, 2019

@ranj063 Fixed the panic in case some weird sequence will happen in the future: thesofproject/sof#1902.

@tlauda thanks!

@@ -2282,42 +2282,91 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
}
EXPORT_SYMBOL_GPL(dpcm_be_dai_trigger);

static int dpcm_fe_dai_trigger_fe_first(struct snd_pcm_substream *substream,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063 phew, this makes my head burn! :) I mean "fe_trigger_fe_first" is a bit hard to understand. If we have a function to trigger "fe", one would expect this to trigger "fe" and not something else.

Anyways, the commit message does makes sense. I guess the big question are existing ok with change the the release ordering, or should DSP/machien drivers to have ability to specify PRE-versus-POST for each direction separately. I think you need to send to alsa-devel for this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i unfortunately when a PCM is triggered it is always done through the FE. So the function to trigger FE takes care of triggering the corresponding BE's too. Now the ordering for PRE and POST work just fine for most cases. So I am a bit nervous about this change affecting other drivers. My first thought was to use the BESPOKE trigger but if you look at soc_pcm_bespoke_trigger(), it doesnt really have to do anything to do with the ordering of the FE/BE DAI's at all. In fact it just invokes the bespoke_triggers implemented by the dai drivers.
And AFAICT, there are no drivers using this.

Anyway I will add some more context to the commit message and send this to alsa-devel

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting patch, thanks!

I think this need to go via alsa-devel . My few cents inline.

@@ -2853,6 +2853,10 @@ static int sof_link_load(struct snd_soc_component *scomp, int index,
if (!link->no_pcm) {
link->nonatomic = true;

/* set trigger order */
link->trigger[0] = SND_SOC_DPCM_TRIGGER_POST;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be set in machin drivers, or at least in Intel DSP specific code (as this is a Intel DSP limitation)? I see most drivers setting this in machine drivers.

Copy link
Collaborator Author

@ranj063 ranj063 Oct 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i SOF does not define the FE DAI links in the machine driver. They are created when the pcm elems in topology are parsed.

@ranj063
Copy link
Collaborator Author

ranj063 commented Oct 4, 2019

send to alsa-devl too

@tlauda
Copy link

tlauda commented Oct 10, 2019

@ranj063 Is there a chance to get it merged soon taking into an account, that it fixes one of the CI issues.

@plbossart
Copy link
Member

plbossart commented Oct 10, 2019

@ranj063 Is there a chance to get it merged soon taking into an account, that it fixes one of the CI issues.

I was planning to have this merged with the weekly upstream stuff, but since @ranj063 sent the code as RFC it was not picked up.

@ranj063 do you want to provide an update here and send an update to alsa-devel?

@RanderWang
Copy link

@ranj063 I understood your work. Now SND_SOC_DPCM_TRIGGER_POST is processed dynamically for start and stop case. The usage is different with original design, which is not perfect.

I have an idea to discuss with you, like adding flags
SND_SOC_DPCM_TRIGGER_START_POST,
SND_SOC_DPCM_TRIGGER_START_PRE,
SND_SOC_DPCM_TRIGGER_STOP_POST,
SND_SOC_DPCM_TRIGGER_STOP_PRE.

then trigger = SND_SOC_DPCM_TRIGGER_START_POST | SND_SOC_DPCM_TRIGGER_STOP_PRE
...

A problem is: there are so many status.

@ranj063
Copy link
Collaborator Author

ranj063 commented Oct 11, 2019

@ranj063 Is there a chance to get it merged soon taking into an account, that it fixes one of the CI issues.

@tlauda sure. I will update the patch based on suggestions today and send an updated version to alsa-devel.

@plbossart the reason I was delaying sending the update is because I havent done any testing on legacy drivers. I dont have any devices to try that and I'm wondering if @mengdonglin can help with that.

@plbossart
Copy link
Member

@ranj063 Is there a chance to get it merged soon taking into an account, that it fixes one of the CI issues.

@tlauda sure. I will update the patch based on suggestions today and send an updated version to alsa-devel.

@plbossart the reason I was delaying sending the update is because I havent done any testing on legacy drivers. I dont have any devices to try that and I'm wondering if @mengdonglin can help with that.

well, if you are changing the behavior and adding PM support on devices you don't have access to, it's not going to go well... You do need to have access to BYT/CHT devices, it's really a prerequisite.

@wenqingfu
Copy link

SOFCI TEST

@ranj063
Copy link
Collaborator Author

ranj063 commented Oct 11, 2019

well, if you are changing the behavior and adding PM support on devices you don't have access to, it's not going to go well... You do need to have access to BYT/CHT devices, it's really a prerequisite.

@plbossart BYT/CHT device PM support is being tested with @cujomalainey 's help. So there are no issues there. I dont have access to ASUS T100 and I doubt we have enough lying around to get access to one for me.

For this particular issue, I am working on getting it tested on KBL/GLK devices for now.

@wenqingfu
Copy link

The latest CI testing (as of "2019-10-11 16:42:37.339213 UTC") seems not bad.

There is a nasty bug on CI result processing, which ends up random number of platforms shown as "not tested" from time to time. In fact, they have been tested. In such case, please re-trigger the CI testing with 'SOFCI TEST'.

@xiulipan @fredoh9

Currently, the trigger orders SND_SOC_DPCM_TRIGGER_PRE/POST
determine the order in which FE DAI and BE DAI are triggered.
In the case of SND_SOC_DPCM_TRIGGER_PRE, the FE DAI is
triggered before the BE DAI and in the case of
SND_SOC_DPCM_TRIGGER_POST, the BE DAI is triggered before
the FE DAI. And this order remains the same irrespective of the
trigger command.

In the case of the SOF driver, during playback, the FW
expects the BE DAI to be triggered before the FE DAI during
the START trigger. The BE DAI trigger handles the starting of
Link DMA and so it must be started before the FE DAI is started
to prevent xruns during pause/release. This can be addressed
by setting the trigger order for the FE dai link to
SND_SOC_DPCM_TRIGGER_POST. But during the STOP trigger,
the FW expects the FE DAI to be triggered before the BE DAI.
Retaining the same order during the START and STOP commands,
results in FW error as the DAI component in the FW is still
active.

The issue can be fixed by mirroring the trigger order of
FE and BE DAI's during the START and STOP trigger. So, with the
trigger order set to SND_SOC_DPCM_TRIGGER_PRE, the FE DAI will be
trigger first during SNDRV_PCM_TRIGGER_START/STOP/RESUME
and the BE DAI will be triggered first during the
STOP/SUSPEND/PAUSE commands. Conversely, with the trigger order
set to SND_SOC_DPCM_TRIGGER_POST, the BE DAI will be triggered
first during the SNDRV_PCM_TRIGGER_START/STOP/RESUME commands
and the FE DAI will be triggered first during the
SNDRV_PCM_TRIGGER_STOP/SUSPEND/PAUSE commands.

Github Issue: thesofproject#1160
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Set trigger order for FE DAI links to SND_SOC_DPCM_TRIGGER_POST
to trigger the BE DAI's before the FE DAI's. This prevents the
xruns seen on playback pipelines using the link DMA.

Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
@ranj063
Copy link
Collaborator Author

ranj063 commented Oct 11, 2019

@plbossart I have updated this based on your suggestions on alsa-devel.

I have also tested it so far on my KBL based XPS13 with no regressions. More tests are underway and I will update soon.

@ranj063
Copy link
Collaborator Author

ranj063 commented Oct 11, 2019

@ranj063 I understood your work. Now SND_SOC_DPCM_TRIGGER_POST is processed dynamically for start and stop case. The usage is different with original design, which is not perfect.

I have an idea to discuss with you, like adding flags
SND_SOC_DPCM_TRIGGER_START_POST,
SND_SOC_DPCM_TRIGGER_START_PRE,
SND_SOC_DPCM_TRIGGER_STOP_POST,
SND_SOC_DPCM_TRIGGER_STOP_PRE.

then trigger = SND_SOC_DPCM_TRIGGER_START_POST | SND_SOC_DPCM_TRIGGER_STOP_PRE
...

A problem is: there are so many status.

@RanderWang this would confuse people more with so many POST/PRE's. I am getting this tested with legacy drivers right now. If there are no regressions I think the proposed change should be good to go.

@ranj063 ranj063 changed the title [RFC] Fix Pause/Release Xrun for HDMI pipelines Fix Pause/Release Xrun for HDMI pipelines Oct 11, 2019
@plbossart
Copy link
Member

plbossart commented Oct 11, 2019 via email

@cujomalainey
Copy link

@plbossart should we look into seeing if we can get some byt/cht devices to you guys? We still very much support them here so the upstream support would definitely be appreciated.

@plbossart
Copy link
Member

plbossart commented Oct 11, 2019 via email

@cujomalainey
Copy link

I'll add this to the list of things I plan on discussing with @lgirdwood when we meet in a couple weeks with @dgreid.

@RanderWang
Copy link

@ranj063 I understood your work. Now SND_SOC_DPCM_TRIGGER_POST is processed dynamically for start and stop case. The usage is different with original design, which is not perfect.
I have an idea to discuss with you, like adding flags
SND_SOC_DPCM_TRIGGER_START_POST,
SND_SOC_DPCM_TRIGGER_START_PRE,
SND_SOC_DPCM_TRIGGER_STOP_POST,
SND_SOC_DPCM_TRIGGER_STOP_PRE.
then trigger = SND_SOC_DPCM_TRIGGER_START_POST | SND_SOC_DPCM_TRIGGER_STOP_PRE
...
A problem is: there are so many status.

@RanderWang this would confuse people more with so many POST/PRE's. I am getting this tested with legacy drivers right now. If there are no regressions I think the proposed change should be good to go.

Yes, I agree with you. I just shared you my idea. Please go ahead with your patch.

@wenqingfu
Copy link

SOFCI TEST

@YvonneYang2
Copy link

PR #1277 can fix #1329 and partly optimize https://github.com/thesofproject/sof/issues/1925. And no side effect found.

Only below steps can reproduce https://github.com/thesofproject/sof/issues/1925 :
Aplay 0,0 +Arecord 0,1 --> background and foreground on arecord 0,1 --> I/O error

@ranj063
Copy link
Collaborator Author

ranj063 commented Oct 21, 2019

@plbossart could we possibly merge this PR to sof-dev to unblock CI while the tests with legacy driver is ongoing?

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let's merge and revert/fix if we find any issues.

@plbossart plbossart merged commit fd274c2 into thesofproject:topic/sof-dev Oct 21, 2019
plbossart pushed a commit that referenced this pull request Apr 17, 2020
Make ppc_save_regs() a bit more useful:
  - Set NIP to our caller rather rather than the caller's
    caller (which is what we save to LR in the stack frame).
  - Set SOFTE to the current irq soft-mask state rather than
    uninitialised.
  - Zero CFAR rather than leave it uninitialised.

In qemu, injecting a nmi to an idle CPU gives a nicer stack
trace (note NIP, IRQMASK, CFAR).

  Oops: System Reset, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc2-00429-ga76e38fd80bf #1277
  NIP:  c0000000000b6e5c LR: c0000000000b6e5c CTR: c000000000b06270
  REGS: c00000000173fb08 TRAP: 0100   Not tainted
  MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28000224  XER: 00000000
  CFAR: c0000000016a2128 IRQMASK: c00000000173fc80
  GPR00: c0000000000b6e5c c00000000173fc80 c000000001743400 c00000000173fb08
  GPR04: 0000000000000000 0000000000000000 0000000000000008 0000000000000001
  GPR08: 00000001fea80000 0000000000000000 0000000000000000 ffffffffffffffff
  GPR12: c000000000b06270 c000000001930000 00000000300026c0 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000003 c0000000016a2128
  GPR20: c0000001ffc97148 0000000000000001 c000000000f289a8 0000000000080000
  GPR24: c0000000016e1480 000000011dc870ba 0000000000000000 0000000000000003
  GPR28: c0000000016a2128 c0000001ffc97148 c0000000016a2260 0000000000000003
  NIP [c0000000000b6e5c] power9_idle_type+0x5c/0x70
  LR [c0000000000b6e5c] power9_idle_type+0x5c/0x70
  Call Trace:
  [c00000000173fc80] [c0000000000b6e5c] power9_idle_type+0x5c/0x70 (unreliable)
  [c00000000173fcb0] [c000000000b062b0] stop_loop+0x40/0x60
  [c00000000173fce0] [c000000000b022d8] cpuidle_enter_state+0xa8/0x660
  [c00000000173fd60] [c000000000b0292c] cpuidle_enter+0x4c/0x70
  [c00000000173fda0] [c00000000017624c] call_cpuidle+0x4c/0x90
  [c00000000173fdc0] [c000000000176768] do_idle+0x338/0x460
  [c00000000173fe60] [c000000000176b3c] cpu_startup_entry+0x3c/0x40
  [c00000000173fe90] [c0000000000126b4] rest_init+0x124/0x140
  [c00000000173fed0] [c0000000010948d4] start_kernel+0x938/0x988
  [c00000000173ff90] [c00000000000cdcc] start_here_common+0x1c/0x20

  Oops: System Reset, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc2-00430-gddce91b8712f #1278
  NIP:  c00000000001d150 LR: c0000000000b6e5c CTR: c000000000b06270
  REGS: c00000000173fb08 TRAP: 0100   Not tainted
  MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28000224  XER: 00000000
  CFAR: 0000000000000000 IRQMASK: 1
  GPR00: c0000000000b6e5c c00000000173fc80 c000000001743400 c00000000173fb08
  GPR04: 0000000000000000 0000000000000000 0000000000000008 0000000000000001
  GPR08: 00000001fea80000 0000000000000000 0000000000000000 ffffffffffffffff
  GPR12: c000000000b06270 c000000001930000 00000000300026c0 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000003 c0000000016a2128
  GPR20: c0000001ffc97148 0000000000000001 c000000000f289a8 0000000000080000
  GPR24: c0000000016e1480 00000000b68db8ce 0000000000000000 0000000000000003
  GPR28: c0000000016a2128 c0000001ffc97148 c0000000016a2260 0000000000000003
  NIP [c00000000001d150] replay_system_reset+0x30/0xa0
  LR [c0000000000b6e5c] power9_idle_type+0x5c/0x70
  Call Trace:
  [c00000000173fc80] [c0000000000b6e5c] power9_idle_type+0x5c/0x70 (unreliable)
  [c00000000173fcb0] [c000000000b062b0] stop_loop+0x40/0x60
  [c00000000173fce0] [c000000000b022d8] cpuidle_enter_state+0xa8/0x660
  [c00000000173fd60] [c000000000b0292c] cpuidle_enter+0x4c/0x70
  [c00000000173fda0] [c00000000017624c] call_cpuidle+0x4c/0x90
  [c00000000173fdc0] [c000000000176768] do_idle+0x338/0x460
  [c00000000173fe60] [c000000000176b38] cpu_startup_entry+0x38/0x40
  [c00000000173fe90] [c0000000000126b4] rest_init+0x124/0x140
  [c00000000173fed0] [c0000000010948d4] start_kernel+0x938/0x988
  [c00000000173ff90] [c00000000000cdcc] start_here_common+0x1c/0x20

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200403131006.123243-1-npiggin@gmail.com
@ranj063 ranj063 deleted the fix/pause_xrun branch February 13, 2022 05:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG][playback] Underrun after pause-release on HDA Link pipelines
8 participants