Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC: SOF: Intel: hda: hda-dai: fix oops on hda_link .free #1541

Merged

Conversation

plbossart
Copy link
Member

When the PCM_PARAM IPC fails, the kernel oopses in the HDaudio link
DMA .free operation. The root cause is a NULL dma_data.

This patches makes sure the dma_data is properly save in hw_params,
even if there are additional errors and tested in hw_free.

Opens:

  1. the entire flow with hw_params used on system resume looks like a hack.

  2. it's not clear why the problem actually happens on hw_free, maybe
    that's because of the -EBUSY call?

Trace with this patch:

14.509636] sof-audio-pci 0000:00:1f.3: pcm: open stream 0 dir 1
[ 14.509643] sof-audio-pci 0000:00:1f.3: period min 192 max 16384 bytes
[ 14.509646] sof-audio-pci 0000:00:1f.3: period count 2 max 16
[ 14.509648] sof-audio-pci 0000:00:1f.3: buffer max 65536 bytes
[ 14.510003] sof-audio-pci 0000:00:1f.3: ipc tx: 0x80010000: GLB_DAI_MSG: CONFIG
[ 14.510114] sof-audio-pci 0000:00:1f.3: ipc tx succeeded: 0x80010000: GLB_DAI_MSG: CONFIG
[ 14.510135] sof-audio-pci 0000:00:1f.3: format_val=49, rate=48000, ch=2, format=10
[ 14.510144] sof-audio-pci 0000:00:1f.3: pcm: hw params stream 0 dir 1
[ 14.510149] sof-audio-pci 0000:00:1f.3: generating page table for 00000000ce69792e size 0x4b00 pages 5
[ 14.510157] sof-audio-pci 0000:00:1f.3: FW Poll Status: reg=0x40000 successful
[ 14.510175] sof-audio-pci 0000:00:1f.3: FW Poll Status: reg=0x40000 successful
[ 14.510179] sof-audio-pci 0000:00:1f.3: period_bytes:0x12c0
[ 14.510182] sof-audio-pci 0000:00:1f.3: periods:4
[ 14.510197] sof-audio-pci 0000:00:1f.3: stream_tag 2
[ 14.510209] sof-audio-pci 0000:00:1f.3: ipc tx: 0x60010000: GLB_STREAM_MSG: PCM_PARAMS
[ 14.510513] sof-audio-pci 0000:00:1f.3: error: ipc error for 0x60010000 size 20
[ 14.510520] sof-audio-pci 0000:00:1f.3: error: hw params ipc failed for stream 2
[ 14.510525] sof-audio-pci 0000:00:1f.3: ASoC: 0000:00:1f.3 hw params failed: -22
[ 14.510530] HDA Analog: ASoC: hw_params FE failed -22
[ 14.510553] sof-audio-pci 0000:00:1f.3: ipc tx: 0x80010000: GLB_DAI_MSG: CONFIG
[ 14.510668] sof-audio-pci 0000:00:1f.3: ipc tx succeeded: 0x80010000: GLB_DAI_MSG: CONFIG
[ 14.510677] sof-audio-pci 0000:00:1f.3: pcm: free stream 0 dir 1
[ 14.510690] sof-audio-pci 0000:00:1f.3: hda_link_hw_free: link_dev is not assigned
[ 14.510821] sof-audio-pc

GitHub issue: #1417
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com

When the PCM_PARAM IPC fails, the kernel oopses in the HDaudio link
DMA .free operation. The root cause is a NULL dma_data.

This patches makes sure the dma_data is properly save in hw_params,
even if there are additional errors and tested in hw_free.

Opens:

1. the entire flow with hw_params used on system resume looks like a hack.

2. it's not clear why the problem actually happens on hw_free, maybe
that's because of the -EBUSY call?

Trace with this patch:

   14.509636] sof-audio-pci 0000:00:1f.3: pcm: open stream 0 dir 1
[   14.509643] sof-audio-pci 0000:00:1f.3: period min 192 max 16384 bytes
[   14.509646] sof-audio-pci 0000:00:1f.3: period count 2 max 16
[   14.509648] sof-audio-pci 0000:00:1f.3: buffer max 65536 bytes
[   14.510003] sof-audio-pci 0000:00:1f.3: ipc tx: 0x80010000: GLB_DAI_MSG: CONFIG
[   14.510114] sof-audio-pci 0000:00:1f.3: ipc tx succeeded: 0x80010000: GLB_DAI_MSG: CONFIG
[   14.510135] sof-audio-pci 0000:00:1f.3: format_val=49, rate=48000, ch=2, format=10
[   14.510144] sof-audio-pci 0000:00:1f.3: pcm: hw params stream 0 dir 1
[   14.510149] sof-audio-pci 0000:00:1f.3: generating page table for 00000000ce69792e size 0x4b00 pages 5
[   14.510157] sof-audio-pci 0000:00:1f.3: FW Poll Status: reg=0x40000 successful
[   14.510175] sof-audio-pci 0000:00:1f.3: FW Poll Status: reg=0x40000 successful
[   14.510179] sof-audio-pci 0000:00:1f.3: period_bytes:0x12c0
[   14.510182] sof-audio-pci 0000:00:1f.3: periods:4
[   14.510197] sof-audio-pci 0000:00:1f.3: stream_tag 2
[   14.510209] sof-audio-pci 0000:00:1f.3: ipc tx: 0x60010000: GLB_STREAM_MSG: PCM_PARAMS
[   14.510513] sof-audio-pci 0000:00:1f.3: error: ipc error for 0x60010000 size 20
[   14.510520] sof-audio-pci 0000:00:1f.3: error: hw params ipc failed for stream 2
[   14.510525] sof-audio-pci 0000:00:1f.3: ASoC: 0000:00:1f.3 hw params failed: -22
[   14.510530]  HDA Analog: ASoC: hw_params FE failed -22
[   14.510553] sof-audio-pci 0000:00:1f.3: ipc tx: 0x80010000: GLB_DAI_MSG: CONFIG
[   14.510668] sof-audio-pci 0000:00:1f.3: ipc tx succeeded: 0x80010000: GLB_DAI_MSG: CONFIG
[   14.510677] sof-audio-pci 0000:00:1f.3: pcm: free stream 0 dir 1
[   14.510690] sof-audio-pci 0000:00:1f.3: hda_link_hw_free: link_dev is not assigned
[   14.510821] sof-audio-pc

GitHub issue: thesofproject#1417
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
@plbossart
Copy link
Member Author

@ranj063 @RanderWang I could really use some help on how the dma_data is supposed to be initialized and why there are mentions of system resume in hw_params (instead of prepare).

The existing code where we do a _get before a _set is also borderline acceptable, proper initialization flow never hurt.

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this patch looks ok (and is needed). We need to keep the tags over suspend-resume (see commit c7f1b85 ASoC: SOF: hda: fix link DMA config), so we have this extra logic in hw_params.

It's a bit odd link_free is still called, despite hw_params returning -EINVAL, but this seems to be something that can happen as hw_params is used on the resume path as well. If hw_params fails on resume path, ALSA/ASoC core won't know about this and there will be a subsequent link_free call still. So the additional check added in this patch is needed.

Not sure how much separating internal_hw_params from the ALSA core hw_params calls, would help. We'd still have to cover all the same corner cases.

@plbossart
Copy link
Member Author

I think this patch looks ok (and is needed). We need to keep the tags over suspend-resume (see commit c7f1b85 ASoC: SOF: hda: fix link DMA config), so we have this extra logic in hw_params.

It's a bit odd link_free is still called, despite hw_params returning -EINVAL, but this seems to be something that can happen as hw_params is used on the resume path as well. If hw_params fails on resume path, ALSA/ASoC core won't know about this and there will be a subsequent link_free call still. So the additional check added in this patch is needed.

Not sure how much separating internal_hw_params from the ALSA core hw_params calls, would help. We'd still have to cover all the same corner cases.

For SoundWire, I only see .prepare being called on resume. Is this really the case the hw_params are are called in resume for a DAI?

@kv2019i
Copy link
Collaborator

kv2019i commented Nov 26, 2019

@plbossart wrote:

something that can happen as hw_params is used on the resume path as well. If hw_params fails on resume path, ALSA/ASoC core won't know about this and there will be a subsequent link_free call still. So the additional check added in this patch is needed.
Not sure how much separating internal_hw_params from the ALSA core hw_params calls, would help. We'd still have to cover all the same corner cases.

For SoundWire, I only see .prepare being called on resume. Is this really the case the hw_params are are called in resume for a DAI?

Aa, you are right. This actually hits the logic in:

soc-pcm.c:soc_pcm_hw_params()

... you hit the DAI hw_params failure in this function and matching log file is:

[ 14.510525] sof-audio-pci 0000:00:1f.3: ASoC: 0000:00:1f.3 hw params failed: -22
[ 14.510530] HDA Analog: ASoC: hw_params FE failed -22

After that error soc_pcm_hw_params() exits the loop and does a hw_free. I.e. hw_free is called also in the case hw_params fails and our hw_free needs to be resilient to this case (and/or we chnge the soc-pcm.c logic, maybe something to take on alsa-devel). Based on this, your patch seems valid.

@plbossart
Copy link
Member Author

@plbossart wrote:

something that can happen as hw_params is used on the resume path as well. If hw_params fails on resume path, ALSA/ASoC core won't know about this and there will be a subsequent link_free call still. So the additional check added in this patch is needed.
Not sure how much separating internal_hw_params from the ALSA core hw_params calls, would help. We'd still have to cover all the same corner cases.

For SoundWire, I only see .prepare being called on resume. Is this really the case the hw_params are are called in resume for a DAI?

Aa, you are right. This actually hits the logic in:

soc-pcm.c:soc_pcm_hw_params()

... you hit the DAI hw_params failure in this function and matching log file is:

[ 14.510525] sof-audio-pci 0000:00:1f.3: ASoC: 0000:00:1f.3 hw params failed: -22
[ 14.510530] HDA Analog: ASoC: hw_params FE failed -22

After that error soc_pcm_hw_params() exits the loop and does a hw_free. I.e. hw_free is called also in the case hw_params fails and our hw_free needs to be resilient to this case (and/or we chnge the soc-pcm.c logic, maybe something to take on alsa-devel). Based on this, your patch seems valid.

ah nice, thanks for the explanations. So when a FE hw_params fails, this triggers a hw_free in the BE as well, and since we haven't set the dma_data this bombs out.

Good lesson and requirement: all hw_free need to test for data that might be uninitialized.

@@ -217,6 +217,8 @@ static int hda_link_hw_params(struct snd_pcm_substream *substream,
link_dev = hda_link_stream_assign(bus, substream);
if (!link_dev)
return -EBUSY;

snd_soc_dai_set_dma_data(dai, substream, (void *)link_dev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart is this correct? what happens if link_dev isnt null on line 215, then we'd never set the dma_data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, that's the whole point: when is this dma_data supposed to be set?

I understood the code as doing the stream assignment once, so if it's null the dma_data had to be set earlier.
Also there are multiple return cases so it's possible with errors that this dma_data is never set, so I moved it just after the stream is allocated.

But to be honest I really don't understand the code. The pattern of testing for something before allocating it assumes a state machine...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart, @RanderWang added the get_dma_data() where it is now based on the bug with multi-pipeline usecase: c7f1b85

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, but the question is where should the set_dma_data() be done?
The way it's currently done is error prone, there are a set of return cases where this set_dma_data() would not be set, ever.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart I think it should be done just outside the if (!link_dev) {} on line 223 instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we'd set the set_dma_data multiple times, no very clear why?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart you are correct. Sorry I dont know what I was reading this morning. Maybe its time to have breakfast!

@plbossart
Copy link
Member Author

@RanderWang can you take a look to make sure I don't break what you added?

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my part this looks good to go, but I think @plbossart you need to modify the git commit. I think we can answer the opens now.

@RanderWang
Copy link

@RanderWang can you take a look to make sure I don't break what you added?

@plbossart your PR is good for me. And I want to discuss your open question.
I add following check to reuse allocated link dma for runtime resume case. link dma is released for runtime suspend and is reallocated for runtime resume. I don't know why. Now we are going to refine our pm, we can discuss it now.

	if (!link_dev) {
		link_dev = hda_link_stream_assign(bus, substream);
		if (!link_dev)
			return -EBUSY;

Copy link

@RanderWang RanderWang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@plbossart
Copy link
Member Author

@RanderWang can you take a look to make sure I don't break what you added?

@plbossart your PR is good for me. And I want to discuss your open question.
I add following check to reuse allocated link dma for runtime resume case. link dma is released for runtime suspend and is reallocated for runtime resume. I don't know why. Now we are going to refine our pm, we can discuss it now.

	if (!link_dev) {
		link_dev = hda_link_stream_assign(bus, substream);
		if (!link_dev)
			return -EBUSY;

Sorry, @RanderWang I cannot figure out what your reply meant. if the link dma is released on runtime suspend, how can you reuse it on runtime resume. Are you sure there was no typo in your reply (or a confusion between runtime and system suspend).
Either way, I'd like a clear view of what this code does.

@plbossart plbossart merged commit 0381455 into thesofproject:topic/sof-dev Dec 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants