
Reduce DMA reload frequency on host DMA #7306

Merged: 3 commits merged into thesofproject:main on Mar 24, 2023

Conversation

lyakh (Collaborator) commented Mar 20, 2023

No description provided.

lyakh added 2 commits March 20, 2023 17:58
"flags" in host_copy_normal() aren't used, remove them.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
This should reduce DRAM access frequency and allow deeper power
saving modes, particularly when using Deep Buffer PCMs.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
buffer_c = buffer_acquire(hd->dma_buffer);

/* Reload DMA when half of the buffer has been used */
if (hd->partial_size >= buffer_c->stream.size >> 1) {
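
For illustration, a minimal self-contained sketch of the idea behind this hunk - not the PR's actual code, and every name below is a stand-in - tracking the consumed bytes and signalling a reload only once half of the host buffer has been used:

#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the change: count how much of the host DMA buffer
 * has been consumed and only ask for a reload once half of it is used,
 * instead of reloading on every copy. All names are illustrative.
 */
struct deferred_reload {
	uint32_t buffer_size;	/* total host DMA buffer size in bytes */
	uint32_t partial_size;	/* bytes copied since the last reload */
};

/* Returns true when the caller should program the DMA reload now */
static bool host_copy_account(struct deferred_reload *dr, uint32_t copy_bytes)
{
	dr->partial_size += copy_bytes;

	/* Reload DMA when half of the buffer has been used */
	if (dr->partial_size >= dr->buffer_size >> 1) {
		dr->partial_size = 0;
		return true;
	}

	return false;
}
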
Collaborator:
@lyakh seems like a good idea to reduce DMA traffic but do you think it makes sense to limit this to playback only?

lyakh (PR author):
@ranj063 what effect would this have for capture? If the host buffer is large, it would delay triggering the host until more data is available, right? So that means some additional latency. Do we use such large buffers for capture? If we don't, that isn't a problem; if we do, isn't that exactly why we use them: to buffer more data before waking up the consumer?

Collaborator:
I don't think we use large buffers for capture yet but I think there may be a plan to use 10ms buffers if not larger. But yes in that case, maybe it makes sense to do the same thing there as well.

Collaborator:
BTW @lyakh do we know that the DMA transfer block size is equal to the DMA buffer size in the case of deep buffer? It should be, but I'm just wondering whether it really is or not.

lyakh (PR author):
In my tests I saw a deep buffer size of 38400 bytes, i.e. 100 * 384, where 384 bytes = 32 bits * 2 channels * 48 frames corresponds to 1ms at 48000 samples per second - so it's a 100ms buffer, which seems correct. And I hope that isn't the size that the DMA transfers in one go? Or have I misunderstood?
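
For reference, the arithmetic behind the 38400-byte figure written out as C constants; the format assumptions (48 kHz, 2 channels, 32-bit samples) are the ones stated in the comment above:

/* Deep-buffer size arithmetic, assuming the format described above */
#define SAMPLE_BYTES	4	/* 32-bit sample container */
#define CHANNELS	2
#define FRAMES_PER_MS	48	/* 48000 frames/s = 48 frames/ms */

#define BYTES_PER_MS	(SAMPLE_BYTES * CHANNELS * FRAMES_PER_MS)	/* 384 */
#define DEEP_BUF_MS	100
#define DEEP_BUF_BYTES	(DEEP_BUF_MS * BYTES_PER_MS)			/* 38400 */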

hd->partial_size += copy_bytes;
buffer_c = buffer_acquire(hd->dma_buffer);

/* Reload DMA when half of the buffer has been used */

plbossart (Member) commented Mar 20, 2023:
why half? It's not bad but is this really what we want?

If you have a really large deep buffer, e.g. 100ms, it might be more optimal to only reload it when it's only 20% full - which is still 20ms worth of data, plenty of time to avoid xruns.

This should be a combination of bandwidth + time it takes to wake up the path to memory + processing load.

Additionally, I thought some DMA versions had the ability to fetch memory opportunistically when the path to memory was on.

In other words, sticking to a double-buffer approach is safe but likely not optimal for the deep-buffer case. My 2 cents.

lyakh (PR author):
50% because I had no idea and wanted to get some feedback - thank you! :-) Then why 20%? Should it maybe be a minimum of 4 periods and the whole buffer?
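
To make the alternatives in this exchange concrete, here is a sketch of how the reload point could be derived either from a fixed fraction of the buffer or from a minimum number of periods of headroom; all names are illustrative and none of this is from the PR:

#include <stdint.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))

/* Illustrative only: pick the number of consumed bytes at which to reload.
 * buf_bytes is the full host buffer size, period_bytes the per-interval
 * copy size.
 */
static uint32_t reload_threshold(uint32_t buf_bytes, uint32_t period_bytes)
{
	/* Option discussed above: reload once 80% is consumed, i.e. when
	 * only 20% of the data remains.
	 */
	uint32_t fraction_thresh = buf_bytes / 5 * 4;

	/* Alternative: always keep at least four periods of headroom */
	uint32_t period_thresh = buf_bytes > 4 * period_bytes ?
				 buf_bytes - 4 * period_bytes :
				 buf_bytes >> 1;

	/* Reload at whichever point comes first, to stay on the safe side */
	return MIN(fraction_thresh, period_thresh);
}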

Collaborator:
@mwasko @mmaka1 can you chime in on how this is supposed to work?

Collaborator:
@lyakh, @yongzhi1 graciously agreed to do a quick experiment with deep buffer playback with this PR and reported a good increase in the lower PC-state residencies.

Comment:
It is a great idea to reduce the wakes and it is aligned with the future planned improvements. However, I am not sure it should be the client's responsibility to drive any underlying DMA HW using a common policy. There might be HW-specific mechanisms that are already in place or planned, so how about implementing such policies inside the DMA drivers? They are aware of the wake latency, can compare it with the pace at which the remaining data is being consumed, etc.

lyakh (PR author):
@mmaka1 If we do it in the DMA driver, the threshold should be configurable. Configuration parameters are passed in a struct dma_config structure. It doesn't have anything like a "threshold" but it has source and destination burst lengths. Should we use that?
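
For context, Zephyr's struct dma_config does expose per-direction burst lengths; a minimal sketch of setting them is below. Whether they are a sensible place to encode a reload threshold is exactly the open question here, and the helper name is hypothetical:

#include <zephyr/drivers/dma.h>

/* Hypothetical helper: use the existing burst-length fields of Zephyr's
 * struct dma_config as a hint for how much data to move per wake-up.
 */
static void dma_cfg_set_burst(struct dma_config *cfg, uint32_t burst_bytes)
{
	cfg->source_burst_length = burst_bytes;
	cfg->dest_burst_length = burst_bytes;
}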

Collaborator:
@mmaka1 @lyakh What if we have this as a platform setting for now (maybe as a Kconfig)? We could perhaps use DMA attributes, but I wonder if the DMA driver can provide this info. The DMA driver will likely not know much about how it's integrated into the system (i.e. what's the max latency to access host memory).

lyakh (PR author):
I've added one more commit on top - see whether you like that. We can either drop it and use the simple version, or merge the last two commits.

Collaborator:
@lyakh this looks good to me!

kv2019i requested a review from mwasko on March 21, 2023.

kv2019i (Collaborator) left a comment:
I think there are a lot of ways this could be parameterized, but as a starting point this covers the basic need, is easy to understand and can be demonstrated to bring immediate savings, so it seems like a good addition.

src/audio/host-zephyr.c (review thread resolved)

lyakh (PR author) commented Mar 24, 2023:
The CI reported a failed test: https://sof-ci.01.org/sofpr/PR7306/build4998/devicetest/index.html?model=TGLU_RVP_NOCODEC_IPC4ZPH&testcase=check-playback-10sec which is strange. It's on a non-deep buffer stream, so this PR shouldn't affect that test. But I also don't see any open issues similar to that failure.
UPDATE: I also see a seemingly identical failure in a daily report from Monday the 20th, so apparently it is indeed unrelated. That failure seems to be an instance of #7191.

lyakh added 1 commit:

Add Kconfig options to specify when to reload DMA; by default it will
reload 4ms before the buffer is empty (or full).

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
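
A hedged sketch of how such a Kconfig-driven margin might translate into the reload decision; the CONFIG_ symbol name and the helper are hypothetical, chosen only to illustrate the "4ms before the buffer is empty" default described above, and the option names the PR actually adds may differ:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical Kconfig symbol: reload margin in milliseconds, default 4 */
#ifndef CONFIG_HOST_DMA_RELOAD_DELAY_MS
#define CONFIG_HOST_DMA_RELOAD_DELAY_MS	4
#endif

/* Reload once less than the margin's worth of data is left in the buffer
 * (for playback; for capture, once less than the margin's worth of free
 * space remains).
 */
static bool host_dma_should_reload(uint32_t partial_size, uint32_t buf_bytes,
				   uint32_t bytes_per_ms)
{
	uint32_t margin = CONFIG_HOST_DMA_RELOAD_DELAY_MS * bytes_per_ms;

	return partial_size + margin >= buf_bytes;
}
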
lyakh (PR author) commented Mar 24, 2023:
CI: no relevant failures this time

plbossart (Member) left a comment:
Nice work @lyakh

plbossart requested a review from ranj063 on March 24, 2023.
ranj063 merged commit e95723a into thesofproject:main on Mar 24, 2023.
lyakh deleted the deepbuf branch on March 24, 2023.