Reduce DMA reload frequency on host DMA #7306
Conversation
"flags" in host_copy_normal() aren't used, remove them. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
This should reduce DRAM access frequency and allow deeper power saving modes, particularly when using Deep Buffer PCMs. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
src/audio/host-zephyr.c
Outdated
buffer_c = buffer_acquire(hd->dma_buffer);

/* Reload DMA when half of the buffer has been used */
if (hd->partial_size >= buffer_c->stream.size >> 1) {
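For context, a minimal sketch of the logic under review, reconstructed from the diff above. The dma_reload() call (a real Zephyr DMA API) and the reset of hd->partial_size are assumptions about the surrounding code, not verbatim from the PR:

    #include <zephyr/drivers/dma.h>

    /*
     * Sketch: accumulate the bytes copied each period and reload the
     * host DMA only once half of the DMA buffer has been consumed,
     * instead of on every period.
     */
    static int host_dma_reload_sketch(struct host_data *hd, uint32_t copy_bytes)
    {
        struct comp_buffer *buffer_c = buffer_acquire(hd->dma_buffer);
        int ret = 0;

        hd->partial_size += copy_bytes;

        /* Reload DMA when half of the buffer has been used */
        if (hd->partial_size >= buffer_c->stream.size >> 1) {
            /* src/dst of 0 assume "keep the configured addresses" */
            ret = dma_reload(hd->chan->dma->z_dev, hd->chan->index,
                             0, 0, hd->partial_size);
            hd->partial_size = 0;
        }

        buffer_release(buffer_c);
        return ret;
    }

The effect is that the reload (and hence the DSP-to-DRAM traffic) happens once per half buffer rather than once per period, letting the memory path idle longer, per the commit message.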
@lyakh seems like a good idea to reduce DMA traffic but do you think it makes sense to limit this to playback only?
@ranj063 what effect would this have for capture? If the host buffer is large, it would delay triggering the host until more data is available, right? So that means some additional latency. Do we use such large buffers for capture? If we don't, that isn't a problem; if we do, isn't that exactly why we use them: to buffer more data before waking up the consumer?
I don't think we use large buffers for capture yet, but there may be a plan to use 10 ms buffers, if not larger. But yes, in that case it may make sense to do the same thing there as well.
BTW @lyakh do we know that the DMA transfer block size is equal to the DMA buffer size in the deep-buffer case? It should be, but I'm just wondering whether it really is or not.
I think in my tests I saw the deep buffer size equal to 38400 bytes, i.e. 100 * 384, where 384 bytes is 4 bytes (32 bits) * 2 channels * 48 frames and corresponds to 1 ms at 48000 samples per second. So it's a 100 ms buffer, which seems correct. And I hope that isn't the size that the DMA transfers in one go? Or have I misunderstood?
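As a quick sanity check of that arithmetic, a small illustrative computation (all names here are hypothetical, not from the PR):

    #include <stdint.h>

    /* 32-bit samples, 2 channels, 48 kHz => bytes per 1 ms of audio */
    #define BYTES_PER_SAMPLE 4u   /* 32 bits */
    #define CHANNELS         2u
    #define FRAMES_PER_MS    48u  /* 48000 frames/s / 1000 */
    #define BYTES_PER_MS (BYTES_PER_SAMPLE * CHANNELS * FRAMES_PER_MS) /* 384 */

    /* 38400 bytes / 384 bytes per ms = 100 ms of buffering */
    static inline uint32_t buffer_duration_ms(uint32_t bytes)
    {
        return bytes / BYTES_PER_MS;
    }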
src/audio/host-zephyr.c
Outdated
hd->partial_size += copy_bytes;
buffer_c = buffer_acquire(hd->dma_buffer);

/* Reload DMA when half of the buffer has been used */
why half? It's not bad but is this really what we want?
If you have a really large deep buffer, e.g. 100 ms, it might be better to only reload it when it's down to 20% full, which is still 20 ms worth of data: plenty of time to avoid xruns.
The right threshold should be a combination of bandwidth, the time it takes to wake up the path to memory, and the processing load.
Additionally, I thought some DMA versions had the ability to fetch memory opportunistically when the path to memory was on.
In other words, sticking to a double-buffer approach is safe but likely not optimal for the deep-buffer case. My 2 cents.
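To make that trade-off concrete, here is one hypothetical way to express the reload point as a time margin instead of a fixed 50% (the helper and parameter names are illustrative, not code from the PR):

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Reload when the unread data remaining in the buffer drops below a
     * safety margin in milliseconds; margin_ms would combine DMA wake-up
     * latency, the time to power up the path to memory, and processing
     * load, as suggested above.
     */
    static inline bool host_dma_need_reload(uint32_t buffer_bytes,
                                            uint32_t consumed_bytes,
                                            uint32_t bytes_per_ms,
                                            uint32_t margin_ms)
    {
        uint32_t remaining = buffer_bytes - consumed_bytes;

        return remaining <= margin_ms * bytes_per_ms;
    }

For the 100 ms example above, margin_ms = 20 would trigger the reload once only 20 ms of data remain, matching the 20%-full suggestion.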
50% because I had no idea and wanted to get some feedback - thank you! :-) Then why 20%? Should it maybe be the minimum of 4 periods and the whole buffer?
It is a great idea to reduce the wakes, and it is aligned with future planned improvements. However, I am not sure it should be the client's responsibility to drive the underlying DMA HW using a common policy. There might be some HW-specific mechanisms already in place or planned, so how about implementing such policies inside the DMA drivers? They are aware of wake latency, can compare it with the pace at which the remaining data is consumed, etc.
@mmaka1 If we do it in the DMA driver, the threshold should be configurable. Configuration parameters are passed in a struct dma_config structure. It doesn't have anything like a "threshold", but it does have source and destination burst lengths. Should we use those?
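For reference, a sketch of where those knobs live in Zephyr's struct dma_config. There is no reload-threshold field today, so repurposing the burst lengths (as asked above) would overload their meaning; the field values below are illustrative only:

    #include <zephyr/drivers/dma.h>

    static void host_dma_config_sketch(struct dma_config *cfg,
                                       struct dma_block_config *blk)
    {
        cfg->channel_direction = HOST_TO_MEMORY; /* host-to-DSP, playback */
        cfg->source_data_size = 4;     /* bytes per elementary transfer */
        cfg->dest_data_size = 4;
        cfg->source_burst_length = 32; /* burst sizes, not a threshold */
        cfg->dest_burst_length = 32;
        cfg->block_count = 1;
        cfg->head_block = blk;
    }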
@mmaka1 @lyakh What if we have this as a platform setting for now (maybe as a Kconfig)? We could perhaps use DMA attributes, but I wonder if the DMA driver can provide this info. The DMA driver will likely not know much about how it's integrated into the system (i.e. what's the max latency to access host memory).
I've added one more commit on top - see whether you like that. We can either drop it and use the simple version, or merge the last two commits.
@lyakh this looks good to me!
I think there are a lot of ways this could be parameterized, but as a starting point this covers the basic need, is easy to understand and can be demonstrated to bring immediate savings, so it seems like a good addition.
The CI reported a failed test: https://sof-ci.01.org/sofpr/PR7306/build4998/devicetest/index.html?model=TGLU_RVP_NOCODEC_IPC4ZPH&testcase=check-playback-10sec which is strange. It's on a non-deep-buffer stream, so this PR shouldn't affect that test. But I also don't see any open issues similar to that failure.
Add Kconfig options to specify when to reload DMA; by default it will reload 4 ms before the buffer is empty (or full). Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
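A sketch of how such a Kconfig-driven margin could translate into a byte threshold (the symbol and helper names are illustrative, not necessarily what the commit introduces):

    #include <stdint.h>

    /* Stand-in for a Kconfig symbol carrying the 4 ms default above */
    #define HOST_DMA_RELOAD_DELAY_MS 4u

    /*
     * Bytes that may be consumed before a reload must be issued, leaving
     * HOST_DMA_RELOAD_DELAY_MS worth of data as headroom.
     */
    static inline uint32_t host_dma_reload_threshold(uint32_t buffer_bytes,
                                                     uint32_t bytes_per_ms)
    {
        uint32_t margin = HOST_DMA_RELOAD_DELAY_MS * bytes_per_ms;

        /* If the buffer is smaller than the margin, reload immediately */
        return margin < buffer_bytes ? buffer_bytes - margin : 0;
    }

For the 38400-byte / 100 ms buffer discussed above this yields 38400 - 4 * 384 = 36864 bytes, i.e. one reload roughly every 96 ms instead of every period.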
CI: no relevant failures this time
Nice work @lyakh