-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Google AEC rework: unbreak/resurrect, mt8195, DP integration #8571
Conversation
@thesofproject/mediatek pls review. |
I understand this change is soley for IPC3 and legacy platforms. To merge changes from 007 branch, where all IPC3 have been removed completely from google_rtc_audio_processing.c, we need to rename the current file to _ipc3 OR (maybe better) merge 007 changes to a new file google_rtc_audio_processing_dp.c |
@marcinszkudlinski I'd like to keep this in one implementation, actually. Other platforms are otherwise trapped in some very old topic branches that will be a pain to maintain. (Also some of the state fixes here are turning out be be needed even with IPC4 I think). Working on merging the two forks right now, should have a new version up over the weekend. I'll submit to both branches, but this will be the one with clean commit messages. |
Many updates and fork merges from mtl-007-drop-stable, including source/sink rework, s32_le support, DP scheduler support, more cleanup, and a handful of bugs fixed that were present in the earlier versions. Please re-review. The code as it stands here works for me to get working audio through the full AEC stack on mt8195 and (with one workaround in the last patch) Intel MTL. I'll push a separate patch to the mtl-007-drop-stable branch containing these changes in squashed form. |
Squashed fixups to this code from thesofproject#8571 Signed-off-by: Andy Ross <andyross@google.com>
Another spin, now that almost all the split-out PRs have been merged. This is down to one patch now, albeit a rather large one. This will work out of the box to get an AEC build working on MTL. If you want to run it with mt8195 you also need #8691. I'm pretty sure existing review comments have been addressed, the main ones outstanding are:
Anyway, please re-review. It would be good to get this in soon, some more work is starting to back up behind it. (And yes, there's some drama about merge and release strategy vis a vis #8722. But regardless of which merges first, we need to get to a go/no-go decision on this PR). |
@lrudyX , @wszypelt 2 different errors in https://sof-ci.01.org/sof-pr-viewer/#/build/PR8571/build13389086
None seem relevant here. |
Split out one patch that can be fairly clearly explained in isolation. |
And indeed, the checkpatch issue is a false positive, it doesn't like |
@marc-hb TGL was repaired -> new results should be available within 40 minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One general problem; missing
CONFIG_COMP_GOOGLE_RTC_AUDIO_PROCESSING=y
CONFIG_COMP_STUBS=y
in app/boards/intel_adsp_ace15_mtpm.conf
that means - IPC4 code is never compiled in CI checks
I agree with you @andyross . Furthermore I do not see any short or medium term situation where one would want to run 2 AEC. If someone can come up with a valid use-case then maybe we should revisit the static allocation otherwise I do think it is safer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for delays. My lab home has been without power for four days now, so I haven't had access to devices to test with. The power company's web site says they're down to the last 12k/160k of failures though, so hopefully today I'll be able to get back to defrost things. Will have a respin up as soon as possible.
Regarding the static allocation, I'll admit this has been a lot more controversial than I expected. Here's the full argument:
-
AEC wants an extremely large buffer in a contiguous block, the current one is 200k and Lionel wants more.
-
Experimentally, that corresponds to about half of the available heap memory on the first stream startup (I could get a ~430k block allocated, IIRC).
-
Dynamic heaps just aren't going to be reliable when faced with those numbers. I can basically guarantee that after enough overlapping stream operations (especially ones that cross use cases, like opening an HDMI output before closing your headset capture, etc...), you can get the heap into a state where capture won't start because AEC gets a heap failure.
-
Such an error is fatal, but preventable by just making sure the memory is there to begin with.
Longer term, yes, we will surely want to support more than one AEC instance. But longer term we have better options, like for example integrating the internal AEC allocators with the SOF/Zephyr heap, so it doesn't require the big block. Or alternatively in the MMU future where such a feature will probably be deployed, we'll be able to map a module's .bss virtually, which would be an objectively better choice than fine-grained heap allocation of large blocks.
The pipeline code is complicated, and some trigger events (for example reset of a playback pipeline that was started before capture) cross pipelines from the playback/reference stream into the AEC component unexpectedly. Ignore/halt triggers we get from the reference pipeline. Also work around a bug with some stale code in IPC3 builds that detects the lack of an active reference source as an "error" incorrectly. Signed-off-by: Andy Ross <andyross@google.com>
Lots of work on AEC, with an eye to getting it working on main, where it had bitrotten, and pulling in various features (IPC3 pipeline state management and DP scheduling) that had merged in other branches. Dynamically configure stream formats (sample format/rate and channel count) from the connected streams at prepare() time instead of relying on build-time tuning. Port the code to use the source/sink API, in as sophisticated a manner as I can find. Copies unroll cleanly into just a few instructions per sample, including integer/float conversions and de-/interleaving. Support both 16 and 32 bit sample formats, with a fairly clever inlining scheme to share as much code as possible between them. The component will select "copy" function pointers at prepare() time. This works by using the _float32 variant of the AEC API (which is actually the core internal implementation) instead of the _int16 one which involves a conversion. The large buffers required by this component (input and output staging and an internally-managed pool/heap block) are now static symbols instead of dynamic memory from the heap. These are very large, taking up about half of what is available to the linker on MTL. Relying on heap allocation is just dangerous in this context. This fully decouples AEC from the playback stream. It will run without an active reference happily, feeding zeros to the processing, and pick up in stride when the pipeline starts. Note that this feature is unexercisable on IPC4, where the kernel automatically starts up connected pipelines. Fixes a few bugs and misfeatures also: + Chunk the copies by full buffer strides between AEC processing calls instead of testing at each copied frame. + Copy the reference and mic streams in tandem, preventing them from becoming out of sync if the devices weren't themselves synchronized. + Copy the AEC results to the output stream after the call to ProcessCapture() instead of before. This was a hidden latency bug in the original code, I think. + Cleans up the Kconfig to remove stale variables and guard all the component-specific tunables under the top-level component variable. Also uses a default instead of a select to couple to CONFIG_STUBS, allowing AEC to be manually tuned. Signed-off-by: Andy Ross <andyross@google.com>
Cleaner API. Also avoids the need to specify cache line padding at the end. Signed-off-by: Andy Ross <andyross@google.com>
This feature is already enabled in topology, enable it in the build to match. Signed-off-by: Andy Ross <andyross@google.com>
Rebase/respin. Should have addressed the existing review comments (other than the argument about static buffers, obviously) |
I've thought about that and I can not see a use case for running more than one AEC at the same time so I do not think this is something we should plan to support. Hence I am arguing that having the AEC memory pre-allocated is good solution as it'll avoid memory fragmentation, it guarantees that one can use the AEC. |
We should talk about this. I think there are cases we need to consider that might be important. I agree that in the general set of use cases this is true though. |
This is hopelessly conflicted vs. current main. Closing this PR and opening a new one with re-split changes that can merge over main. That should have addressed existing review comments, but please check. |
This series updates the Google RTC Audio Processing module, which has
always been deployed out of platform branches, to run on SOF main.
The tactic taken is a little different: previous integrations have
relied on getting pipeline dependencies to work, where this does a
little trickery at the component level (both in
google_rtc_audio_processing and demux) to control propagation of
command triggers across pipeline boundaries. This should evolve
better with IPC4, which in (almost?) all cases refuses to propagate
those commands anyway.
The main advantage is that the pipelines become decoupled: you can
start an AEC'd capture stream without playback running and launch the
playback DMA later which then integrates with AEC dynamically; you can
shut either down in either order, start the playback before
capture, etc...
A secondary advantage is that, except for one logic fix with reset
propagation that I'm all but certain was a bug, no behavior changes
are needed to the pipeline state logic.
It also ends up a little cleaner and a little faster.
This work was done against a mt8195 device. I'm going to look at MTL
immediately, hopefully nothing will be needed except the format
conversion.