Move audio mixing out of AudioStreamPlayer nodes #2299

Closed
ellenhp opened this issue Feb 16, 2021 · 4 comments
ellenhp commented Feb 16, 2021

This is one big proposal because the proposed features are tightly coupled. Also, if it's accepted I'm willing to implement it myself, so it shouldn't take resources away from other efforts beyond review and discussion.

Describe the project you are working on

Over the past year or so: XR games, audio games, and generally just fixing audio bugs in Godot

Describe the problem or limitation you are having in your project

  1. Audio players (2D and 3D) pop if they're still playing when queue_free() is called. static is emitted when freeing an AudioStreamPlayer or resetting its stream property godot#25087 This can technically be worked around in a script: add a dummy node to the tree, reparent any audio players you wish to remove to that node (preserving their global transform), stop them, then queue them to be freed. While it can be mitigated, pop-free removal should be the default behavior; I don't think anyone who queue_frees an audio player wants a pop. Related: Audible click/pop when audio bus changes godot#21312

  2. Resampling quality in Godot varies from mediocre to poor. WAVs use linear interpolation, while other streams use cubic interpolation. Bitrate-like artifact when playing low frequency sounds godot#23544 (and duplicates)
    This is mitigated for Oggs and MP3s, at the cost of performance, by Implement a new resampling algorithm in AudioStreamPlaybackResampled godot#46086. A change like this would probably not be appropriate for WAVs, though, since a huge number of WAVs could be playing at once (resampling is done on a per-player basis and must therefore be extremely performant).

  3. There have been many bugs attributed to data races in the audio system, including two that are currently open. Some people have also reported NaNs in the audio buffer in one of them. Audio clipping / static / interference on rapid intervals of sound (fixed in master) godot#22016 Audio stops working after scene reload. godot#28110
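For context on point 2, the quality gap comes down to the interpolation kernel. The sketch below shows the two kernels involved: 2-point linear (what WAVs get today) and a 4-point Catmull-Rom cubic, in the spirit of what other streams get. This is illustrative only, not Godot's actual resampler code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// 2-point linear interpolation between neighbouring samples, as WAVs use today.
// `pos` is a fractional sample position into `s`.
static float linear_sample(const std::vector<float> &s, float pos) {
    int i = (int)pos;
    float t = pos - (float)i;
    return s[i] + t * (s[i + 1] - s[i]);
}

// 4-point Catmull-Rom cubic interpolation. Needs one sample of context on
// each side of the interval, so `pos` must satisfy 1 <= pos < s.size() - 2.
static float cubic_sample(const std::vector<float> &s, float pos) {
    int i = (int)pos;
    float t = pos - (float)i;
    float p0 = s[i - 1], p1 = s[i], p2 = s[i + 1], p3 = s[i + 2];
    return p1 + 0.5f * t * (p2 - p0 +
           t * (2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3 +
           t * (3.0f * (p1 - p2) + p3 - p0)));
}
```

The linear kernel only ever sees two samples, which is what produces the "bitrate-like" staircase artifacts on low-frequency content; the cubic kernel fits a smooth curve through four.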

Describe the feature / enhancement and how it helps to overcome the problem or limitation

  1. Stereo panning code (including SPCAP) will be moved into the AudioServer or a helper class. Instead of mixing audio directly into the AudioServer's buffers, AudioStreamPlayer* nodes will mix into an intermediate buffer provided by the AudioServer. The AudioServer will then hand that buffer to its chosen panning method (SPCAP for now) to mix it into the audio bus's buffer. Decoupling the mixing logic from the node playing the audio this way makes fadeouts easy to accomplish even when the playing node is queue_freed.

  2. Resampling will be batched. Currently, a project playing 5 WAV files with a 48kHz sample rate on a system with a 44.1kHz audio driver performs a linear resampling step 5 times, once for each WAV audio stream. That cost scales with the number of sources playing, which is probably why it uses linear interpolation rather than something with better quality. If AudioStreamPlayer nodes mix into an intermediate buffer provided by the AudioServer, there's no need to resample 5 times: the AudioServer could batch all 5 players operating at 48kHz into one intermediate buffer, resample to 44.1kHz once, and mix the resampled buffer into the audio bus's buffer. This lets Godot default to a higher-quality resampling method and moves all resampling logic out of the AudioStream classes, where it is currently duplicated in several places. I glossed over some details here, especially around per-node pitch_scale, but none of that is unsolvable; it would just force the AudioServer to deal with a pitch-scaled node separately.

  3. Since most of the brains of the audio mixing will be moved out of the stream player nodes and into the AudioServer (or some other class directly controlled by the AudioServer), this gives us a unique opportunity to ensure that the data the nodes pass in about how to play back their audio is handled in a lock-free, thread-safe way.
    I talked with @fire about this and he suggested creating a resource that contains all the information the mixing system needs about each individual player node (its transform, audio volume, emission angle, doppler data, etc.). I'm calling this a "PlaybackParams" resource in my head. I looked into this a bit, and I think each node could create a new instance of a PlaybackParams resource with up-to-date data every frame and atomically swap in its RID, so that the audio thread never has to wait for a copy to finish. Partitioning the data this way completely prevents data races.
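The atomic-swap idea in point 3 can be sketched as follows. All names here (the PlaybackParams fields, ParamSlot) are hypothetical, and a raw pointer stands in for the RID; the real design would hand the old RID back to the server for deferred cleanup rather than leaking or deleting it on the audio thread:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical snapshot of everything the mixer needs from one player node.
struct PlaybackParams {
    float volume_linear;
    float pan; // -1.0 (left) .. 1.0 (right)
};

// The game thread publishes a fresh immutable snapshot each frame; the audio
// thread only ever loads the current pointer, so it never blocks on a copy.
class ParamSlot {
    std::atomic<const PlaybackParams *> current{nullptr};

public:
    // Game thread: swap in new params, returning the previous snapshot so the
    // caller can free it once the audio thread can no longer be reading it.
    const PlaybackParams *publish(const PlaybackParams *p) {
        return current.exchange(p, std::memory_order_acq_rel);
    }

    // Audio thread: grab the latest snapshot without locking.
    const PlaybackParams *acquire() const {
        return current.load(std::memory_order_acquire);
    }
};
```

Because each frame's snapshot is immutable once published, the audio thread never observes a half-written update; the only shared mutable state is the single atomic pointer.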

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

I don't think a mockup or diagram would be helpful for this proposal, but if the details above aren't sufficient to satisfy this section, let me know and I'll build a proof-of-concept PR. I did write some sloppy pseudocode showing how batching could occur. It has plenty of problems: no consideration of multiple audio buses, no resampler state kept between mixes, no fadeout/fadein, and no surround sound. But it shows the basic algorithm: group streams by sample rate, then group again by the way the sound should be decoded from 2D/3D space to stereo.

// AudioServer.h
map<AudioStreamPlayback, RID> playing_streams_and_playback_params;
PanningMethod panning_method;
AudioFrame intermediate_buffer_main[buffer_size];
AudioFrame intermediate_buffer_secondary[buffer_size];

// AudioServer.cpp
void mix() {
    // First, group the playing streams by their effective sample rate.
    map<uint32_t, vector<AudioStreamPlayback>> streams_by_sample_rate;
    for stream in playing_streams_and_playback_params.keys() {
        streams_by_sample_rate[stream.get_effective_sample_rate()].push_back(stream);
    }
    for sample_rate in streams_by_sample_rate.keys() {
        memset(intermediate_buffer_main, 0, sizeof(intermediate_buffer_main));
        // Within one sample rate, group again by panning parameters.
        map<PanningParams, vector<AudioStreamPlayback>> streams_of_current_sample_rate_by_playback_params;
        for stream in streams_by_sample_rate[sample_rate] {
            streams_of_current_sample_rate_by_playback_params[stream.get_playback_params()].push_back(stream);
        }
        for panning_params in streams_of_current_sample_rate_by_playback_params.keys() {
            memset(intermediate_buffer_secondary, 0, sizeof(intermediate_buffer_secondary));
            for stream in streams_of_current_sample_rate_by_playback_params[panning_params] {
                // Care should be taken to ensure that Godot doesn't overrun the
                // intermediate buffer by mixing too many samples, because that
                // would likely cause an exploitable heap corruption.
                stream.mix_without_resampling(intermediate_buffer_secondary, delta_t);
            }
            // Pan this group's intermediate_buffer_secondary into the main intermediate buffer.
            panning_method.mix_to_stereo(panning_params, intermediate_buffer_secondary, intermediate_buffer_main);
        }
        resample_and_add_to_bus_buffer(intermediate_buffer_main, audio_bus_buffer);
    }
}
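The resample_and_add_to_bus_buffer step at the end could look something like the sketch below. It uses linear interpolation purely for brevity (the whole point of batching is that a better kernel becomes affordable), works on a mono float buffer rather than AudioFrames, and keeps no fractional-phase state between mix blocks. The function and parameter names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Resample a mono intermediate buffer from src_rate to dst_rate and
// accumulate (+=) the result into the bus buffer. A real implementation
// would carry the fractional read position across calls so consecutive
// mix blocks line up seamlessly.
static void resample_and_add(const std::vector<float> &src, int src_rate,
                             std::vector<float> &bus, int dst_rate) {
    double step = (double)src_rate / (double)dst_rate;
    double pos = 0.0;
    for (size_t i = 0; i < bus.size(); i++) {
        size_t j = (size_t)pos;
        if (j + 1 >= src.size()) {
            break; // Don't read past the intermediate buffer.
        }
        double t = pos - (double)j;
        bus[i] += (float)((1.0 - t) * src[j] + t * src[j + 1]);
        pos += step;
    }
}
```

Since this runs once per (sample rate, bus) pair instead of once per playing stream, its cost no longer scales with the number of sources.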

If this enhancement will not be used often, can it be worked around with a few lines of script?

Issue 1 mentioned above could be mitigated with script, but issue 2 can't be solved without taking a serious performance hit.

Is there a reason why this should be core and not an add-on in the asset library?

The audio system is currently very difficult or impossible to extend without changes in core.

Calinou commented Feb 16, 2021

See also #2298.

Calinou changed the title from "Move audio mixing out of AudioStreamPlayer nodes and overhaul audio resampling." to "Move audio mixing out of AudioStreamPlayer nodes and overhaul audio resampling" Feb 17, 2021
reduz commented Feb 17, 2021

I have been thinking about it; it makes sense on many points. Here is my feedback:

  • Stereo and positional audio I would leave as-is; if the only place that uses it is AudioStreamPlayer3D, then it makes no sense to move it out of there.

  • Resampling I would leave as-is; the way it works now won't be too expensive, and it can discern between 1) streams that generate audio at the right output frequency, 2) streams that resample via a ring buffer (Ogg/MP3/etc.), and 3) streams that resample on the fly by interpolating directly from the source (WAV), as each serves its purpose.

  • Registering streams in AudioServer and asking them to mix in a buffer does make a lot of sense, since the following situations could be solved:

    • Properly moving from one bus (or multiple buses) to another, doing the right fades.
    • When streams are un-registered or stopped, the AudioServer can ask them (outside of the audio thread) to prepare an extra mix block, which will be used to fade out at the next mix callback, and hence avoid clicks/pops.

    The new API to register streams would also need to be told which buses to send to, and at what amount.
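The click-free stop described in the second bullet amounts to applying a linear fade-to-zero ramp over one extra pre-mixed block. A minimal sketch of that ramp, with a hypothetical helper name (not Godot's actual API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply a linear fade-out over one mono mix block: gain ramps from 1.0 at
// the first sample down to exactly 0.0 at the last, so the waveform ends at
// silence instead of being cut off mid-cycle (which is what causes the pop).
static void apply_fade_out(std::vector<float> &block) {
    const size_t n = block.size();
    if (n == 0) {
        return;
    }
    if (n == 1) {
        block[0] = 0.0f;
        return;
    }
    for (size_t i = 0; i < n; i++) {
        float gain = 1.0f - (float)i / (float)(n - 1);
        block[i] *= gain;
    }
}
```

The key scheduling detail from the comment above is that this extra block is mixed *outside* the audio thread when the stream is stopped or un-registered, so the audio thread's next mix callback only has to play it back.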

ellenhp commented Feb 17, 2021

I can work with this! Makes sense. I'm glad you like the idea of the audio server taking a more active role in mixing and doing fades itself since I think that's a good architectural change. Keeping resampling as-is makes sense as long as there are no performance concerns for switching to cubic or better everywhere. I'll hack away at this and see what I come up with. Exciting times!

ellenhp commented Aug 27, 2021

This proposal has been implemented.

@ellenhp ellenhp closed this as completed Aug 27, 2021
@Calinou Calinou added this to the 4.0 milestone Aug 27, 2021