Allow user-selectable render quantum size #2450
It's a hint that allows the browser to choose something else appropriate, such as a power of two.
What to do about the 3-arg constructor?
Closely related to WebAudio/web-audio-cg#9
Another thing to consider with this issue is what happens with
@rtoy From the uninformed perspective of a front-end developer, I'd be more than happy to just use the dictionary constructor. Also marking this as related to: https://bugs.chromium.org/p/chromium/issues/detail?id=924426, where it seems we've identified that awkward buffer sizes lead to significant performance hits.
The AudioBus/Channel classes in the implementation don't really like changing size from quantum to quantum; the Chromium issue describes buffers that are either 128 or 192 frames, and the glitching is probably changes in buffer sizes being propagated through the graph, possibly a frame late. I think an easier and more reasonable fix is to keep the 128-frame quantum size, but run the streams through a ring buffer so that the engine is always fed 128-frame chunks to process, possibly running a chunk ahead of the callbacks. I worked through supporting a variable quantum in LabSound (which was originally a fork of the WebKit WebAudio sources), and it seems like way more trouble than it's worth, especially in the face of simple alternative solutions like a ring buffer (e.g. https://github.com/dr-soft/miniaudio/blob/master/examples/fixed_size_callback.c). By "way more trouble" I mean that most of the convolution filters, such as delays, HRTF, and reverb, all have a dependency on working on power-of-two chunks, and mismatching those against the render callback size is bothersome and, without careful thought, can introduce new pops and latency. Although I did the work on the variable quantum, I am going to abandon that branch...
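As a rough illustration of that FIFO approach (a minimal sketch with hypothetical names, not code from any implementation, and not real-time safe since it allocates): the graph always renders fixed 128-frame quanta, and a small queue adapts them to whatever block size the device callback asks for.

```js
// Adapt fixed 128-frame render quanta to an arbitrary device callback
// size using a simple FIFO. Mono, allocation-heavy, illustrative only.
const QUANTUM = 128;

class RenderFifo {
  constructor(renderQuantumFn) {
    this.render = renderQuantumFn; // returns a Float32Array of QUANTUM frames
    this.pending = [];             // rendered samples not yet consumed
  }

  // Called by the device with however many frames it wants (e.g. 192).
  pull(out) {
    let written = 0;
    while (written < out.length) {
      if (this.pending.length === 0) {
        // Render one more fixed-size quantum from the graph.
        this.pending = Array.from(this.render(QUANTUM));
      }
      const n = Math.min(out.length - written, this.pending.length);
      for (let i = 0; i < n; i++) out[written + i] = this.pending[i];
      this.pending = this.pending.slice(n);
      written += n;
    }
  }
}
```

With a 192-frame device callback this alternates between rendering two quanta and one quantum per callback, which is exactly the uneven load described in the next comment.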
I believe all implementations do some kind of FIFO/ring buffer to manage the difference between WebAudio's 128-frame chunks and the underlying HW block sizes. I have made some changes to Chrome to support this, and even in the simple cases, if the block size is changed to some other value, many of the current WPT tests fail because the generated numbers differ from the expected ones. I do not know if that's because I messed up, because that's how things work, or because extra (or less?) round-off happens. And, as you say, getting anything that uses FFTs in the underlying implementation to work is a ton of work, and impacts performance. For small sizes, this probably hurts performance quite a bit. For large sizes, this probably helps because we can use larger FFT sizes. In all, this is a ton of work to get everything working and performing well.
What @meshula describes is what is being done today in implementations. It is, however, very inefficient in cases where the system buffer size is not a power of two (very common on anything but macOS, if we're talking about consumer setups), so we're changing it. There is no other way to fix this properly: the fundamental issue is that the load per callback cannot be stable if the rendering quantum size is not a divisor of the system callback size, and this means that the theoretical maximum load is reduced. Let's consider a practical example (my phone), with a native sample rate of 48kHz and a system buffer size of 192 frames. With a rendering quantum of 128 frames and a ring buffer to adapt the buffer sizes, the rendering goes like this: the first callback has to render two quanta (256 frames) to cover 192, leaving 64 frames over; the second callback only has to render one quantum (64 + 128 = 192); then the pattern repeats. Every other callback therefore does twice the work within the same 4ms deadline.
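A tiny simulation of that mismatch (illustrative only, no real audio involved) shows the alternating 2-quanta / 1-quantum load per 192-frame callback:

```js
// How many 128-frame quanta must be rendered per 192-frame device
// callback when a FIFO adapts between the two sizes.
const QUANTUM = 128;
const CALLBACK = 192;

let buffered = 0;
const quantaPerCallback = [];
for (let cb = 0; cb < 6; cb++) {
  let rendered = 0;
  while (buffered < CALLBACK) {
    buffered += QUANTUM;
    rendered++;
  }
  buffered -= CALLBACK;
  quantaPerCallback.push(rendered);
}
console.log(quantaPerCallback); // [2, 1, 2, 1, 2, 1]
```

If no extra buffering (and hence latency) is added, the worst-case callback has to finish two quanta of rendering within one 192-frame period, so the sustainable graph load is roughly capped at 192/256, or about 75% of what a matched render size would allow.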
Because this is real-time, we have to provision for that worst-case callback. The render quantum size is not going to be variable. It's going to be user-selectable at construction, and will be based on the characteristics of the underlying audio stack. This means we will indeed have to fix the FFT code; the simple fix is to introduce the ring buffer only there.
Thanks for laying this out so clearly and succinctly @padenot! The point about various nodes needing to operate on certain buffer sizes is a good one, but as @padenot points out, the common practice (in audio plugin development, for example) is to satisfy these constraints within the implementation of the node itself. If a node/plugin needs to operate on power-of-two buffers because it does FFTs, it's usually the case that the node/plugin is responsible for doing the necessary buffering internally. Similarly, if a node/plugin needs to do its processing at a certain sample rate, the node/plugin will do the required resampling internally, as opposed to forcing all other nodes in the graph to run at the same sample rate.
The part that made my brain hurt, in the WebKit implementation, was the optimization whereby a bus's channels' buffers can be passed up the chain in the case where a node wouldn't actually affect the buffer. Of course the right answer is that the buffer can't be shared when it hits a node that has a different internal need, and that life is much easier if the render quantum is fixed at initialization, rather than runtime-variable.
This issue is not about dynamically changing the buffer size during an AudioContext's lifetime. The buffer-sharing technique that you see in WebKit is also present in Gecko, and it's just an optimization.
Virtual F2F:
Exact shape of the API is TBD. There will have to be provisions to have the browser pick the hardware-recommended size.
From today's teleconf: We want to allow selecting the desired size, but we also need a way to specify that we want to use whatever the HW-recommended size is. We probably don't want to allow any possible size, but I'm not sure about the constraints, except that requiring a power of two isn't going to work for many Android devices. Perhaps small multiples of powers of two? Maybe something like
I don't think we can have a strict formula here. Windows/WASAPI works in 10ms chunks at the stream's rate, which means that 441 frames (10ms at 44.1kHz) is a very common buffer size.
Strangely, when WASAPI support was added in Chrome, I remember the 10ms chunks, but the actual size was 440. Didn't quite understand how that worked, but maybe my memory is wrong.
Teleconf: Do something like latencyHint: accept either an integer or an enum value for the default and the HW size. Leave it up to the browser to choose the requested size or round to something close. And AudioWorklet can tell you what size was actually used. @padenot notes that some systems (PulseAudio) don't have constant buffer sizes; they can change over time.
This ties into the output device change (either explicitly or implicitly, because for example the default output device can change). Having an event that is fired on an AudioContext when the device, and therefore the preferred render size, changes would help here.
This is ready for a first cut. The things needed are:
TPAC 2020: Basic proposal: The render size is a hint. The default is 128. There would be an enum to request the optimum HW size. Browsers are only required to support powers of two, but highly encouraged to support more. There will be an attribute on the AudioContext to let developers know what the actual size is. This will be an additional member of AudioContextOptions. |
Rough proposal, actual names TBD
May not want this derived dictionary. The minimum and maximum supported render size is up to the browser. Perhaps we can say values from 128 to 4096(?) are required to be supported? Lower and higher values are allowed. The actual value used is up to the browser, except that powers of two are required to be supported and honored, provided they don't exceed the minimum and maximum sizes allowed by the browser. We don't specify how this value is chosen. It is highly recommended that browsers also support other sizes that are common for the OS. Some additional implementation notes, not relevant to the spec but important for implementors.
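To make the shape of this concrete, here is a hypothetical usage sketch based on the proposal above; the option and attribute names (renderSizeHint, renderQuantumSize) are placeholders echoing later comments in this thread, not final API:

```js
// Hypothetical sketch only: names are placeholders, not the final API.
// Request a specific render quantum size at construction; the browser
// may round to a size it supports (powers of two from 128 to 4096 are
// required per the proposal above, other sizes are encouraged).
const context = new AudioContext({
  sampleRate: 48000,
  renderSizeHint: 192, // e.g. match a common Android HW buffer size
});

// The browser reports whatever size it actually chose.
console.log(context.renderQuantumSize); // 192, or e.g. 256 if 192 is unsupported
```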
For an
This also means we want the BaseAudioContext to have a renderSize attribute, so developers can know what value was actually used.
This replaces the proposal in https://github.com/WebAudio/web-audio-api-v2/issues/13#issuecomment-709614649 that added this to the
Creating a
For user-selectable sizes, I propose we change the allowed values to be 0 and
This preserves the current constraints on the sizes when the
From the teleconf, the API should be updated:
The dictionaries for the
Some additional notes. "default" means 128. "hardware" means the appropriate size for the hardware. Browsers are free to choose a different value.
Finally, as mentioned in https://github.com/WebAudio/web-audio-api-v2/issues/13#issuecomment-776805580, the interaction between
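A hypothetical illustration of the "default" and "hardware" values mentioned above (the option name and exact shape are placeholders, not final spec text):

```js
// "hardware" asks for whatever size best matches the audio hardware;
// "default" keeps the historical 128-frame quantum. Either way the
// browser is free to choose a different value than the one requested.
const hwContext = new AudioContext({ renderSizeHint: "hardware" });
const classicContext = new AudioContext({ renderSizeHint: "default" }); // same as omitting it
```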
See proposed explainer at https://github.com/rtoy/web-audio-api/blob/13-user-render-size-explainer/explainer/user-selectable-render-size.md. Comments welcome.
Overall the explainer looks great. A few minor suggestions:
"probably" is over cautious, drop
Reword. This would increase latency a bit compared to a native size of 128,, but since Android is already using a size of 192, there is no actual additional latency in practice.
That seems mysterious. Does it mean it might still pick 128? Or that it picks 256, the next largest power of two? Or what?
Aha ok it might pick 256; say so above.
The problem isn't limited to Android.
Maybe explicitly say that for UAs that do double buffer, the latency will increase. |
To expand on this comment from @padenot
Should this, by any chance, also be extended with a new property in the AudioWorkletProcessor's options? That said, of course it should be possible to just rip it from the AudioContext in use and pass it by hand in the custom processorOptions object, but I feel like this is now an integral part of the AudioWorkletProcessor, so maybe it should be natively available in its options initializer as well. Thoughts?
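A sketch of the manual workaround described above, assuming a hypothetical renderQuantumSize attribute on the context; processorOptions itself is an existing AudioWorkletNodeOptions member:

```js
// Main thread: pass the context's render quantum size by hand via
// processorOptions (context.renderQuantumSize is hypothetical here).
const size = context.renderQuantumSize ?? 128;
const node = new AudioWorkletNode(context, "my-processor", {
  processorOptions: { renderQuantumSize: size },
});

// Inside the worklet module: read it back in the processor constructor.
class MyProcessor extends AudioWorkletProcessor {
  constructor(options) {
    super();
    this.quantum = options.processorOptions.renderQuantumSize;
  }
  process(inputs, outputs) {
    return true;
  }
}
registerProcessor("my-processor", MyProcessor);
```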
How likely is it that this change will land? Also, if this improvement lands, will the array length in AudioWorklets remain at a fixed 128 samples, or will it reflect the quantum size of the context? Reading the PR suggests that it would, though I see the BitCrusher example was not changed: https://github.com/WebAudio/web-audio-api/pull/2469/files#r1117951767 Documentation like https://developer.chrome.com/blog/audio-worklet/#custom-audioparam would then benefit from updating.
This change will land for sure; implementors simply haven't had enough resources to implement it just yet. It's a very large change underneath: lots of code assumes a 128-frame buffer for performance reasons, e.g. for SIMD, buffer pooling and such, and a very large amount of code has to be modified and tested. But on the other hand it's so beneficial for performance that we can't not land it. It was just postponed in the roadmap; some other work items were smaller in size and also very high priority (e.g. audio output device selection). And also yes, this will change the buffer length seen in an AudioWorklet's process(): it will match the context's render quantum size instead of always being 128. All this audio device change stuff has either already landed or is resolved or almost resolved (#2532), so developers can decide to recreate their audio graph, or not, depending on what's best for the application.
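Given that answer, worklet code should avoid hard-coding 128 and instead derive the block length from the buffers it is handed. A minimal sketch (a simple gain processor, illustrative only, not from the spec or this PR):

```js
// A processor that works for any render quantum size by reading the
// block length from the buffers it receives instead of assuming 128.
class AnySizeGainProcessor extends AudioWorkletProcessor {
  static get parameterDescriptors() {
    return [{ name: "gain", defaultValue: 1 }];
  }
  process(inputs, outputs, parameters) {
    const input = inputs[0];
    const output = outputs[0];
    const gain = parameters.gain;
    for (let ch = 0; ch < output.length; ch++) {
      const blockLength = output[ch].length; // not necessarily 128
      const inCh = input[ch] || new Float32Array(blockLength);
      for (let i = 0; i < blockLength; i++) {
        // k-rate params may have length 1; a-rate params have the block length.
        const g = gain.length > 1 ? gain[i] : gain[0];
        output[ch][i] = inCh[i] * g;
      }
    }
    return true;
  }
}
registerProcessor("any-size-gain", AnySizeGainProcessor);
```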
Perfect, thanks for the update!
FWIW, I had a bunch of CLs for Chrome that implemented this. IIRC everything was working, except I had not yet handled anything having to do with FFTs. Complicated stuff, but I think I could make it work since the FFT library supported non-power-of-two FFT sizes. I was going to restrict the set of buffer sizes to lengths that were supported by the FFT. Fortunately, this included buffer sizes like 160, 192, 240, etc. that are common on Android devices.
Is this thread only about choosing a single render quantum size globally for the whole AudioContext? Or would it also be possible to have a subgraph that operated at some submultiple of the main quantum size, for example so that you could implement feedback FM outside of an audio worklet using a quantum size of just a couple of samples, as is possible in MaxMSP?
It's for the entire AudioContext. Feedback FM or other DSP algorithms that use very short feedback loops are better implemented inside an AudioWorklet, where the loop can run per-sample. The goal of this is to align the render quantum to what the OS uses, to maximize performance and also to potentially lower latency.
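For illustration, a minimal sketch of a one-sample feedback loop inside an AudioWorkletProcessor (a simple feedback-FM-style oscillator; the algorithm and parameter values are only an example, not from this thread):

```js
// Single-oscillator feedback FM: each sample's phase is modulated by
// the previous output sample, something a graph-level feedback loop
// cannot do because graph feedback is delayed by a whole render quantum.
class FeedbackFMProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.phase = 0;
    this.prev = 0;       // previous output sample (the feedback term)
    this.freq = 220;     // carrier frequency in Hz (example value)
    this.feedback = 0.5; // feedback amount (example value)
  }
  process(inputs, outputs) {
    const out = outputs[0][0];
    for (let i = 0; i < out.length; i++) {
      const sample = Math.sin(this.phase + this.feedback * this.prev);
      out[i] = sample;
      this.prev = sample;
      this.phase += (2 * Math.PI * this.freq) / sampleRate;
    }
    return true;
  }
}
registerProcessor("feedback-fm", FeedbackFMProcessor);
```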
Has there been any discussion about adding support for changing this setting after the context is set up and running? It's a lot of added complexity, but I imagine there's a close relation to the Render Capacity API. It could be useful to change this without having to re-initialize a web app's entire audio graph, maybe while the context is suspended.
2023 TPAC Audio WG Discussion: Re: @haywirez: The Working Group believes that changing the render quantum size on the fly is not practical. (As far as I can remember, no platform audio APIs support this.) Also, merging this PR does not prevent us from adding more features on top of this. Allowing this only when suspended makes sense, but it creates some interesting edge cases that one could imagine.
Describe the feature
Allow an AudioContext and OfflineAudioContext to have a user-selectable render quantum size instead of the current fixed size of 128. This allows better integration with AudioDeviceClient, which can render in many different sizes.
Is there a prototype?
No. Can't prototype this.
Describe the feature in more detail
Basically add a new dictionary member to specify a render size for the constructors. The default is 128, of course.