From e77bde9a5f50df276eab9008e280850d3a727bc7 Mon Sep 17 00:00:00 2001
From: James Teh
Date: Mon, 1 May 2023 08:41:34 +1000
Subject: [PATCH] Support for audio output using WASAPI (#14697)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

NVDA's existing audio output code (nvwave) is largely very old and uses
WinMM, a legacy Windows audio API. It is also written in pure Python,
contains quite a few threading locks necessitated by WinMM, and parts of it
have become rather difficult to reason about. There are several known
stability and audio glitching issues that are difficult to solve with the
existing code.

Description of user-facing changes:
At the very least, this fixes audio glitches at the end of some utterances
as described in #10185 and #11061. I haven't noticed a significant
improvement in responsiveness on my system, but my system is also very
powerful. It's hard to know whether the stability issues (e.g. #11169) are
fixed or not. Time will tell as I run with this more.

Description of development approach:
1. The bulk of the WASAPI implementation is written in C++. The WASAPI
interfaces are easy to access in C++ and difficult to access in Python. In
addition, this allows for the best possible performance, given that we
regularly and continually stream audio data.
2. The WinMM code fired callbacks by waiting for the previous chunk to
finish playing before sending the next chunk, which could result in buffer
underruns (glitches) if callbacks were close together (#10185, #11061). In
contrast, the WASAPI code uses the audio playback clock to fire callbacks
independent of data buffering, eliminating glitches caused by callbacks.
3. The WinMM WavePlayer class is renamed to WinmmWavePlayer. The WASAPI
version is called WasapiWavePlayer. Rather than having a common base class,
this relies on duck typing. I figured it didn't make sense to have a base
class given that WasapiWavePlayer will likely replace WinmmWavePlayer
altogether at some point.
4. WavePlayer is set to one of these two classes during initialisation
based on a new advanced configuration setting. WASAPI is enabled by
default.
5. WasapiWavePlayer.feed can take a ctypes pointer and size instead of a
Python bytes object. This avoids the overhead of additional memory copying
and Python objects in cases where we are given a direct pointer to memory
anyway, which is true for most (if not all) speech synthesisers.
6. For compatibility, WinmmWavePlayer.feed supports a ctypes pointer as
well, but it just converts it to a Python bytes object.
7. eSpeak and oneCore have been updated to pass a ctypes pointer to
WavePlayer.feed.
8. When playWaveFile is used asynchronously, it now feeds audio on the
background thread, rather than calling feed on the current thread. This is
necessary because the WASAPI code blocks once the buffer (400 ms) is full,
rather than having variable sized buffers. Even with the WinMM code,
playWaveFile could block for a short time (#10413). This should improve
that also.
9. WasapiWavePlayer supports associating a stream with a specific audio
session, which allows that session to be separately configurable in the
system Volume Mixer. NVDA tones and wave files have been split into a
separate "NVDA sounds" session.
10. WasapiWavePlayer has a new setSessionVolume method that can be used to
set the volume of a session. This at least partially addresses #1409.
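
To illustrate the feed contract described in points 5 and 6, here is a
minimal hedged sketch of what a synth driver can now do. This is example
code, not part of the patch; it assumes only the WavePlayer API shown in the
diff below (feed accepting either bytes or a ctypes pointer plus an explicit
size, with an optional onDone callback), and the buffer here stands in for
one owned by a synthesiser:

    import ctypes
    import nvwave

    player = nvwave.WavePlayer(channels=1, samplesPerSec=22050, bitsPerSample=16)
    # Old style: feed a Python bytes object; size is omitted.
    player.feed(b"\x00" * 4410, onDone=lambda: print("first chunk played"))
    # New style: feed a raw pointer plus a size in bytes. This avoids a copy
    # when the synth already hands us a pointer into its own memory.
    buf = (ctypes.c_char * 4410)()  # hypothetical synth-owned buffer
    player.feed(
        ctypes.cast(buf, ctypes.c_void_p),
        size=ctypes.sizeof(buf),
        onDone=lambda: print("second chunk played"),
    )
    player.idle()  # block until the fed audio has finished playing
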
---
 nvdaHelper/local/nvdaHelperLocal.def |  11 +
 nvdaHelper/local/sconscript          |   1 +
 nvdaHelper/local/wasapi.cpp          | 629 +++++++++++++++++++++++++++
 source/config/configSpec.py          |   1 +
 source/core.py                       |  24 +-
 source/gui/settingsDialogs.py        |  24 +
 source/nvwave.py                     | 316 +++++++++++++-
 source/synthDrivers/_espeak.py       |  23 +-
 source/synthDrivers/oneCore.py       |  15 +-
 source/tones.py                      |   3 +-
 user_docs/en/changes.t2t             |   1 +
 user_docs/en/userGuide.t2t           |   6 +
 12 files changed, 1023 insertions(+), 31 deletions(-)
 create mode 100644 nvdaHelper/local/wasapi.cpp

diff --git a/nvdaHelper/local/nvdaHelperLocal.def b/nvdaHelper/local/nvdaHelperLocal.def
index ee234a84523..e64def21c21 100644
--- a/nvdaHelper/local/nvdaHelperLocal.def
+++ b/nvdaHelper/local/nvdaHelperLocal.def
@@ -67,3 +67,14 @@ EXPORTS
 	getOleClipboardText
 	_nvdaControllerInternal_reportLiveRegion
 	_nvdaControllerInternal_openConfigDirectory
+	wasPlay_create
+	wasPlay_destroy
+	wasPlay_open
+	wasPlay_feed
+	wasPlay_stop
+	wasPlay_sync
+	wasPlay_pause
+	wasPlay_resume
+	wasPlay_setSessionVolume
+	wasPlay_startup
+	wasPlay_getDevices
diff --git a/nvdaHelper/local/sconscript b/nvdaHelper/local/sconscript
index 3c29b37b6ad..4ff5a22e66b 100644
--- a/nvdaHelper/local/sconscript
+++ b/nvdaHelper/local/sconscript
@@ -91,6 +91,7 @@ localLib=env.SharedLibrary(
 		"UIAUtils.cpp",
 		"mixer.cpp",
 		"oleUtils.cpp",
+		"wasapi.cpp",
 	],
 	LIBS=[
 		"advapi32.lib",
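
Because the exports above are plain C entry points, they can be driven from
Python with ctypes. The following hedged sketch shows roughly how nvwave
binds them later in this patch; loading the DLL by name here is purely
illustrative (NVDA actually reuses the handle in NVDAHelper.localLib):

    from ctypes import c_void_p, windll
    from comtypes import HRESULT

    localLib = windll.nvdaHelperLocal  # assumption: nvdaHelperLocal.dll is loadable by name
    localLib.wasPlay_create.restype = c_void_p  # opaque WasapiPlayer* handle
    for name in (
        "wasPlay_startup", "wasPlay_open", "wasPlay_feed", "wasPlay_stop",
        "wasPlay_sync", "wasPlay_pause", "wasPlay_resume",
        "wasPlay_setSessionVolume", "wasPlay_getDevices",
    ):
        # The patch additionally installs an errcheck which raises WindowsError
        # for any failed HRESULT; restype alone is shown here for brevity.
        getattr(localLib, name).restype = HRESULT
    localLib.wasPlay_startup()  # must run once, before any wasPlay_create call
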
diff --git a/nvdaHelper/local/wasapi.cpp b/nvdaHelper/local/wasapi.cpp
new file mode 100644
index 00000000000..9e5c78f4a8b
--- /dev/null
+++ b/nvdaHelper/local/wasapi.cpp
@@ -0,0 +1,629 @@
+/*
+This file is a part of the NVDA project.
+URL: http://www.nvda-project.org/
+Copyright 2023 James Teh.
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License version 2.0, as published by
+    the Free Software Foundation.
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+This license can be found at:
+http://www.gnu.org/licenses/old-licenses/gpl-2.0.html
+*/
+
+#include <algorithm>
+#include <sstream>
+#include <string>
+#include <vector>
+#include <windows.h>
+#include <atlcomcli.h>
+#include <audioclient.h>
+#include <audiopolicy.h>
+#include <mmdeviceapi.h>
+#include <functiondiscoverykeys_devpkey.h>
+
+/**
+ * Support for audio playback using WASAPI.
+ * Most of the core work happens in the WasapiPlayer class. Because Python
+ * ctypes can't call C++ classes, NVDA interfaces with this using the wasPlay_*
+ * functions.
+ */
+
+constexpr REFERENCE_TIME REFTIMES_PER_MILLISEC = 10000;
+constexpr REFERENCE_TIME BUFFER_SIZE = 400 * REFTIMES_PER_MILLISEC;
+
+const CLSID CLSID_MMDeviceEnumerator = __uuidof(MMDeviceEnumerator);
+const IID IID_IMMDeviceEnumerator = __uuidof(IMMDeviceEnumerator);
+const IID IID_IAudioClient = __uuidof(IAudioClient);
+const IID IID_IAudioRenderClient = __uuidof(IAudioRenderClient);
+const IID IID_IAudioClock = __uuidof(IAudioClock);
+const IID IID_IMMNotificationClient = __uuidof(IMMNotificationClient);
+const IID IID_IAudioSessionControl = __uuidof(IAudioSessionControl);
+const IID IID_ISimpleAudioVolume = __uuidof(ISimpleAudioVolume);
+
+/**
+ * C++ RAII class to manage the lifecycle of a standard Windows HANDLE closed
+ * with CloseHandle.
+ */
+class AutoHandle {
+	public:
+	AutoHandle(): handle(nullptr) {}
+	AutoHandle(HANDLE handle): handle(handle) {}
+
+	~AutoHandle() {
+		if (handle) {
+			CloseHandle(handle);
+		}
+	}
+
+	AutoHandle& operator=(HANDLE newHandle) {
+		if (handle) {
+			CloseHandle(handle);
+		}
+		handle = newHandle;
+		return *this;
+	}
+
+	operator HANDLE() {
+		return handle;
+	}
+
+	private:
+	HANDLE handle;
+};
+
+/**
+ * Listens for default device changes. These are communicated to WasapiPlayer
+ * via the getDefaultDeviceChangeCount method.
+ */
+class NotificationClient: public IMMNotificationClient {
+	public:
+	ULONG STDMETHODCALLTYPE AddRef() override {
+		return InterlockedIncrement(&refCount);
+	}
+
+	ULONG STDMETHODCALLTYPE Release() override {
+		LONG result = InterlockedDecrement(&refCount);
+		if (result == 0) {
+			delete this;
+		}
+		return result;
+	}
+
+	STDMETHODIMP QueryInterface(REFIID riid, void** ppvObject) final {
+		if (riid == IID_IUnknown || riid == IID_IMMNotificationClient) {
+			AddRef();
+			*ppvObject = (void*)this;
+			return S_OK;
+		}
+		return E_NOINTERFACE;
+	}
+
+	STDMETHODIMP OnDefaultDeviceChanged(EDataFlow flow, ERole role,
+		LPCWSTR defaultDeviceId
+	) final {
+		if (flow == eRender && role == eConsole) {
+			++defaultDeviceChangeCount;
+		}
+		return S_OK;
+	}
+
+	STDMETHODIMP OnDeviceAdded(LPCWSTR deviceId) final {
+		return S_OK;
+	}
+
+	STDMETHODIMP OnDeviceRemoved(LPCWSTR deviceId) final {
+		return S_OK;
+	}
+
+	STDMETHODIMP OnDeviceStateChanged(LPCWSTR deviceId, DWORD newState) final {
+		return S_OK;
+	}
+
+	STDMETHODIMP OnPropertyValueChanged(LPCWSTR deviceId,
+		const PROPERTYKEY key
+	) final {
+		return S_OK;
+	}
+
+	/**
+	 * A counter which increases every time the default device changes. This is
+	 * used by WasapiPlayer instances to detect such changes while playing.
+	 */
+	unsigned int getDefaultDeviceChangeCount() {
+		return defaultDeviceChangeCount;
+	}
+
+	private:
+	LONG refCount = 0;
+	unsigned int defaultDeviceChangeCount = 0;
+};
+
+CComPtr<NotificationClient> notificationClient;
+
+/**
+ * Play a stream of audio using WASAPI.
+ */
+class WasapiPlayer {
+	public:
+	using ChunkCompletedCallback = void(*)(WasapiPlayer* player,
+		unsigned int id);
+
+	/**
+	 * Constructor.
+	 * Specify an empty (not null) deviceId to use the default device.
+	 * Pass GUID_NULL for sessionGuid to use the default audio session.
+	 * Specify an empty (not null) sessionName if you do not wish to set the
+	 * session display name.
+	 */
+	WasapiPlayer(wchar_t* deviceId, WAVEFORMATEX format,
+		ChunkCompletedCallback callback, GUID sessionGuid, wchar_t* sessionName);
+
+	/**
+	 * Open the audio device.
+	 * If force is true, the device will be reopened even if it is already open.
+	 */
+	HRESULT open(bool force=false);
+
+	/**
+	 * Feed a chunk of audio.
+	 * If not null, id will be set to a number used to identify the audio
+	 * associated with this call. The callback will be called with this number
+	 * when this audio finishes playing.
+	 */
+	HRESULT feed(unsigned char* data, unsigned int size, unsigned int* id);
+
+	HRESULT stop();
+	HRESULT sync();
+	HRESULT pause();
+	HRESULT resume();
+	HRESULT setSessionVolume(float level);
+
+	private:
+	void maybeFireCallback();
+
+	// Reset our state due to being stopped. This runs on the feeder thread
+	// rather than on the thread which called stop() because writing to a vector
+	// isn't thread safe.
+	void completeStop();
+
+	// Convert frames into ms.
+	UINT64 framesToMs(UINT32 frames) {
+		return frames * 1000 / format.nSamplesPerSec;
+	}
+
+	// Get the current playback position in ms.
+	UINT64 getPlayPos();
+
+	// Wait until we need to wake up next. This includes needing to fire a
+	// callback.
+	void waitUntilNeeded(UINT64 maxWait=INFINITE);
+
+	enum class PlayState {
+		stopped,
+		playing,
+		stopping,
+	};
+
+	CComPtr<IAudioClient> client;
+	CComPtr<IAudioRenderClient> render;
+	CComPtr<IAudioClock> clock;
+	// The maximum number of frames that will fit in the buffer.
+	UINT32 bufferFrames;
+	std::wstring deviceId;
+	GUID sessionGuid;
+	std::wstring sessionName;
+	WAVEFORMATEX format;
+	ChunkCompletedCallback callback;
+	PlayState playState = PlayState::stopped;
+	// Maps feed ids to the end of their audio in ms since the start of the
+	// stream. This is used to call the callback.
+	std::vector<std::pair<unsigned int, UINT64>> feedEnds;
+	UINT64 clockFreq;
+	// The duration of audio sent (buffered) so far in ms.
+	UINT64 sentMs = 0;
+	unsigned int nextFeedId = 0;
+	AutoHandle wakeEvent;
+	unsigned int defaultDeviceChangeCount;
+};
+
+WasapiPlayer::WasapiPlayer(wchar_t* deviceId, WAVEFORMATEX format,
+	ChunkCompletedCallback callback, GUID sessionGuid, wchar_t* sessionName)
+: deviceId(deviceId), format(format), callback(callback),
+sessionGuid(sessionGuid), sessionName(sessionName) {
+	wakeEvent = CreateEvent(nullptr, false, false, nullptr);
+}
+
+HRESULT WasapiPlayer::open(bool force) {
+	if (client && !force) {
+		// Device already open and we're not forcing reopen.
+		return S_OK;
+	}
+	defaultDeviceChangeCount = notificationClient->getDefaultDeviceChangeCount();
+	CComPtr<IMMDeviceEnumerator> enumerator;
+	HRESULT hr = enumerator.CoCreateInstance(CLSID_MMDeviceEnumerator);
+	if (FAILED(hr)) {
+		return hr;
+	}
+	CComPtr<IMMDevice> device;
+	if (deviceId.empty()) {
+		hr = enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);
+	} else {
+		hr = enumerator->GetDevice(deviceId.c_str(), &device);
+	}
+	if (FAILED(hr)) {
+		return hr;
+	}
+	hr = device->Activate(IID_IAudioClient, CLSCTX_ALL, nullptr, (void**)&client);
+	if (FAILED(hr)) {
+		return hr;
+	}
+	hr = client->Initialize(AUDCLNT_SHAREMODE_SHARED,
+		AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM | AUDCLNT_STREAMFLAGS_SRC_DEFAULT_QUALITY,
+		BUFFER_SIZE, 0, &format, &sessionGuid);
+	if (FAILED(hr)) {
+		return hr;
+	}
+	if (!sessionName.empty()) {
+		CComPtr<IAudioSessionControl> control;
+		hr = client->GetService(IID_IAudioSessionControl, (void**)&control);
+		if (FAILED(hr)) {
+			return hr;
+		}
+		hr = control->SetDisplayName(sessionName.c_str(), nullptr);
+		if (FAILED(hr)) {
+			return hr;
+		}
+	}
+	hr = client->GetBufferSize(&bufferFrames);
+	if (FAILED(hr)) {
+		return hr;
+	}
+	hr = client->GetService(IID_IAudioRenderClient, (void**)&render);
+	if (FAILED(hr)) {
+		return hr;
+	}
+	hr = client->GetService(IID_IAudioClock, (void**)&clock);
+	if (FAILED(hr)) {
+		return hr;
+	}
+	hr = clock->GetFrequency(&clockFreq);
+	if (FAILED(hr)) {
+		return hr;
+	}
+	playState = PlayState::stopped;
+	return S_OK;
+}
+
+HRESULT WasapiPlayer::feed(unsigned char* data, unsigned int size,
+	unsigned int* id
+) {
+	if (playState == PlayState::stopping) {
+		// stop() was called after feed() returned.
+		completeStop();
+	}
+	UINT32 remainingFrames = size / format.nBlockAlign;
+	HRESULT hr;
+
+	// Returns false if we should abort, in which case we should return hr.
+	auto reopenUsingNewDev = [&] {
+		HRESULT hr = open(true);
+		if (FAILED(hr)) {
+			return false;
+		}
+		// Call any pending callbacks. Otherwise, they'll never get called.
+		for (auto& [itemId, itemEnd]: feedEnds) {
+			callback(this, itemId);
+		}
+		feedEnds.clear();
+		// This is the start of a new stream as far as WASAPI is concerned.
+		sentMs = 0;
+		return true;
+	};
+
+	while (remainingFrames > 0) {
+		UINT32 paddingFrames;
+
+		// Returns false if we should abort, in which case we should return hr.
+		auto getPaddingHandlingStopOrDevChange = [&] {
+			if (playState == PlayState::stopping) {
+				// stop() was called in another thread. Don't send any more.
+				completeStop();
+				hr = S_OK;
+				return false;
+			}
+			if (deviceId.empty() && defaultDeviceChangeCount !=
+					notificationClient->getDefaultDeviceChangeCount()) {
+				// The default device changed.
+				if (!reopenUsingNewDev()) {
+					return false;
+				}
+			}
+			hr = client->GetCurrentPadding(&paddingFrames);
+			if (hr == AUDCLNT_E_DEVICE_INVALIDATED) {
+				// If we're using a specific device, it's just been invalidated. Fall back
+				// to the default device.
+				deviceId.clear();
+				if (!reopenUsingNewDev()) {
+					return false;
+				}
+				hr = client->GetCurrentPadding(&paddingFrames);
+			}
+			return SUCCEEDED(hr);
+		};
+
+		if (!getPaddingHandlingStopOrDevChange()) {
+			return hr;
+		}
+		if (paddingFrames > bufferFrames / 2) {
+			// Wait until the buffer is less than half full.
+			waitUntilNeeded(framesToMs(paddingFrames - bufferFrames / 2));
+			if (!getPaddingHandlingStopOrDevChange()) {
+				return hr;
+			}
+		}
+		// We might have more frames than will fit in the buffer. Send what we can.
+		const UINT32 sendFrames = std::min(remainingFrames,
+			bufferFrames - paddingFrames);
+		const UINT32 sendBytes = sendFrames * format.nBlockAlign;
+		BYTE* buffer;
+		hr = render->GetBuffer(sendFrames, &buffer);
+		if (FAILED(hr)) {
+			return hr;
+		}
+		memcpy(buffer, data, sendBytes);
+		hr = render->ReleaseBuffer(sendFrames, 0);
+		if (FAILED(hr)) {
+			return hr;
+		}
+		if (playState == PlayState::stopped) {
+			hr = client->Start();
+			if (FAILED(hr)) {
+				return hr;
+			}
+			if (playState == PlayState::stopping) {
+				// stop() was called while we were calling client->Start().
+				completeStop();
+				return S_OK;
+			}
+			playState = PlayState::playing;
+		}
+		maybeFireCallback();
+		data += sendBytes;
+		size -= sendBytes;
+		remainingFrames -= sendFrames;
+		sentMs += framesToMs(sendFrames);
+	}
+
+	if (playState == PlayState::playing) {
+		maybeFireCallback();
+	}
+	if (id) {
+		*id = nextFeedId++;
+		// Track that we want to call the callback with this id when playback
+		// reaches the end of the audio provided to this call.
+		// It is important that we add a new callback after we fire existing
+		// callbacks. Otherwise, we might fire a newly added callback before its
+		// feed() call returns, which will fail because the caller doesn't know about
+		// this new id yet.
+		feedEnds.push_back({*id, sentMs});
+	}
+	return S_OK;
+}
+
+void WasapiPlayer::maybeFireCallback() {
+	const UINT64 playPos = getPlayPos();
+	std::erase_if(feedEnds, [&](auto& val) {
+		auto [id, end] = val;
+		if (playPos >= end) {
+			callback(this, id);
+			return true;
+		}
+		return false;
+	});
+}
+
+UINT64 WasapiPlayer::getPlayPos() {
+	// Apparently IAudioClock::GetPosition can be expensive. If we hit performance
+	// problems here, consider using the performance counter it returns for
+	// subsequent calls.
+	UINT64 pos;
+	HRESULT hr = clock->GetPosition(&pos, nullptr);
+	if (FAILED(hr)) {
+		return 0;
+	}
+	return pos * 1000 / clockFreq;
+}
+
+void WasapiPlayer::waitUntilNeeded(UINT64 maxWait) {
+	if (!feedEnds.empty()) {
+		// There's at least one pending callback.
+		UINT64 feedEnd = feedEnds[0].second;
+		const UINT64 nextCallbackTime = feedEnd - getPlayPos();
+		if (nextCallbackTime < maxWait) {
+			// The callback needs to happen before maxWait supplied by the caller.
+			// Lower maxWait accordingly.
+			maxWait = nextCallbackTime;
+		}
+	}
+	WaitForSingleObject(wakeEvent, (DWORD)maxWait);
+}
+
+HRESULT WasapiPlayer::stop() {
+	playState = PlayState::stopping;
+	HRESULT hr = client->Stop();
+	if (FAILED(hr)) {
+		return hr;
+	}
+	hr = client->Reset();
+	if (FAILED(hr)) {
+		return hr;
+	}
+	// If there is a feed/sync call waiting, wake it up so it can immediately
+	// return to the caller.
+	SetEvent(wakeEvent);
+	return S_OK;
+}
+
+void WasapiPlayer::completeStop() {
+	nextFeedId = 0;
+	sentMs = 0;
+	feedEnds.clear();
+	playState = PlayState::stopped;
+}
+
+HRESULT WasapiPlayer::sync() {
+	for (UINT64 playPos = getPlayPos(); playPos < sentMs;
+			playPos = getPlayPos()) {
+		if (playState != PlayState::playing) {
+			return S_OK;
+		}
+		maybeFireCallback();
+		waitUntilNeeded(sentMs - playPos);
+	}
+	// If there's a callback right at the end of the stream (sentMs), fire it.
+	if (playState == PlayState::playing) {
+		maybeFireCallback();
+	}
+	return S_OK;
+}
+
+HRESULT WasapiPlayer::pause() {
+	if (playState != PlayState::playing) {
+		return S_OK;
+	}
+	HRESULT hr = client->Stop();
+	if (FAILED(hr)) {
+		return hr;
+	}
+	return S_OK;
+}
+
+HRESULT WasapiPlayer::resume() {
+	if (playState != PlayState::playing) {
+		return S_OK;
+	}
+	HRESULT hr = client->Start();
+	if (FAILED(hr)) {
+		return hr;
+	}
+	return S_OK;
+}
+
+HRESULT WasapiPlayer::setSessionVolume(float level) {
+	CComPtr<ISimpleAudioVolume> volume;
+	HRESULT hr = client->GetService(IID_ISimpleAudioVolume, (void**)&volume);
+	if (FAILED(hr)) {
+		return hr;
+	}
+	return volume->SetMasterVolume(level, nullptr);
+}
+
+/*
+ * NVDA calls the functions below. Most of these just wrap calls to
+ * WasapiPlayer, with the exception of wasPlay_startup and wasPlay_getDevices.
+ */
+
+WasapiPlayer* wasPlay_create(wchar_t* deviceId, WAVEFORMATEX format,
+	WasapiPlayer::ChunkCompletedCallback callback, GUID sessionGuid,
+	wchar_t* sessionName
+) {
+	return new WasapiPlayer(deviceId, format, callback, sessionGuid, sessionName);
+}
+
+void wasPlay_destroy(WasapiPlayer* player) {
+	delete player;
+}
+
+HRESULT wasPlay_open(WasapiPlayer* player) {
+	return player->open();
+}
+
+HRESULT wasPlay_feed(WasapiPlayer* player, unsigned char* data,
+	unsigned int size, unsigned int* id
+) {
+	return player->feed(data, size, id);
+}
+
+HRESULT wasPlay_stop(WasapiPlayer* player) {
+	return player->stop();
+}
+
+HRESULT wasPlay_sync(WasapiPlayer* player) {
+	return player->sync();
+}
+
+HRESULT wasPlay_pause(WasapiPlayer* player) {
+	return player->pause();
+}
+
+HRESULT wasPlay_resume(WasapiPlayer* player) {
+	return player->resume();
+}
+
+HRESULT wasPlay_setSessionVolume(WasapiPlayer* player, float level) {
+	return player->setSessionVolume(level);
+}
+
+/**
+ * This must be called once per session at startup before wasPlay_create is
+ * called.
+ */
+HRESULT wasPlay_startup() {
+	CComPtr<IMMDeviceEnumerator> enumerator;
+	HRESULT hr = enumerator.CoCreateInstance(CLSID_MMDeviceEnumerator);
+	if (FAILED(hr)) {
+		return hr;
+	}
+	notificationClient = new NotificationClient();
+	return enumerator->RegisterEndpointNotificationCallback(notificationClient);
+}
+
+/**
+ * Get playback device ids and friendly names.
+ * devicesStr will be set to a BSTR of device ids and names separated by null
+ * characters; e.g.
"id1\0name1\0id2\0name2\0" + */ +HRESULT wasPlay_getDevices(BSTR* devicesStr) { + CComPtr enumerator; + HRESULT hr = enumerator.CoCreateInstance(CLSID_MMDeviceEnumerator); + if (FAILED(hr)) { + return hr; + } + CComPtr devices; + hr = enumerator->EnumAudioEndpoints(eRender, DEVICE_STATE_ACTIVE, &devices); + if (FAILED(hr)) { + return hr; + } + UINT count = 0; + devices->GetCount(&count); + std::wostringstream s; + for (UINT d = 0; d < count; ++d) { + CComPtr device; + hr = devices->Item(d, &device); + if (FAILED(hr)) { + return hr; + } + wchar_t* id; + hr = device->GetId(&id); + if (FAILED(hr)) { + return hr; + } + s << id << L'\0'; + CoTaskMemFree(id); + CComPtr props; + hr = device->OpenPropertyStore(STGM_READ, &props); + if (FAILED(hr)) { + return hr; + } + PROPVARIANT val; + hr = props->GetValue(PKEY_Device_FriendlyName, &val); + if (FAILED(hr)) { + return hr; + } + s << val.pwszVal << L'\0'; + PropVariantClear(&val); + } + *devicesStr = SysAllocStringLen(s.str().c_str(), (UINT)s.tellp()); + return S_OK; +} diff --git a/source/config/configSpec.py b/source/config/configSpec.py index 6d4deb5ffed..c8a7f11f98f 100644 --- a/source/config/configSpec.py +++ b/source/config/configSpec.py @@ -56,6 +56,7 @@ # Audio settings [audio] audioDuckingMode = integer(default=0) + wasapi = boolean(default=true) # Braille settings [braille] diff --git a/source/core.py b/source/core.py index bb2bbe7a755..724241b59b2 100644 --- a/source/core.py +++ b/source/core.py @@ -17,7 +17,6 @@ import sys import winVersion import threading -import nvwave import os import time import ctypes @@ -502,18 +501,24 @@ def main(): config.initialize() if config.conf['development']['enableScratchpadDir']: log.info("Developer Scratchpad mode enabled") - if not globalVars.appArgs.minimal and config.conf["general"]["playStartAndExitSounds"]: - try: - nvwave.playWaveFile(os.path.join(globalVars.appDir, "waves", "start.wav")) - except: - pass - logHandler.setLogLevelFromConfig() if languageHandler.isLanguageForced(): lang = globalVars.appArgs.language else: lang = config.conf["general"]["language"] log.debug(f"setting language to {lang}") languageHandler.setLanguage(lang) + import NVDAHelper + log.debug("Initializing NVDAHelper") + NVDAHelper.initialize() + import nvwave + log.debug("initializing nvwave") + nvwave.initialize() + if not globalVars.appArgs.minimal and config.conf["general"]["playStartAndExitSounds"]: + try: + nvwave.playWaveFile(os.path.join(globalVars.appDir, "waves", "start.wav")) + except Exception: + pass + logHandler.setLogLevelFromConfig() log.info(f"Windows version: {winVersion.getWinVer()}") log.info("Using Python version %s"%sys.version) log.info("Using comtypes version %s"%comtypes.__version__) @@ -529,9 +534,6 @@ def main(): import appModuleHandler log.debug("Initializing appModule Handler") appModuleHandler.initialize() - import NVDAHelper - log.debug("Initializing NVDAHelper") - NVDAHelper.initialize() log.debug("initializing background i/o") import hwIo hwIo.initialize() @@ -818,7 +820,6 @@ def _doPostNvdaStartupAction(): _terminate(JABHandler, name="Java Access Bridge support") _terminate(appModuleHandler, name="app module handler") _terminate(tones) - _terminate(NVDAHelper) _terminate(touchHandler) _terminate(keyboardHandler, name="keyboard handler") _terminate(mouseHandler) @@ -851,6 +852,7 @@ def _doPostNvdaStartupAction(): # #5189: Destroy the message window as late as possible # so new instances of NVDA can find this one even if it freezes during exit. 
diff --git a/source/config/configSpec.py b/source/config/configSpec.py
index 6d4deb5ffed..c8a7f11f98f 100644
--- a/source/config/configSpec.py
+++ b/source/config/configSpec.py
@@ -56,6 +56,7 @@
 # Audio settings
 [audio]
 	audioDuckingMode = integer(default=0)
+	wasapi = boolean(default=true)
 
 # Braille settings
 [braille]
diff --git a/source/core.py b/source/core.py
index bb2bbe7a755..724241b59b2 100644
--- a/source/core.py
+++ b/source/core.py
@@ -17,7 +17,6 @@
 import sys
 import winVersion
 import threading
-import nvwave
 import os
 import time
 import ctypes
@@ -502,18 +501,24 @@ def main():
 	config.initialize()
 	if config.conf['development']['enableScratchpadDir']:
 		log.info("Developer Scratchpad mode enabled")
-	if not globalVars.appArgs.minimal and config.conf["general"]["playStartAndExitSounds"]:
-		try:
-			nvwave.playWaveFile(os.path.join(globalVars.appDir, "waves", "start.wav"))
-		except:
-			pass
-	logHandler.setLogLevelFromConfig()
 	if languageHandler.isLanguageForced():
 		lang = globalVars.appArgs.language
 	else:
 		lang = config.conf["general"]["language"]
 	log.debug(f"setting language to {lang}")
 	languageHandler.setLanguage(lang)
+	import NVDAHelper
+	log.debug("Initializing NVDAHelper")
+	NVDAHelper.initialize()
+	import nvwave
+	log.debug("initializing nvwave")
+	nvwave.initialize()
+	if not globalVars.appArgs.minimal and config.conf["general"]["playStartAndExitSounds"]:
+		try:
+			nvwave.playWaveFile(os.path.join(globalVars.appDir, "waves", "start.wav"))
+		except Exception:
+			pass
+	logHandler.setLogLevelFromConfig()
 	log.info(f"Windows version: {winVersion.getWinVer()}")
 	log.info("Using Python version %s"%sys.version)
 	log.info("Using comtypes version %s"%comtypes.__version__)
@@ -529,9 +534,6 @@ def main():
 	import appModuleHandler
 	log.debug("Initializing appModule Handler")
 	appModuleHandler.initialize()
-	import NVDAHelper
-	log.debug("Initializing NVDAHelper")
-	NVDAHelper.initialize()
 	log.debug("initializing background i/o")
 	import hwIo
 	hwIo.initialize()
@@ -818,7 +820,6 @@ def _doPostNvdaStartupAction():
 	_terminate(JABHandler, name="Java Access Bridge support")
 	_terminate(appModuleHandler, name="app module handler")
 	_terminate(tones)
-	_terminate(NVDAHelper)
 	_terminate(touchHandler)
 	_terminate(keyboardHandler, name="keyboard handler")
 	_terminate(mouseHandler)
@@ -851,6 +852,7 @@
 	# #5189: Destroy the message window as late as possible
 	# so new instances of NVDA can find this one even if it freezes during exit.
 	messageWindow.destroy()
+	_terminate(NVDAHelper)
 	log.debug("core done")
 
 def _terminate(module, name=None):
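
The reordering above matters: nvwave.initialize can only bind the wasPlay_*
exports after NVDAHelper has loaded the local DLL, and the startup sound can
only play once nvwave is initialised. A simplified, hedged sketch of the
resulting startup sequence (not the full core.main):

    import NVDAHelper
    NVDAHelper.initialize()  # loads nvdaHelperLocal.dll, which exports wasPlay_*
    import nvwave
    nvwave.initialize()  # binds wasPlay_* and selects the WavePlayer implementation
    nvwave.playWaveFile("waves/start.wav")  # only safe after nvwave.initialize()
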
diff --git a/source/gui/settingsDialogs.py b/source/gui/settingsDialogs.py
index 57c22a37a26..af69a226a1c 100644
--- a/source/gui/settingsDialogs.py
+++ b/source/gui/settingsDialogs.py
@@ -3012,6 +3012,27 @@ def __init__(self, parent):
 		self.reportTransparentColorCheckBox.defaultValue = self._getDefaultValue(
 			["documentFormatting", "reportTransparentColor"])
 
+		# Translators: This is the label for a group of advanced options in the
+		# Advanced settings panel
+		label = _("Audio")
+		audio = wx.StaticBoxSizer(wx.VERTICAL, self, label=label)
+		audioBox = audio.GetStaticBox()
+		audioGroup = guiHelper.BoxSizerHelper(self, sizer=audio)
+		sHelper.addItem(audioGroup)
+
+		# Translators: This is the label for a checkbox control in the
+		# Advanced settings panel.
+		label = _("Use WASAPI for audio output (requires restart)")
+		self.wasapiCheckBox: wx.CheckBox = audioGroup.addItem(
+			wx.CheckBox(audioBox, label=label)
+		)
+		self.bindHelpEvent("WASAPI", self.wasapiCheckBox)
+		self.wasapiCheckBox.SetValue(
+			config.conf["audio"]["wasapi"]
+		)
+		self.wasapiCheckBox.defaultValue = self._getDefaultValue(
+			["audio", "wasapi"])
+
 		# Translators: This is the label for a group of advanced options in the
 		# Advanced settings panel
 		label = _("Debug logging")
@@ -3099,6 +3120,7 @@ def haveConfigDefaultsBeenRestored(self):
 			and self.loadChromeVBufWhenBusyCombo.isValueConfigSpecDefault()
 			and self.caretMoveTimeoutSpinControl.GetValue() == self.caretMoveTimeoutSpinControl.defaultValue
 			and self.reportTransparentColorCheckBox.GetValue() == self.reportTransparentColorCheckBox.defaultValue
+			and self.wasapiCheckBox.GetValue() == self.wasapiCheckBox.defaultValue
 			and set(self.logCategoriesList.CheckedItems) == set(self.logCategoriesList.defaultCheckedItems)
 			and self.playErrorSoundCombo.GetSelection() == self.playErrorSoundCombo.defaultValue
 			and True  # reduce noise in diff when the list is extended.
@@ -3124,6 +3146,7 @@ def restoreToDefaults(self):
 		self.loadChromeVBufWhenBusyCombo.resetToConfigSpecDefault()
 		self.caretMoveTimeoutSpinControl.SetValue(self.caretMoveTimeoutSpinControl.defaultValue)
 		self.reportTransparentColorCheckBox.SetValue(self.reportTransparentColorCheckBox.defaultValue)
+		self.wasapiCheckBox.SetValue(self.wasapiCheckBox.defaultValue)
 		self.logCategoriesList.CheckedItems = self.logCategoriesList.defaultCheckedItems
 		self.playErrorSoundCombo.SetSelection(self.playErrorSoundCombo.defaultValue)
 		self._defaultsRestored = True
@@ -3154,6 +3177,7 @@ def onSave(self):
 		config.conf["documentFormatting"]["reportTransparentColor"] = (
 			self.reportTransparentColorCheckBox.IsChecked()
 		)
+		config.conf["audio"]["wasapi"] = self.wasapiCheckBox.IsChecked()
 		config.conf["annotations"]["reportDetails"] = self.annotationsDetailsCheckBox.IsChecked()
 		config.conf["annotations"]["reportAriaDescription"] = self.ariaDescCheckBox.IsChecked()
 		config.conf["braille"]["enableHidBrailleSupport"] = self.supportHidBrailleCombo.GetSelection()
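
The nvwave changes below introduce AudioSession, which groups streams under
a named entry in the system Volume Mixer. As an illustration, an add-on
could define its own session like this; the GUID and name here are made up
for the example, and only the AudioSession/WavePlayer API shown below is
assumed:

    from comtypes import GUID
    import nvwave

    mySession = nvwave.AudioSession(
        GUID("{11111111-2222-3333-4444-555555555555}"),  # hypothetical session GUID
        "My add-on sounds",  # label shown in the system Volume Mixer
    )
    player = nvwave.WavePlayer(
        channels=2, samplesPerSec=44100, bitsPerSample=16, session=mySession
    )

Note that the session is only honoured by WasapiWavePlayer;
WinmmWavePlayer accepts the argument purely for compatibility.
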
diff --git a/source/nvwave.py b/source/nvwave.py
index 08a4fd19202..61c3ce8ed73 100644
--- a/source/nvwave.py
+++ b/source/nvwave.py
@@ -12,6 +12,7 @@
 from typing import (
 	Optional,
 	Callable,
+	NamedTuple,
 )
 from ctypes import (
 	windll,
@@ -21,6 +22,10 @@
 	create_unicode_buffer,
 	sizeof,
 	byref,
+	c_void_p,
+	CFUNCTYPE,
+	string_at,
+	c_float,
 )
 from ctypes.wintypes import (
 	HANDLE,
@@ -31,7 +36,10 @@
 	UINT,
 	LPUINT
 )
+from comtypes import HRESULT, BSTR, GUID
+from comtypes.hresult import S_OK
 import atexit
+import weakref
 import garbageHandler
 import winKernel
 import wave
@@ -39,6 +47,7 @@
 from logHandler import log
 import os.path
 import extensionPoints
+import NVDAHelper
 
 
 __all__ = (
@@ -140,7 +149,29 @@ def _isDebugForNvWave():
 		return config.conf["debugLog"]["nvwave"]
 
 
-class WavePlayer(garbageHandler.TrackedObject):
+class AudioSession(NamedTuple):
+	"""Identifies an audio session.
+	An audio session may contain multiple streams. The guid identifies the
+	session. The name is shown in the system Volume Mixer.
+	"""
+	guid: GUID
+	name: str
+
+
+#: The audio session to use by default.
+defaultSession = AudioSession(
+	GUID("{C302B781-00AF-4ECC-ACB7-7DF16AF7D55E}"),
+	"NVDA"
+)
+#: The audio session to use for sounds.
+soundsSession = AudioSession(
+	GUID("{A560CE90-E9D9-44AF-8C3C-0D9734642D48}"),
+	# Translators: Shown in the system Volume Mixer for controlling NVDA sounds.
+	_("NVDA sounds")
+)
+
+
+class WinmmWavePlayer(garbageHandler.TrackedObject):
 	"""Synchronously play a stream of audio.
 	To use, construct an instance and feed it waveform audio using L{feed}.
 	Keeps device open until it is either not available, or WavePlayer is explicitly closed / deleted.
@@ -188,7 +219,8 @@ def __init__(
 		outputDevice: typing.Union[str, int] = WAVE_MAPPER,
 		closeWhenIdle: bool = False,
 		wantDucking: bool = True,
-		buffered: bool = False
+		buffered: bool = False,
+		session: AudioSession = defaultSession,
 	):
 		"""Constructor.
 		@param channels: The number of channels of audio; e.g. 2 for stereo, 1 for mono.
@@ -337,7 +369,8 @@ def open(self):
 
 	def feed(
 		self,
-		data: bytes,
+		data: typing.Union[bytes, c_void_p],
+		size: typing.Optional[int] = None,
 		onDone: typing.Optional[typing.Callable] = None
 	) -> None:
 		"""Feed a chunk of audio data to be played.
 		This allows for uninterrupted playback as long as a new chunk is fed before
 		the previous chunk has finished playing.
 		@param data: Waveform audio in the format specified when this instance was constructed.
+		@param size: The size of the data in bytes if data is a ctypes pointer.
+			If data is a Python bytes object, size should be None.
 		@param onDone: Function to call when this chunk has finished playing.
 		@raise WindowsError: If there was an error playing the audio.
 		"""
+		if size is not None:
+			data = string_at(data, size)
 		if not self._minBufferSize:
 			self._feedUnbuffered_handleErrors(data, onDone=onDone)
 			return
@@ -583,6 +620,8 @@
 		return False
 
 
+WavePlayer = WinmmWavePlayer
+
 def _getOutputDevices():
 	"""Generator, returning device ID and device Name in device ID order.
 	@note: Depending on number of devices being fetched, this may take some time (~3ms)
@@ -675,19 +714,31 @@ def playWaveFile(
 		samplesPerSec=f.getframerate(),
 		bitsPerSample=f.getsampwidth() * 8,
 		outputDevice=config.conf["speech"]["outputDevice"],
-		wantDucking=False
+		wantDucking=False,
+		session=soundsSession
 	)
-	fileWavePlayer.feed(f.readframes(f.getnframes()))
+
+	def play():
+		global fileWavePlayer
+		fileWavePlayer.feed(f.readframes(f.getnframes()))
+		fileWavePlayer.idle()
+		# #11169: Files might not be played that often. Leaving the device open
+		# until the next file is played really shouldn't be a problem regardless of
+		# how long we wait, but closing the device seems to hang occasionally.
+		# There's no benefit to keeping it open - we're going to create a new
+		# player for the next file anyway - so just destroy it now.
+		fileWavePlayer = None
+
 	if asynchronous:
 		if fileWavePlayerThread is not None:
 			fileWavePlayerThread.join()
 		fileWavePlayerThread = threading.Thread(
 			name=f"{__name__}.playWaveFile({os.path.basename(fileName)})",
-			target=fileWavePlayer.idle
+			target=play
 		)
 		fileWavePlayerThread.start()
 	else:
-		fileWavePlayer.idle()
+		play()
 
 # When exiting, ensure fileWavePlayer is deleted before modules get cleaned up.
 # Otherwise, WavePlayer.__del__ will fail with an exception.
@@ -700,3 +751,254 @@ def _cleanup():
 
 def isInError() -> bool:
 	return WavePlayer.audioDeviceError_static
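
A quick usage sketch of the playWaveFile change above: because
WasapiWavePlayer.feed blocks once the 400 ms buffer is full, both feeding
and idling now happen on the background thread. The file name here is
illustrative:

    import nvwave
    # Returns almost immediately; play() runs on a background thread.
    nvwave.playWaveFile(r"waves\browseMode.wav", asynchronous=True)
    # Synchronous: blocks until the file has finished playing.
    nvwave.playWaveFile(r"waves\browseMode.wav", asynchronous=False)
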
+
+
+wasPlay_callback = CFUNCTYPE(None, c_void_p, c_uint)
+
+
+def _wasPlay_errcheck(res, func, args):
+	if res != S_OK:
+		raise WindowsError(res)
+
+
+class WasapiWavePlayer(garbageHandler.TrackedObject):
+	"""Synchronously play a stream of audio using WASAPI.
+	To use, construct an instance and feed it waveform audio using L{feed}.
+	Keeps device open until it is either not available, or WavePlayer is explicitly closed / deleted.
+	Will attempt to use the preferred device; if that is unavailable, it falls
+	back to the default device.
+	"""
+	#: Static variable, if any one WavePlayer instance is in error due to a missing / changing audio device
+	# the error applies to all instances
+	audioDeviceError_static: bool = False
+	#: Maps C++ WasapiPlayer instances to Python WasapiWavePlayer instances.
+	#: This allows us to have a single callback in the class rather than on
+	#: each instance, which prevents reference cycles.
+	_instances = weakref.WeakValueDictionary()
+
+	def __init__(
+		self,
+		channels: int,
+		samplesPerSec: int,
+		bitsPerSample: int,
+		outputDevice: typing.Union[str, int] = WAVE_MAPPER,
+		closeWhenIdle: bool = False,
+		wantDucking: bool = True,
+		buffered: bool = False,
+		session: AudioSession = defaultSession,
+	):
+		"""Constructor.
+		@param channels: The number of channels of audio; e.g. 2 for stereo, 1 for mono.
+		@param samplesPerSec: Samples per second (hz).
+		@param bitsPerSample: The number of bits per sample.
+		@param outputDevice: The name of the audio output device to use,
+			WAVE_MAPPER for default.
+		@param closeWhenIdle: Deprecated; ignored.
+		@param wantDucking: if true then background audio will be ducked on Windows 8 and higher
+		@param buffered: Whether to buffer small chunks of audio to prevent audio glitches.
+		@param session: The audio session which should be used.
+		@note: If C{outputDevice} is a name and no such device exists, the default device will be used.
+		@raise WindowsError: If there was an error opening the audio output device.
+		"""
+		self.channels = channels
+		self.samplesPerSec = samplesPerSec
+		self.bitsPerSample = bitsPerSample
+		format = self._format = WAVEFORMATEX()
+		format.wFormatTag = WAVE_FORMAT_PCM
+		format.nChannels = channels
+		format.nSamplesPerSec = samplesPerSec
+		format.wBitsPerSample = bitsPerSample
+		format.nBlockAlign: int = bitsPerSample // 8 * channels
+		format.nAvgBytesPerSec = samplesPerSec * format.nBlockAlign
+		self._audioDucker = None
+		if wantDucking:
+			import audioDucking
+			if audioDucking.isAudioDuckingSupported():
+				self._audioDucker = audioDucking.AudioDucker()
+		self._player = NVDAHelper.localLib.wasPlay_create(
+			self._deviceNameToId(outputDevice),
+			format,
+			WasapiWavePlayer._callback, session.guid, session.name)
+		self._doneCallbacks = {}
+		self._instances[self._player] = self
+		self.open()
+
+	@wasPlay_callback
+	def _callback(cppPlayer, feedId):
+		pyPlayer = WasapiWavePlayer._instances[cppPlayer]
+		onDone = pyPlayer._doneCallbacks.pop(feedId, None)
+		if onDone:
+			onDone()
+
+	def __del__(self):
+		if not hasattr(self, "_player"):
+			# This instance failed to construct properly. Let it die gracefully.
+			return
+		if not NVDAHelper.localLib:
+			# This instance is dying after NVDAHelper was terminated. We can't
+			# destroy it in that case, but we're probably exiting anyway.
+			return
+		if self._player:
+			NVDAHelper.localLib.wasPlay_destroy(self._player)
+			del self._instances[self._player]
+			self._player = None
+
+	def open(self):
+		"""Open the output device.
+		This will be called automatically when required.
+		It is not an error if the output device is already open.
+		"""
+		try:
+			NVDAHelper.localLib.wasPlay_open(self._player)
+		except WindowsError:
+			log.warning(
+				"Couldn't open specified or default audio device. "
+				"There may be no audio devices."
+			)
+			WavePlayer.audioDeviceError_static = True
+			raise
+		WasapiWavePlayer.audioDeviceError_static = False
+
+	def close(self):
+		"""For WASAPI, this just stops playback.
+		"""
+		self.stop()
+
+	def feed(
+		self,
+		data: typing.Union[bytes, c_void_p],
+		size: typing.Optional[int] = None,
+		onDone: typing.Optional[typing.Callable] = None
+	) -> None:
+		"""Feed a chunk of audio data to be played.
+		This will block until there is sufficient space in the buffer.
+		However, it will return well before the audio is finished playing.
+		This allows for uninterrupted playback as long as a new chunk is fed before
+		the previous chunk has finished playing.
+		@param data: Waveform audio in the format specified when this instance was constructed.
+		@param size: The size of the data in bytes if data is a ctypes pointer.
+			If data is a Python bytes object, size should be None.
+		@param onDone: Function to call when this chunk has finished playing.
+		@raise WindowsError: If there was an error playing the audio.
+ """ + if self._audioDucker: + self._audioDucker.enable() + feedId = c_uint() if onDone else None + NVDAHelper.localLib.wasPlay_feed( + self._player, + data, + size if size is not None else len(data), + byref(feedId) if onDone else None + ) + if onDone: + self._doneCallbacks[feedId.value] = onDone + + def sync(self): + """Synchronise with playback. + This method blocks until the previously fed chunk of audio has finished playing. + """ + NVDAHelper.localLib.wasPlay_sync(self._player) + + def idle(self): + """Indicate that this player is now idle; i.e. the current continuous segment of audio is complete. + For WASAPI, this just calls L{sync}. + """ + self.sync() + if self._audioDucker: + self._audioDucker.disable() + + def stop(self): + """Stop playback. + """ + if self._audioDucker: + self._audioDucker.disable() + NVDAHelper.localLib.wasPlay_stop(self._player) + self._doneCallbacks = {} + + def pause(self, switch: bool): + """Pause or unpause playback. + @param switch: C{True} to pause playback, C{False} to unpause. + """ + if self._audioDucker: + if switch: + self._audioDucker.disable() + else: + self._audioDucker.enable() + if switch: + NVDAHelper.localLib.wasPlay_pause(self._player) + else: + NVDAHelper.localLib.wasPlay_resume(self._player) + + def setSessionVolume(self, level: float): + """Set the volume for the audio session. + This sets the volume for all streams in this session, not just the stream + associated with this WavePlayer instance. + """ + NVDAHelper.localLib.wasPlay_setSessionVolume(self._player, c_float(level)) + + @staticmethod + def _getDevices(): + rawDevs = BSTR() + NVDAHelper.localLib.wasPlay_getDevices(byref(rawDevs)) + chunkIter = iter(rawDevs.value.split("\0")) + while True: + devId = next(chunkIter) + if not devId: + break # Final null. + name = next(chunkIter) + yield devId, name + + @staticmethod + def _deviceNameToId(name, fallbackToDefault=True): + if name == WAVE_MAPPER: + return "" + for devId, devName in WasapiWavePlayer._getDevices(): + # WinMM device names are truncated to MAXPNAMELEN characters, so we must + # use startswith. + if devName.startswith(name): + return devId + # Check if this is the WinMM sound mapper device, which means default. + if name == next(_getOutputDevices())[1]: + return "" + if fallbackToDefault: + return "" + raise LookupError + + +def initialize(): + global WavePlayer + if not config.conf["audio"]["wasapi"]: + return + WavePlayer = WasapiWavePlayer + NVDAHelper.localLib.wasPlay_create.restype = c_void_p + for func in ( + NVDAHelper.localLib.wasPlay_startup, + NVDAHelper.localLib.wasPlay_open, + NVDAHelper.localLib.wasPlay_feed, + NVDAHelper.localLib.wasPlay_stop, + NVDAHelper.localLib.wasPlay_sync, + NVDAHelper.localLib.wasPlay_pause, + NVDAHelper.localLib.wasPlay_resume, + NVDAHelper.localLib.wasPlay_setSessionVolume, + NVDAHelper.localLib.wasPlay_getDevices, + ): + func.restype = HRESULT + func.errcheck = _wasPlay_errcheck + NVDAHelper.localLib.wasPlay_startup() + try: + # Some audio clients won't specify a session; e.g. speech synthesizers which + # use their own audio output code rather than nvwave. We don't want these to + # end up in the wrong session, so we set a specific default session. To do + # that, first create a stream in that session (defaultSession). + WasapiWavePlayer(channels=1, samplesPerSec=44100, bitsPerSample=16) + # Now create a stream with the null session (GUID_NULL). This will use the + # session we created above. All subsequent streams created without a specific + # session will use this session. 
+
+
+def initialize():
+	global WavePlayer
+	if not config.conf["audio"]["wasapi"]:
+		return
+	WavePlayer = WasapiWavePlayer
+	NVDAHelper.localLib.wasPlay_create.restype = c_void_p
+	for func in (
+		NVDAHelper.localLib.wasPlay_startup,
+		NVDAHelper.localLib.wasPlay_open,
+		NVDAHelper.localLib.wasPlay_feed,
+		NVDAHelper.localLib.wasPlay_stop,
+		NVDAHelper.localLib.wasPlay_sync,
+		NVDAHelper.localLib.wasPlay_pause,
+		NVDAHelper.localLib.wasPlay_resume,
+		NVDAHelper.localLib.wasPlay_setSessionVolume,
+		NVDAHelper.localLib.wasPlay_getDevices,
+	):
+		func.restype = HRESULT
+		func.errcheck = _wasPlay_errcheck
+	NVDAHelper.localLib.wasPlay_startup()
+	try:
+		# Some audio clients won't specify a session; e.g. speech synthesizers which
+		# use their own audio output code rather than nvwave. We don't want these to
+		# end up in the wrong session, so we set a specific default session. To do
+		# that, first create a stream in that session (defaultSession).
+		WasapiWavePlayer(channels=1, samplesPerSec=44100, bitsPerSample=16)
+		# Now create a stream with the null session (GUID_NULL). This will use the
+		# session we created above. All subsequent streams created without a specific
+		# session will use this session.
+		WasapiWavePlayer(
+			channels=1,
+			samplesPerSec=44100,
+			bitsPerSample=16,
+			session=AudioSession(GUID(), "")
+		)
+	except WindowsError:
+		# There are probably no audio devices. Ignore this so NVDA can still start.
+		log.warning("Unable to set default audio session; couldn't open device")
diff --git a/source/synthDrivers/_espeak.py b/source/synthDrivers/_espeak.py
index e26501b2128..fecef65f8a7 100755
--- a/source/synthDrivers/_espeak.py
+++ b/source/synthDrivers/_espeak.py
@@ -8,7 +8,7 @@
 import nvwave
 import threading
 import queue
-from ctypes import cdll
+from ctypes import cdll, CFUNCTYPE, c_int, c_void_p, POINTER, sizeof, c_short
 from ctypes import *
 import config
 import globalVars
@@ -138,7 +138,8 @@
 def decodeEspeakString(data):
 	return data.decode('utf8')
 
-t_espeak_callback=CFUNCTYPE(c_int,POINTER(c_short),c_int,POINTER(espeak_EVENT))
+
+t_espeak_callback = CFUNCTYPE(c_int, c_void_p, c_int, POINTER(espeak_EVENT))
 
 @t_espeak_callback
 def callback(wav,numsamples,event):
@@ -167,16 +168,24 @@
 			onIndexReached(None)
 			isSpeaking = False
 			return CALLBACK_CONTINUE_SYNTHESIS
-		wav = string_at(wav, numsamples * sizeof(c_short)) if numsamples>0 else b""
 		prevByte = 0
+		length = numsamples * sizeof(c_short)
 		for indexNum, indexByte in indexes:
-			player.feed(wav[prevByte:indexByte],
-				onDone=lambda indexNum=indexNum: onIndexReached(indexNum))
+			# Sometimes, rate boost can result in spurious index values.
+			if indexByte < 0:
+				indexByte = 0
+			elif indexByte > length:
+				indexByte = length
+			player.feed(
+				c_void_p(wav + prevByte),
+				size=indexByte - prevByte,
+				onDone=lambda indexNum=indexNum: onIndexReached(indexNum)
+			)
 			prevByte = indexByte
 			if not isSpeaking:
 				return CALLBACK_ABORT_SYNTHESIS
-		player.feed(wav[prevByte:])
-		_numBytesPushed += len(wav)
+		player.feed(c_void_p(wav + prevByte), size=length - prevByte)
+		_numBytesPushed += length
 		return CALLBACK_CONTINUE_SYNTHESIS
 	except:
 		log.error("callback", exc_info=True)
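
The oneCore changes below avoid copying audio out of the synth's buffer:
only the 46 byte WAVE header is copied so the wave module can parse the
format, and the PCM data itself is then fed by pointer. A hedged sketch of
that header-only parse, where buf stands for the raw pointer oneCore
receives (openHeader is a hypothetical helper, not part of the patch):

    import ctypes
    import io
    import wave

    WAVE_HEADER_LENGTH = 46  # header size produced by oneCore, per this patch

    def openHeader(buf: int) -> wave.Wave_read:
        # Copy just the header; the audio data that follows it is never copied.
        header = ctypes.string_at(buf, WAVE_HEADER_LENGTH)
        return wave.open(io.BytesIO(header), "r")
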
diff --git a/source/synthDrivers/oneCore.py b/source/synthDrivers/oneCore.py
index 4d2cd20d8fb..7302fa70d83 100644
--- a/source/synthDrivers/oneCore.py
+++ b/source/synthDrivers/oneCore.py
@@ -54,6 +54,7 @@
 #: The number of 100-nanosecond units in 1 second.
 HUNDRED_NS_PER_SEC = 10000000  # 1000000000 ns per sec / 100 ns
+WAVE_HEADER_LENGTH = 46
 
 ocSpeech_Callback = ctypes.CFUNCTYPE(None, ctypes.c_void_p, ctypes.c_int, ctypes.c_wchar_p)
 
 class _OcSsmlConverter(speechXml.SsmlConverter):
@@ -451,10 +452,11 @@ def _callback(self, bytes, len, markers):
 		else:
 			self._consecutiveSpeechFailures = 0
 		# This gets called in a background thread.
-		stream = io.BytesIO(ctypes.string_at(bytes, len))
+		stream = io.BytesIO(ctypes.string_at(bytes, WAVE_HEADER_LENGTH))
 		wav = wave.open(stream, "r")
 		self._maybeInitPlayer(wav)
-		data = wav.readframes(wav.getnframes())
+		data = bytes + WAVE_HEADER_LENGTH
+		dataLen = wav.getnframes() * wav.getnchannels() * wav.getsampwidth()
 		if markers:
 			markers = markers.split('|')
 		else:
@@ -473,14 +475,17 @@
 			# Order the equation so we don't have to do floating point.
 			pos = pos * self._bytesPerSec // HUNDRED_NS_PER_SEC
 			# Push audio up to this marker.
-			self._player.feed(data[prevPos:pos],
-				onDone=lambda index=index: synthIndexReached.notify(synth=self, index=index))
+			self._player.feed(
+				ctypes.c_void_p(data + prevPos),
+				size=pos - prevPos,
+				onDone=lambda index=index: synthIndexReached.notify(synth=self, index=index)
+			)
 			prevPos = pos
 		if self._wasCancelled:
 			if isDebugForSynthDriver():
 				log.debug("Cancelled, stopped pushing audio")
 		else:
-			self._player.feed(data[prevPos:])
+			self._player.feed(ctypes.c_void_p(data + prevPos), size=dataLen - prevPos)
 			if isDebugForSynthDriver():
 				log.debug("Done pushing audio")
 			self._processQueue()
diff --git a/source/tones.py b/source/tones.py
index a894b1c34c3..754327a1e44 100644
--- a/source/tones.py
+++ b/source/tones.py
@@ -26,7 +26,8 @@ def initialize():
 			samplesPerSec=int(SAMPLE_RATE),
 			bitsPerSample=16,
 			outputDevice=config.conf["speech"]["outputDevice"],
-			wantDucking=False
+			wantDucking=False,
+			session=nvwave.soundsSession
 		)
 	except Exception:
 		log.warning("Failed to initialize audio for tones", exc_info=True)
diff --git a/user_docs/en/changes.t2t b/user_docs/en/changes.t2t
index e6058563a78..a594bb3afd2 100644
--- a/user_docs/en/changes.t2t
+++ b/user_docs/en/changes.t2t
@@ -20,6 +20,7 @@ What's New in NVDA
 - When pressing ``numpad2`` three times to report the numerical value of the character at the position of the review cursor, the information is now also provided in braille. (#14826)
 - Added gestures for Tivomatic Caiku Albatross Braille displays. There are now gestures for showing the braille settings dialog, accessing the status bar, cycling the braille cursor shape, and toggling the braille cursor on/off. (#14844)
+- NVDA now outputs audio via the Windows Audio Session API (WASAPI), which may improve the responsiveness, performance and stability of NVDA speech and sounds. This can be disabled in Advanced settings if audio problems are encountered. (#14697)
 -
diff --git a/user_docs/en/userGuide.t2t b/user_docs/en/userGuide.t2t
index 718ed69292c..d3ac48ed23f 100644
--- a/user_docs/en/userGuide.t2t
+++ b/user_docs/en/userGuide.t2t
@@ -2293,6 +2293,12 @@
 Some GDI applications will highlight text with a background color, NVDA (via display model) attempts to report this color.
 In some situations, the text background may be entirely transparent, with the text layered on some other GUI element.
 With several historically popular GUI APIs, the text may be rendered with a transparent background, but visually the background color is accurate.
+
+==== Use WASAPI for audio output ====[WASAPI]
+This option enables audio output via the Windows Audio Session API (WASAPI).
+WASAPI is a more modern audio framework which may improve the responsiveness, performance and stability of NVDA audio output, including both speech and sounds.
+This option is enabled by default.
+After changing this option, you will need to restart NVDA for the change to take effect.
+
 ==== Debug logging categories ====[AdvancedSettingsDebugLoggingCategories]
 The checkboxes in this list allow you to enable specific categories of debug messages in NVDA's log.
 Logging these messages can result in decreased performance and large log files.