Example updates #78

Merged · 5 commits · Nov 10, 2023
7 changes: 6 additions & 1 deletion README.md
@@ -11,6 +11,9 @@ openWakeWord is an open-source wakeword library that can be used to create voice

# Updates

**2023/11/09**
- Added example scripts under `examples/web` that demonstrate streaming audio from a web application into openWakeWord.

**2023/10/11**
- Significant improvements to the process of [training new models](#training-new-models), including an example Google Colab notebook demonstrating how to train a basic wake word model in <1 hour.

@@ -240,9 +243,11 @@ Future release road maps may have non-english support. In particular, [Mycroft.A

**Can openWakeWord be run in a browser with javascript?**
- While the ONNX runtime [does support javascript](https://onnxruntime.ai/docs/get-started/with-javascript.html), much of the other functionality required for openWakeWord models would need to be ported. This is not currently on the roadmap, but please open an issue/start a discussion if this feature is of particular interest.
- As a potential work-around for some applications, the example scripts in `examples/web` demonstrate how audio can be captured in a browser and streamed via websockets into openWakeWord running in a Python backend server.
- Other potential options could include projects like `pyodide`; see [this related issue](https://github.com/pyodide/pyodide/issues/4220).

**Is there a C++ version of openWakeWord?**
- While the ONNX runtime [also has a C++ API](https://onnxruntime.ai/docs/get-started/with-cpp.html), there isn't an official C++ implementation of the full openWakeWord library. However, [@synesthesiam](https://github.com/synesthesiam) has created a [C++ version](https://github.com/rhasspy/openWakeWord-cpp) of openWakeWord with basic functionality implemented.
- While the ONNX runtime [also has a C++ API](https://onnxruntime.ai/docs/get-started/with-cpp.html), there isn't an official C++ implementation of the full openWakeWord library. However, [@synesthesiam](https://github.com/synesthesiam) has created a [C++ version of openWakeWord](https://github.com/rhasspy/openWakeWord-cpp) with basic functionality implemented.

**Why are there three separate models instead of just one?**
- Separating the models was an intentional choice to provide flexibility and optimize the efficiency of the end-to-end prediction process. For example, with separate melspectrogram, embedding, and prediction models, each one can operate on different size inputs of audio to optimize overall latency and share computations between models. It certainly is possible to make a combined model with all of the steps integrated, though, if that was a requirement of a particular use case.
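
For context, a minimal usage sketch (an illustration only, not code from this repository or this PR) of how the composed pipeline looks from caller code, assuming the standard `openwakeword.model.Model` entry point and the default pre-trained models:

```python
# Illustrative sketch: the melspectrogram, embedding, and wakeword models are
# composed behind a single Model object, so callers only see one predict() call.
import numpy as np
from openwakeword.model import Model

owwModel = Model(inference_framework="tflite")  # loads the default pre-trained models

# One ~80 ms chunk of 16 kHz, 16-bit audio (1280 samples), e.g. from a microphone stream
chunk = np.zeros(1280, dtype=np.int16)

# Internally: melspectrogram -> shared embedding -> per-wakeword prediction heads
scores = owwModel.predict(chunk)  # {model_name: score, ...}
```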
35 changes: 26 additions & 9 deletions examples/capture_activations.py
@@ -68,10 +68,26 @@
default=False,
required=False
)
parser=argparse.ArgumentParser()
parser.add_argument(
"--chunk_size",
help="How much audio (in number of 16khz samples) to predict on at once",
type=int,
default=1280,
required=False
)
parser.add_argument(
"--model_path",
help="The path of a specific model to load",
type=str,
default="",
required=False
)
parser.add_argument(
"--model",
help="The model to use for openWakeWord, leave blank to use all available models",
"--inference_framework",
help="The inference framework to use (either 'onnx' or 'tflite'",
type=str,
default='tflite',
required=False
)
parser.add_argument(
@@ -87,25 +103,26 @@
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 1280
CHUNK = args.chunk_size
audio = pyaudio.PyAudio()
mic_stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)

# Load pre-trained openwakeword models
if args.model:
if args.model_path:
model_paths = openwakeword.get_pretrained_model_paths()
for path in model_paths:
if args.model in path:
if args.model_path in path:
model_path = path

if model_path:
owwModel = Model(
wakeword_model_paths=[model_path],
wakeword_models=[model_path],
enable_speex_noise_suppression=args.noise_suppression,
vad_threshold = args.vad_threshold
)
vad_threshold = args.vad_threshold,
inference_framework=args.inference_framework
)
else:
print(f'Could not find model \"{args.model}\"')
print(f'Could not find model \"{args.model_path}\"')
exit()
else:
owwModel = Model(
21 changes: 21 additions & 0 deletions examples/web/README.md
@@ -0,0 +1,21 @@
# Examples

This folder contains examples of using openWakeWord with web applications.

## Websocket Streaming

As openWakeWord does not have a native Javascript port, using it within a web browser is best accomplished with websocket streaming of the audio data from the browser to a simple Python application. To install the requirements for this example:

```
pip install aiohttp
pip install resampy
```

The `streaming_client.html` page shows a simple implementation of capturing audio from a microphone in a browser and streaming it over a websocket, and the `streaming_server.py` file is the corresponding websocket server that passes the audio into openWakeWord.

To run the example, execute `python streaming_server.py` (add the `--help` argument to see options) and navigate to `localhost:9000` in your browser.
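
For orientation, below is a heavily simplified sketch of such a server. It is not the contents of `streaming_server.py`: it omits the sample-rate handshake, resampling, and the loaded-models message the client expects, assumes the browser already sends 16 kHz, 16-bit PCM, and uses an illustrative 0.5 activation threshold.

```python
# Simplified sketch of a websocket endpoint that feeds browser audio into openWakeWord.
import numpy as np
from aiohttp import web, WSMsgType
from openwakeword.model import Model

async def websocket_handler(request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)
    owwModel = Model()  # default pre-trained models

    async for msg in ws:
        if msg.type == WSMsgType.BINARY:
            # Assumes the client already sends 16 kHz, 16-bit PCM chunks
            audio = np.frombuffer(msg.data, dtype=np.int16)
            scores = owwModel.predict(audio)
            activations = [name for name, score in scores.items() if score >= 0.5]
            if activations:
                await ws.send_json({"activations": activations})
    return ws

app = web.Application()
app.add_routes([web.get("/ws", websocket_handler)])
web.run_app(app, port=9000)
```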

Note that this example is illustrative only, and integration of this approach with other web applications may have different requirements. In particular, some key considerations:

- This example captures PCM audio from the web browser and streams full 16-bit integer representations of ~250 ms audio chunks over the websocket connection. In practice, bandwidth efficient streams of compressed audio may be more suitable for some applications.
- The browser captures audio at the native sampling rate of the capture device, which can require re-sampling prior to passing the audio data to openWakeWord. This example uses the `resampy` library which has a good balance between performance and quality, but other resampling approaches that optimize different aspects may be more suitable for some applications.
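
To illustrate the second point, a minimal resampling helper along these lines (a sketch, assuming the client reports its capture rate as `streaming_client.html` does) can convert incoming PCM to the 16 kHz, 16-bit audio openWakeWord expects:

```python
# Sketch: convert browser-rate 16-bit PCM bytes to 16 kHz int16 samples for openWakeWord.
import numpy as np
import resampy

def to_16khz(pcm_bytes: bytes, browser_rate: int) -> np.ndarray:
    audio = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32)
    if browser_rate != 16000:
        audio = resampy.resample(audio, browser_rate, 16000)
    return audio.astype(np.int16)
```
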
197 changes: 197 additions & 0 deletions examples/web/streaming_client.html
@@ -0,0 +1,197 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Websocket Microphone Streaming</title>
<style>
body {
text-align: center;
font-family: 'Roboto', sans-serif;
}
#startButton {
padding: 15px 30px;
font-size: 18px;
background-color: #03A9F4;
border: none;
border-radius: 4px;
color: white;
cursor: pointer;
outline: none;
transition: background-color 0.3s;
}
#startButton.listening {
background-color: #4CAF50;
}
table {
margin: 20px auto;
border-collapse: collapse;
width: 60%;
}
th, td {
border: 1px solid #E0E0E0;
padding: 10px;
text-align: left;
}
th {
background-color: #F5F5F5;
}

@keyframes fadeOut {
from {
opacity: 1;
}
to {
opacity: 0;
}
}

.detected-animation {
animation: fadeOut 2s forwards;
}
</style>
</head>
<body>
<h1>Streaming Audio to openWakeWord Using Websockets</h1>
<button id="startButton">Start Listening</button>

<table>
<tr>
<th>Wakeword</th>
<th>Detected</th>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</table>

<script>
// Create websocket connection
const ws = new WebSocket('ws://localhost:9000/ws');

// When the websocket connection is open
ws.onopen = function() {
console.log('WebSocket connection is open');
};

// Get responses from websocket and display information
ws.onmessage = (event) => {
console.log(event.data);
const model_payload = JSON.parse(event.data);
if ("loaded_models" in model_payload) {
// Add loaded models to the rows of the first column in the table, inserting rows as needed
const table = document.querySelector('table');
const rows = table.querySelectorAll('tr');
for (let i = 1; i < model_payload.loaded_models.length + 1; i++) {
if (i < rows.length) {
const row = rows[i];
const cell = row.querySelectorAll('td')[0];
cell.textContent = model_payload.loaded_models[i - 1];
} else {
// Insert extra rows if needed, both column 1 and 2
const row = table.insertRow();
const cell1 = row.insertCell();
const cell2 = row.insertCell();
cell1.textContent = model_payload.loaded_models[i - 1];
cell2.textContent = '';
}
}

}

if ("activations" in model_payload) {
// Add detected wakeword to the rows of the second column in the table
const table = document.querySelector('table');
const rows = table.querySelectorAll('tr');
for (let i = 1; i < rows.length; i++) {
// Check for the model name in the first column and add "Detected!" to the second column if they match
if (model_payload.activations.includes(rows[i].querySelectorAll('td')[0].textContent)) {
const cell = rows[i].querySelectorAll('td')[1];
cell.textContent = "Detected!";
cell.classList.add('detected-animation'); // animate fade out

// Remove the CSS class after the fade out animation ends to reset the state
cell.addEventListener('animationend', () => {
cell.textContent = '';
cell.classList.remove('detected-animation');
}, { once: true });
}
}
}
};

// Create microphone capture stream for 16-bit PCM audio data
// Code based on the excellent tutorial by Ragy Morkas: https://medium.com/@ragymorkos/gettineg-monochannel-16-bit-signed-integer-pcm-audio-samples-from-the-microphone-in-the-browser-8d4abf81164d
navigator.getUserMedia = navigator.getUserMedia ||
navigator.webkitGetUserMedia ||
navigator.mozGetUserMedia ||
navigator.msGetUserMedia;

let audioStream;
let audioContext;
let recorder;
let volume;
let sampleRate;

if (navigator.getUserMedia) {
navigator.getUserMedia({audio: true}, function(stream) {
audioStream = stream;

// creates an instance of AudioContext
const context = window.AudioContext || window.webkitAudioContext;
audioContext = new context();

// retrieve the current sample rate of the microphone the browser is using (sent to the Python server when listening starts)
sampleRate = audioContext.sampleRate;

// creates a gain node
volume = audioContext.createGain();

// creates an audio node from the microphone incoming stream
const audioInput = audioContext.createMediaStreamSource(audioStream);

// connect the stream to the gain node
audioInput.connect(volume);

const bufferSize = 4096;
recorder = (audioContext.createScriptProcessor ||
audioContext.createJavaScriptNode).call(audioContext,
bufferSize,
1,
1);

recorder.onaudioprocess = function(event) {
const samples = event.inputBuffer.getChannelData(0);
const PCM16iSamples = samples.map(sample => {
let val = Math.floor(32767 * sample);
return Math.min(32767, Math.max(-32768, val));
});

// Push audio to websocket
const int16Array = new Int16Array(PCM16iSamples);
const blob = new Blob([int16Array], { type: 'application/octet-stream' });
ws.send(blob);
};

}, function(error) {
alert('Error capturing audio.');
});
} else {
alert('getUserMedia not supported in this browser.');
}

// start recording
const startButton = document.getElementById('startButton');
startButton.addEventListener('click', function() {
if (!startButton.classList.contains('listening')) {
volume.connect(recorder);
recorder.connect(audioContext.destination);
ws.send(sampleRate);
startButton.classList.add('listening');
startButton.textContent = 'Listening...';
}
});
</script>
</body>
</html>