
Javascript or WASM port ? #76

Closed
feuler opened this issue Nov 3, 2023 · 11 comments

Comments

@feuler

feuler commented Nov 3, 2023

Hi,

the FAQs of openWakeWord says about Javascript support: "This is not currently on the roadmap, but please open an issue/start a discussion if this feature is of particular interest"

So that's what I'm doing here now, because I'm interested in such a feature (for my voice recognition web interface).

I've done my best to solve this myself. I tried to build openWakeWord-cpp as a WASM module with Emscripten (it crashes on load). I also tried to modify existing onnxruntime VAD solutions (like SileroVAD or "rajashekar/WakeWordDetector"), but there I'd have to completely re-implement the tensor inputs/outputs etc., which is currently beyond my abilities.

So, if you could implement support for JavaScript, that would be really nice!
Or maybe you know of an alternative I could use for wake word detection in JavaScript (with my own ONNX wake word model)?

(As an alternative, I'll try to implement wake word detection with openWakeWord via websockets, streaming the microphone audio between my web app and Python Flask.)

@dscripka
Owner

dscripka commented Nov 6, 2023

Thanks for opening this issue. This is something I've thought more about, and so far there are three main barriers to an (easy) implementation.

  1. Running openWakeWord in practice requires more than just running the ONNX or tflite models. There is audio input and processing, state management, multiple model orchestration, etc. This would all have to be implemented in Javascript as well.

  2. Javascript doesn't have native support for working with multi-dimensional arrays, which to your point, makes preparing data for ONNX complex.

  3. There may be performance impacts with running the models in the browser. In theory, as long as WebAssembly or WebGL support is present in the browser this shouldn't be a large impact, but in practice this is another point of failure.
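To illustrate the second point: libraries like ONNX Runtime Web expect a flat typed array plus an explicit shape, so nested JavaScript arrays have to be flattened by hand before inference. A minimal sketch of that step (the `[1, rows, cols]` shape here is illustrative, not openWakeWord's actual input shape):

```javascript
// Flatten a 2-D array of audio features into the flat Float32Array +
// dims pair that a browser ONNX runtime expects.
function toFlatTensor(frames) {
  const rows = frames.length;
  const cols = frames[0].length;
  const data = new Float32Array(rows * cols);
  for (let i = 0; i < rows; i++) {
    data.set(frames[i], i * cols); // copy row i into the flat buffer
  }
  const dims = [1, rows, cols]; // batch, time, features (illustrative)
  // With onnxruntime-web this pair would then become a tensor via:
  //   new ort.Tensor('float32', data, dims)
  return { data, dims };
}
```
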

So with all that in mind, if your goal is to use a web interface to collect an audio stream and monitor it for wake words, by far the easiest option is to do as you describe and create a small Python application with websockets to receive data from the web app.

If it would help, I could create some example scripts that demonstrate this workflow as an interim solution while I continue to consider a full Javascript port.
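For reference, the browser side of that workflow mostly amounts to converting the Web Audio API's Float32 samples (in [-1, 1]) into 16-bit PCM before sending the bytes over the websocket. A hedged sketch of that conversion (the websocket URL in the comment is a placeholder, not a real endpoint):

```javascript
// Convert Float32 samples in [-1, 1] (as produced by the Web Audio API)
// to 16-bit signed PCM, the format wake word backends typically expect.
function floatTo16BitPCM(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// In the browser the buffer would then be streamed out, e.g.:
//   const ws = new WebSocket('ws://localhost:9000/ws'); // placeholder URL
//   ws.send(floatTo16BitPCM(samples).buffer);
```
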

@feuler
Author

feuler commented Nov 6, 2023

Thanks for the reply and the information!
I also figured out that just running the ONNX models doesn't work without the orchestration...

At the moment I'm using this:
https://www.npmjs.com/package/bumblebee-hotword-node
which is the only well-working (offline, no cloud/registration) solution I found in JavaScript.

I also fine-tuned another wake model as described here: https://www.rajashekar.org/wake-word/
which didn't lead to a model with good precision (in my case).

I've started to implement openWakeWord via an aiohttp websocket, but haven't gotten it to work yet.
So it would be very kind of you if you could provide example scripts for a websocket implementation.

@dscripka
Owner

@feuler the new examples for streaming audio from a browser to openWakeWord over websockets have been added in a recent PR; you can find them under examples/web.

I hope this is useful!

@feuler
Author

feuler commented Nov 11, 2023

@dscripka Thank you very much !!
Shall I close this issue?

@dscripka
Owner

Yes, if this example works for your current application we can close the issue.

@hadfield

hadfield commented Apr 3, 2024

In case someone is following up on the need for a completely browser-based version, there is a very rough but working version in this repo:
https://github.com/vital-ai/vital-wakeword-js

This uses ONNX Runtime Web (JavaScript) for the models, Pyodide for the Python code, and runs within a web worker.

The project structure is very crude, with the JavaScript → Python → JavaScript flow.

It could be greatly improved and simplified by refactoring the Python code into just JavaScript/TypeScript, but this was a quick way to try it out. Only the "runtime" code would need to be converted, since training would still occur on the Python side.
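As a taste of what that runtime conversion involves: much of the state management mentioned earlier in this thread is just buffering incoming audio into fixed-size frames before each inference call. A minimal sketch, assuming a frame size of 1280 samples (80 ms at 16 kHz, the frame size openWakeWord's Python examples commonly use; treat it as an assumption here):

```javascript
// Accumulate arbitrary-length audio chunks and emit fixed-size frames,
// the basic streaming state management a browser wake-word runtime needs.
class FrameBuffer {
  constructor(frameSize = 1280) { // 80 ms at 16 kHz (assumed frame size)
    this.frameSize = frameSize;
    this.pending = [];       // queue of Int16Array chunks not yet framed
    this.pendingLength = 0;  // total samples queued
  }

  // Push a chunk; returns an array of complete frames (possibly empty).
  push(chunk) {
    this.pending.push(chunk);
    this.pendingLength += chunk.length;
    const frames = [];
    while (this.pendingLength >= this.frameSize) {
      const frame = new Int16Array(this.frameSize);
      let filled = 0;
      while (filled < this.frameSize) {
        const head = this.pending[0];
        const take = Math.min(head.length, this.frameSize - filled);
        frame.set(head.subarray(0, take), filled);
        filled += take;
        if (take === head.length) {
          this.pending.shift();           // chunk fully consumed
        } else {
          this.pending[0] = head.subarray(take); // keep the remainder
        }
      }
      this.pendingLength -= this.frameSize;
      frames.push(frame); // each frame would be fed to the model here
    }
    return frames;
  }
}
```
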

There is a wake word model trained for the phrase "Hey Haley" (not very robust yet), used in a demo in this repo:
https://github.com/chat-ai-app/chat-ai-assistant-demo

which is deployed here:
https://demo-voice.chat.ai

This is used in combination with the Whisper tiny model to convert speech-to-text in the browser. A rough proof-of-concept implementation was used for that as well.

@jbflow

jbflow commented Aug 29, 2024

I know this issue is closed, but I'm attempting to run the project in the browser for wake word detection and running into some issues. I've successfully got the examples working in Python on my Ubuntu VM in UTM. I'm trying to build something that will run in the browser easily. It looks like it can be done but will need some tinkering. So I'm suggesting this issue be reopened or moved to a discussion so that we can figure something out here.

The comment by @hadfield above looks promising, but I was unable to get the demo (https://demo-voice.chat.ai/) working, and looking at the repo I can't get my head around how to run it locally myself. Some assistance in the README would be greatly appreciated.

@hadfield

The demo website does work in current versions of Chrome and Firefox, and perhaps others. You should see activity in the console log. Turning the switch "on" should start listening for the phrase "Hey Haley", and then Whisper will be used to transcribe what follows until it detects end of speech/silence.
There are some notes on the whisper model part of it here:
https://blog.vital.ai/2024/04/03/running-whisper-speech-to-text-model-in-the-browser/

@jbflow

jbflow commented Aug 29, 2024

Ah, awesome, thank you, and my apologies for the oversight: it does seem to be working fine and capturing output in the console. I will give the blog post a read as well. I'm interested in contributing to this work where I can, as I feel something needs to be done to reduce the overhead cost of running voice-activated AI assistants.

At the moment, if I could just get a wake word working in the browser, that would be a great start for my current project. I'm keen to move over to Whisper for the full speech-to-text side of things, but for now I'm just using the Web Speech API. I'd also like to be able to run my app on the edge and on small-footprint hardware devices. Privacy, latency and bandwidth concerns are my main priorities.

Thank you for your quick response.

@hadfield

The code is pretty rough, as it's smooshed-together JavaScript and Python with Pyodide, but you should be able to train whatever wake word model you want with openWakeWord and swap it in. I'd be happy if someone were to clean up the code a bit. 😄

@jbflow

jbflow commented Aug 29, 2024

Thank you, yeah, I'm taking a look now. I've forked your repo and will open an issue or discussion over there with the problems I'm coming up against. It will most likely be tomorrow, if not some time next week, by the time I get a proper look.


4 participants