
Javascript or WASM port ? #76

Closed
feuler opened this issue Nov 3, 2023 · 11 comments

Comments

@feuler

feuler commented Nov 3, 2023

Hi,

the FAQs of openWakeWord says about Javascript support: "This is not currently on the roadmap, but please open an issue/start a discussion if this feature is of particular interest"

So that's what I'm doing here now, because I'm interested in such a feature (for my voice recognition web interface).

I've done my best to solve this myself. I tried to build openWakeWord-cpp as a WASM module with Emscripten (it crashes on load). I also tried to modify existing onnxruntime VAD solutions (like SileroVAD or "rajashekar/WakeWordDetector"), but there I'd have to completely re-implement the tensor inputs/outputs etc., which is currently beyond my abilities.

So, if you could implement support for JavaScript, that would be really nice!
Or maybe you know of an alternative I could use for wake word detection in JavaScript (with my own ONNX wake word model)?

(As an alternative, I'll try to implement wake word detection with openWakeWord via websockets, streaming the microphone audio between my web app and Python Flask.)

@dscripka
Owner

dscripka commented Nov 6, 2023

Thanks for opening this issue. This is something I've thought more about, and so far there are three main barriers to an (easy) implementation.

  1. Running openWakeWord in practice requires more than just running the ONNX or tflite models. There is audio input and processing, state management, multiple model orchestration, etc. This would all have to be implemented in Javascript as well.

  2. Javascript doesn't have native support for working with multi-dimensional arrays, which to your point, makes preparing data for ONNX complex.

  3. There may be performance impacts with running the models in the browser. In theory, as long as WebAssembly or WebGL support is present in the browser this shouldn't be a large impact, but in practice this is another point of failure.
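To illustrate the second point: libraries like ONNX Runtime Web expect a flat typed array plus an explicit shape, so nested JavaScript arrays have to be flattened by hand before inference. A minimal sketch of that step (the `[1, rows, cols]` shape here is illustrative, not openWakeWord's actual input shape):

```javascript
// Flatten a 2-D array of audio features into the flat Float32Array +
// dims pair that a browser ONNX runtime expects.
function toFlatTensor(frames) {
  const rows = frames.length;
  const cols = frames[0].length;
  const data = new Float32Array(rows * cols);
  for (let i = 0; i < rows; i++) {
    data.set(frames[i], i * cols); // copy row i into the flat buffer
  }
  const dims = [1, rows, cols]; // batch, time, features (illustrative)
  // With onnxruntime-web this pair would then become a tensor via:
  //   new ort.Tensor('float32', data, dims)
  return { data, dims };
}
```
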

So with all that in mind, if your goal is to use a web interface to collect an audio stream and monitor it for wake words, by far the easiest option is to do as you describe and create a small Python application with websockets to receive data from the web app.

If it would help, I could create some example scripts that demonstrate this workflow as an interim solution while I continue to consider a full Javascript port.
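For reference, the browser side of that workflow mostly amounts to converting the Web Audio API's Float32 samples (in [-1, 1]) into 16-bit PCM before sending the bytes over the websocket. A hedged sketch of that conversion (the websocket URL in the comment is a placeholder, not a real endpoint):

```javascript
// Convert Float32 samples in [-1, 1] (as produced by the Web Audio API)
// to 16-bit signed PCM, the format wake word backends typically expect.
function floatTo16BitPCM(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// In the browser the buffer would then be streamed out, e.g.:
//   const ws = new WebSocket('ws://localhost:9000/ws'); // placeholder URL
//   ws.send(floatTo16BitPCM(samples).buffer);
```
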

@feuler
Author

feuler commented Nov 6, 2023

Thanks for the reply and the information!
I also figured out that just running the ONNX models doesn't work without the orchestration...

At the moment I'm using this:
https://www.npmjs.com/package/bumblebee-hotword-node
which is the only well-working (offline, no cloud/registration) solution I found in JavaScript.

I also fine-tuned another wake model as described here: https://www.rajashekar.org/wake-word/
which didn't lead to a model with good precision (in my case).

I've started to implement openWakeWord via an aiohttp websocket, but haven't gotten it to work yet.
So it would be very kind of you if you could provide example scripts for a websocket implementation.

@dscripka
Owner

@feuler the new examples for streaming audio from a browser to openWakeWord over websockets have been added in a recent PR; you can find them under examples/web.

I hope this is useful!

@feuler
Author

feuler commented Nov 11, 2023

@dscripka Thank you very much !!
Shall I close this issue?

@dscripka
Owner

Yes, if this example works for your current application we can close the issue.

@hadfield

hadfield commented Apr 3, 2024

In case someone is following up on the need for a completely browser-based version, there is a very rough but working version in this repo:
https://github.com/vital-ai/vital-wakeword-js

This uses ONNX Runtime Web (JavaScript) for the models, Pyodide for the Python code, and runs within a web worker.

The project structure is very crude, with the JavaScript → Python → JavaScript flow.

It could be greatly improved and simplified by refactoring the Python code into just JavaScript/TypeScript, but this was a quick way to try it out. Only the "runtime" code would need to be converted, since training would still occur on the Python side.
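As a taste of what that runtime conversion involves: much of the state management mentioned earlier in this thread is just buffering incoming audio into fixed-size frames before each inference call. A minimal sketch, assuming a frame size of 1280 samples (80 ms at 16 kHz, the frame size openWakeWord's Python examples commonly use; treat it as an assumption here):

```javascript
// Accumulate arbitrary-length audio chunks and emit fixed-size frames,
// the basic streaming state management a browser wake-word runtime needs.
class FrameBuffer {
  constructor(frameSize = 1280) { // 80 ms at 16 kHz (assumed frame size)
    this.frameSize = frameSize;
    this.pending = [];       // queue of Int16Array chunks not yet framed
    this.pendingLength = 0;  // total samples queued
  }

  // Push a chunk; returns an array of complete frames (possibly empty).
  push(chunk) {
    this.pending.push(chunk);
    this.pendingLength += chunk.length;
    const frames = [];
    while (this.pendingLength >= this.frameSize) {
      const frame = new Int16Array(this.frameSize);
      let filled = 0;
      while (filled < this.frameSize) {
        const head = this.pending[0];
        const take = Math.min(head.length, this.frameSize - filled);
        frame.set(head.subarray(0, take), filled);
        filled += take;
        if (take === head.length) {
          this.pending.shift();           // chunk fully consumed
        } else {
          this.pending[0] = head.subarray(take); // keep the remainder
        }
      }
      this.pendingLength -= this.frameSize;
      frames.push(frame); // each frame would be fed to the model here
    }
    return frames;
  }
}
```
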

There is a wake word model trained for the phrase "Hey Haley" (not very robust yet), used in a demo in this repo:
https://github.com/chat-ai-app/chat-ai-assistant-demo

which is deployed here:
https://demo-voice.chat.ai

This is used in combination with the Whisper tiny model to convert speech-to-text in the browser. A rough proof-of-concept implementation was used for that as well.

@jbflow

jbflow commented Aug 29, 2024

I know this issue is closed, but I'm attempting to run the project in the browser for wake word detection and running into some issues. I've successfully got the examples working in Python on my Ubuntu VM in UTM. I'm trying to build something that will run in the browser easily. It looks like it can be done but will need some tinkering. So I'm suggesting this issue be reopened or moved to a discussion so that we can figure something out here.

The comment by @hadfield above looks promising, but I was unable to get the demo (https://demo-voice.chat.ai/) working, and looking at the repo I can't get my head around how to run it locally myself. Some assistance in the README would be greatly appreciated.

@hadfield

The demo website does work in current versions of Chrome and Firefox, and perhaps others. You should see activity in the console log. Turning the switch "on" should start listening for the phrase "Hey Haley", and then Whisper will be used to transcribe what follows until it detects end of speech/silence.
There are some notes on the whisper model part of it here:
https://blog.vital.ai/2024/04/03/running-whisper-speech-to-text-model-in-the-browser/

@jbflow

jbflow commented Aug 29, 2024

Ah, awesome, thank you, and my apologies for the oversight: it does seem to be working fine and capturing output in the console. I will give the blog post a read as well. I'm interested in contributing to this work where I can, as I feel something needs to be done to reduce the overhead cost of running voice-activated AI assistants.

At the moment, if I could just get a wake word working in the browser, that would be a great start for my current project. I'm keen to move over to Whisper for the full speech-to-text side of things, but for now I'm just using the Web Speech API. I'd also like to be able to run my app on the edge and on small-footprint hardware devices. Privacy, latency and bandwidth concerns are my main priorities.

Thank you for your quick response.

@hadfield

The code is pretty rough, as it's smooshed-together JavaScript and Python with Pyodide, but you should be able to train whatever wake word model you want with openWakeWord and swap it in. I'd be happy if someone were to clean up the code a bit. 😄

@jbflow

jbflow commented Aug 29, 2024

Thank you, yeah, I'm taking a look now. I've forked your repo and will open an issue or discussion over there with the problems I'm coming up against. It will most likely be tomorrow, if not some time next week, by the time I get a proper look.


4 participants