Javascript or WASM port ? #76
Thanks for opening this issue; this is something I've thought more about, and so far there are three main barriers to an (easy) implementation.
So with all of that in mind, if your goal is to use a web interface to collect an audio stream and monitor it for wake words, by far the easiest option is to do as you describe and create a small Python application with websockets to receive data from the web app. If it would help, I could create some example scripts that demonstrate this workflow as an interim solution while I continue to consider a full JavaScript port.
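To make the websocket approach concrete, here is a minimal sketch (not the example scripts mentioned above) of the buffering a receiver would need: binary websocket messages arrive in arbitrary sizes, while streaming wake word inference typically consumes fixed-size frames. The 1280-sample (80 ms at 16 kHz, 16-bit mono) frame size is an assumption for illustration.

```python
class FrameBuffer:
    """Accumulate raw 16-bit PCM bytes and emit fixed-size frames.

    Assumes 16 kHz mono 16-bit audio; 1280 samples = 80 ms, a common
    chunk size for streaming wake word inference (assumption).
    """

    def __init__(self, frame_samples: int = 1280):
        self.frame_bytes = frame_samples * 2  # 2 bytes per int16 sample
        self._buf = bytearray()

    def push(self, chunk: bytes) -> list[bytes]:
        """Add one websocket message's payload; return any complete frames."""
        self._buf.extend(chunk)
        frames = []
        while len(self._buf) >= self.frame_bytes:
            frames.append(bytes(self._buf[:self.frame_bytes]))
            del self._buf[:self.frame_bytes]
        return frames
```

Inside an aiohttp (or websockets) handler, each binary message would be pushed through a `FrameBuffer` and the returned frames passed to the model's prediction call.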
Thanks for the reply and the information! At the moment I'm using this: I also fine-tuned another wake word model as described here: https://www.rajashekar.org/wake-word/ I've started to implement openWakeWord via an aiohttp websocket, but haven't gotten it to work yet.
@feuler the new examples for streaming audio from a browser to openWakeWord over websockets have been added in a recent PR; you can find them under I hope this is useful!
@dscripka Thank you very much!!
Yes, if this example works for your current application, we can close the issue.
In case someone is following up on the need for a completely browser-based version, there is a very rough but working version in the repo: It uses ONNX Runtime Web (JavaScript) for the models and Pyodide for the Python code, and runs within a web worker.

The project structure is very crude, with a JavaScript --> Python --> JavaScript flow. It could be greatly improved and simplified by refactoring the Python code into plain JavaScript/TypeScript, but this was a quick way to try it out. Only the "runtime" code would need to be converted, since training would still occur on the Python side.

There is a wake word model trained for the phrase "Hey Haley" (not very robust yet) used in a demo in this repo: which is deployed here: It is used in combination with the Whisper tiny model to convert speech to text in the browser; a rough proof-of-concept implementation was used for that as well.
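Whatever runtime hosts the models, the "runtime" code described above ultimately reduces to per-frame model scores plus a decision rule. A minimal sketch of such a rule (the threshold and refractory values are hypothetical, not taken from the repo above):

```python
def detect_activations(scores, threshold=0.5, refractory_frames=20):
    """Return indices of frames where a wake word activation fires.

    An activation fires when a frame's score reaches the threshold;
    further activations are then suppressed for `refractory_frames`
    frames so a single utterance does not trigger repeatedly.
    """
    activations = []
    cooldown = 0
    for i, score in enumerate(scores):
        if cooldown > 0:
            cooldown -= 1
        elif score >= threshold:
            activations.append(i)
            cooldown = refractory_frames
    return activations
```

This logic is language-agnostic, which is part of why a JavaScript/TypeScript rewrite of the runtime side is plausible: the hard part is the model inference, not the decision layer.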
I know this issue is closed, but I'm attempting to run the project in the browser for wake word detection and running into some issues. I've successfully got the examples working in Python on my Ubuntu VM in UTM. I'm trying to build something that will run in the browser easily. It looks like it can be done but will need some tinkering, so I'm suggesting this issue be reopened or moved to a discussion so that we can figure something out here. The comment by @hadfield above looks promising, but I was unable to get the demo https://demo-voice.chat.ai/ working, and looking at the repo I can't get my head around how to get it running locally myself. Some assistance in the readme would be greatly appreciated.
The demo website does work in current versions of Chrome and Firefox, and perhaps others; you should see activity in the console log. Turning the switch "on" should start listening for the phrase "hey haley", and then Whisper will be used to transcribe what follows until it detects end of speech/silence.
Ah, awesome, thank you, and my apologies for the oversight: it does seem to be working fine and capturing output in the console. I will give the blog post a read as well. I'm interested in contributing to this work where I can, as I feel that something needs to be done to reduce the overhead cost of running voice-activated AI assistants. At the moment, just getting a wake word working in the browser would be a great start for my current project. I'm keen to move over to Whisper for the full speech-to-text side of things, but for now I'm just using the Web Speech API. I'd also like to be able to run my app at the edge and on small-footprint hardware devices; privacy, latency, and bandwidth are my main priorities. Thank you for your quick response.
The code is pretty rough, as it's smooshed-together JavaScript and Python with Pyodide, but you should be able to train whatever wake word model you want with openWakeWord and swap it in. I'd be happy if someone were to clean up the code a bit. 😄
Thank you, yeah, I'm taking a look now. I've forked your repo and will open an issue or discussion over there with the problems I'm coming up against; it will most likely be tomorrow, if not some time next week, by the time I get a proper look.
Hi,
the FAQ of openWakeWord says about JavaScript support: "This is not currently on the roadmap, but please open an issue/start a discussion if this feature is of particular interest"
So that's what I'm doing here now... because I'm interested in such a feature (for my voice recognition web interface).
I've done my best to solve this myself. I tried to modify openWakeWord-cpp and build it to WASM with Emscripten (which crashes on load). I also tried to modify existing onnxruntime VAD solutions (like Silero VAD or "rajashekar/WakeWordDetector"), but there I would have to completely re-implement the tensor inputs/outputs etc., which is currently above my abilities.
So, if you could implement support for JavaScript, that would be really nice!
Or maybe you have information on what I could use as an alternative for wake word detection in JavaScript (with my own ONNX wake word model)?
(As an alternative, I'll try to implement wake detection with openWakeWord via websockets, passing the microphone audio between my web app and a Python Flask server.)
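One detail worth noting for the websocket fallback: the browser's Web Audio API delivers float32 samples in [-1, 1], while openWakeWord consumes 16-bit PCM, so a conversion has to happen on one side or the other. A stdlib-only sketch of doing it server-side (decoding the raw float32 bytes a browser might send) under the assumption of little-endian payloads:

```python
import struct

def float32_to_int16_pcm(payload: bytes) -> bytes:
    """Convert little-endian float32 samples in [-1, 1] (as produced by
    the Web Audio API) into 16-bit signed little-endian PCM bytes.
    Out-of-range values are clamped rather than wrapped."""
    n = len(payload) // 4
    floats = struct.unpack(f"<{n}f", payload[: n * 4])
    ints = [max(-32768, min(32767, int(s * 32767))) for s in floats]
    return struct.pack(f"<{n}h", *ints)
```

Converting in the browser instead (sending `Int16Array` buffers) halves the bandwidth, which may matter for the latency and bandwidth concerns mentioned above.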