Voice activity detection (VAD), also known as speech activity detection or speech detection, is the detection of the presence or absence of human speech, used in speech processing. The main uses of VAD are in speech coding and speech recognition. It can facilitate speech processing, and it can also be used to deactivate some processes during non-speech sections of an audio session. For example, it can avoid unnecessary coding and transmission of silence packets in Voice over Internet Protocol (VoIP) applications, saving computation and network bandwidth.
VAD is an important enabling technology for a variety of speech-based applications. Therefore, various VAD algorithms have been developed that provide varying features and compromises between latency, sensitivity, accuracy and computational cost. Some VAD algorithms also provide further analysis, for example whether the speech is voiced, unvoiced or sustained. Voice activity detection is usually independent of language.
Check requirements.txt for more details if you run into an error.
- [VAD] webrtcvad
- [VAD] pyaudio
- [STT] whisper
- [QUEUE] pika
- [QUEUE] rabbitmq
- [API] websockets
This code monitors voice activity and prints 1 (voice activity detected), _ (no voice activity), or X (no voice activity detected for N seconds). Once X appears, the code stops (or runs a sample activity, time.sleep(5)) and saves the recorded frames to a .wav audio file.
- Voice Activity Detection (vad)
In the folder vad, I created an implementation of VAD for a web service using websockets. You can explore it in the folder vad/vad-websockets.
NOTE:
I had some difficulties with the JavaScript client (index.html). The VAD I set up expects 16000 Hz audio with a buffer size of 320, but the client sends 44100 Hz audio with a buffer size chosen automatically by the device. I found some example code that downsamples the audio to 16000 Hz, but it does not handle the buffer size, so I decided to handle that on the server side by padding the audio frame with silence (b'\x00') whenever its length is not compatible. If there is an alternative approach, please share it with me; it would be my pleasure 😃.
- vad.py
- Speech to Text with Voice Activity Detection (vad-stt)
- vad-stt.py
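For the speech-to-text step, the recorded frames have to reach Whisper. Besides a .wav path, openai-whisper's transcribe also accepts a float32 NumPy array of 16 kHz mono audio scaled to [-1, 1]. A minimal sketch of that conversion, assuming 16-bit little-endian PCM frames; pcm16_to_float32 is a hypothetical helper, and the model name "base" in the comment is only an example.

```python
from typing import List

import numpy as np


def pcm16_to_float32(frames: List[bytes]) -> np.ndarray:
    """Convert 16-bit PCM frames to float32 samples in [-1.0, 1.0)."""
    pcm = np.frombuffer(b"".join(frames), dtype="<i2")  # little-endian int16
    return pcm.astype(np.float32) / 32768.0


# Possible use with openai-whisper (not run here; needs the package and a model download):
#   import whisper
#   model = whisper.load_model("base")
#   text = model.transcribe(pcm16_to_float32(frames))["text"]
```

Passing the array directly avoids writing a temporary .wav file for every utterance.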
- Voice Bot (vad-stt-chatbot)
- vad-stt-chatbot.py
- Live Transcription (vad-stt-transcription)
This service needs rabbitmq installed to queue the audio before transcription. Run vad-stt-transcription-worker.py and vad-stt-transcription-show.py first, then run vad-stt-transcription.py.
- vad-stt-transcription.py (recorder)
- vad-stt-transcription-worker.py (service that generates and transcribes the audio)
- vad-stt-transcription-show.py (transcription monitor)
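A minimal sketch of how a recorder and a worker could hand audio over through RabbitMQ with pika, assuming a local broker and a hypothetical queue named "audio". The queue name, the helper names, and sending each utterance as an in-memory WAV are assumptions, not necessarily what the scripts in this repository do. The publish and consume functions need a running rabbitmq server; the encode/decode helpers work on their own.

```python
import io
import wave
from typing import List, Tuple

QUEUE_NAME = "audio"  # hypothetical queue name


def encode_utterance(frames: List[bytes], sample_rate: int = 16000) -> bytes:
    """Pack 16-bit mono PCM frames into WAV bytes for a queue message body."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(b"".join(frames))
    return buf.getvalue()


def decode_utterance(body: bytes) -> Tuple[bytes, int]:
    """Return (raw PCM bytes, sample rate) from a WAV message body."""
    with wave.open(io.BytesIO(body), "rb") as wf:
        return wf.readframes(wf.getnframes()), wf.getframerate()


def publish(body: bytes, host: str = "localhost") -> None:
    """Recorder side: send one utterance to the queue."""
    import pika  # imported lazily so the helpers above work without pika
    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME)
    channel.basic_publish(exchange="", routing_key=QUEUE_NAME, body=body)
    connection.close()


def consume(host: str = "localhost") -> None:
    """Worker side: block and handle utterances as they arrive."""
    import pika
    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME)

    def on_message(ch, method, properties, body):
        pcm, rate = decode_utterance(body)
        # Transcription (for example with Whisper) would happen here.
        print(f"received {len(pcm)} bytes at {rate} Hz")

    channel.basic_consume(queue=QUEUE_NAME, on_message_callback=on_message,
                          auto_ack=True)
    channel.start_consuming()
```

Queuing whole utterances lets the recorder keep listening while the worker transcribes at its own pace, which is why the worker and monitor are started before the recorder.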