Voice Activity Detection

Brief About Voice Activity Detection (VAD)

Voice activity detection (VAD), also known as speech activity detection or speech detection, is the detection of the presence or absence of human speech in an audio signal. Its main uses are in speech coding and speech recognition. It can serve as a preprocessing step for speech processing, and it can also be used to deactivate some processes during the non-speech sections of an audio session: for example, it can avoid unnecessary coding and transmission of silence packets in Voice over Internet Protocol (VoIP) applications, saving computation and network bandwidth.

VAD is an important enabling technology for a variety of speech-based applications. Therefore, various VAD algorithms have been developed that provide varying features and compromises between latency, sensitivity, accuracy and computational cost. Some VAD algorithms also provide further analysis, for example whether the speech is voiced, unvoiced or sustained. Voice activity detection is usually independent of language.

Packages Used in This Service

Check requirements.txt for more details if you run into errors.

[VAD] webrtcvad
[VAD] pyaudio
[STT] whisper
[QUEUE] pika
[QUEUE] rabbitmq
[API] websockets

Services Brief (About Service VAD) on `vad.py`

This code monitors voice activity using three markers: `1` (voice activity detected), `_` (no voice activity), and `X` (no voice activity detected for N seconds). Once `X` appears, it stops listening, runs a sample activity (`time.sleep(5)`), and saves the recorded frames to a .wav audio file.
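The marker logic described above can be sketched as follows. This is a minimal illustration, not the code in `vad.py`: the names `monitor` and `save_wav`, the 20 ms frame duration, and the 1-second silence limit are assumptions, and the speech decision is passed in as a callable so the sketch runs without a microphone.

```python
import wave
from typing import Callable, Iterable, List, Tuple

SAMPLE_RATE = 16000     # Hz (assumed; 16-bit mono PCM)
FRAME_MS = 20           # frame duration in ms (assumed)
SILENCE_LIMIT_S = 1.0   # the "N seconds" from the text (illustrative value)


def monitor(frames: Iterable[bytes],
            is_speech: Callable[[bytes], bool],
            silence_limit_s: float = SILENCE_LIMIT_S,
            frame_ms: int = FRAME_MS) -> Tuple[str, List[bytes]]:
    """Classify frames until the silence limit is hit.

    Returns the marker string ('1', '_', 'X') and the frames recorded so far.
    """
    # Number of consecutive silent frames that make up the silence limit.
    max_silent = max(1, round(silence_limit_s * 1000 / frame_ms))
    markers: List[str] = []
    recorded: List[bytes] = []
    silent_run = 0
    for frame in frames:
        recorded.append(frame)
        if is_speech(frame):
            markers.append("1")
            silent_run = 0
        else:
            silent_run += 1
            if silent_run >= max_silent:
                markers.append("X")   # silence limit reached: stop listening
                break
            markers.append("_")
    return "".join(markers), recorded


def save_wav(path: str, frames: List[bytes],
             sample_rate: int = SAMPLE_RATE) -> None:
    """Write the recorded 16-bit mono PCM frames to a .wav file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(b"".join(frames))
```

With webrtcvad, the callable could be built as `vad = webrtcvad.Vad(2)` followed by `is_speech = lambda f: vad.is_speech(f, 16000)`, with the frames read from a pyaudio input stream. The aggressiveness mode 2 is an arbitrary choice here.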

Simple Use Cases of Voice Activity Detection

Voice Activity Detection (vad)

In the vad folder, I created a VAD implementation for a web service using websockets. You can explore it in vad/vad-websockets.

NOTE:
I had some difficulty with the client-side JavaScript (index.html). The VAD is set up to receive 16000 Hz audio with a buffer size of 320, but the client sends 44100 Hz audio with a buffer size chosen automatically by the device. I found example code that downsamples the audio to 16000 Hz but does not fix the buffer size, so I decided to handle it on the server side by padding the audio frame with silence (b'\x00') when its length is not compatible. If there is a better way, please share it with me; it would be my pleasure 😃.
- vad.py
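The server-side padding mentioned in the note can be sketched like this. It is only an illustration under stated assumptions: the buffer size of 320 is read as 320 samples of 16-bit PCM (640 bytes, 20 ms at 16000 Hz), and `to_fixed_frames` is a hypothetical helper name, not a function from the repository.

```python
from typing import List

SAMPLES_PER_FRAME = 320    # assumed to mean samples, i.e. 20 ms at 16000 Hz
BYTES_PER_SAMPLE = 2       # 16-bit PCM (assumed)
FRAME_BYTES = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE  # 640 bytes


def to_fixed_frames(chunk: bytes, frame_bytes: int = FRAME_BYTES) -> List[bytes]:
    """Split an incoming chunk into fixed-size frames.

    The last frame is padded with silence (zero bytes) when it is too short.
    """
    frames = []
    for start in range(0, len(chunk), frame_bytes):
        frame = chunk[start:start + frame_bytes]
        if len(frame) < frame_bytes:
            # Pad with b"\x00" so the VAD always receives a full-length frame.
            frame += b"\x00" * (frame_bytes - len(frame))
        frames.append(frame)
    return frames
```

Padding keeps the frame length valid for the VAD, but the added zeros are silence, so a heavily padded frame is more likely to be classified as non-speech. Buffering leftover bytes until the next chunk arrives is an alternative that avoids this.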
Speech to Text with Voice Activity Detection (vad-stt)
- vad-stt.py
Voice Bot (vad-stt-chatbot)
- vad-stt-chatbot.py
Live Transcription (vad-stt-transcription)

This service needs RabbitMQ installed to queue the audio before transcription. Run vad-stt-transcription-worker.py and vad-stt-transcription-show.py first, then run vad-stt-transcription.py.
- vad-stt-transcription.py (Recorder)
- vad-stt-transcription-worker.py (Service that generates and transcribes the audio)
- vad-stt-transcription-show.py (Monitor Transcription)
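The recorder-to-worker handoff through RabbitMQ could look roughly like the sketch below. The queue name `audio_segments`, the JSON/base64 message format, and all function names are assumptions made for illustration; the repository's scripts may structure their messages differently. `main()` needs pika installed and a RabbitMQ broker running on localhost.

```python
import base64
import json
from typing import Tuple

QUEUE_NAME = "audio_segments"   # hypothetical queue name


def encode_segment(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Pack raw PCM audio into a JSON message body (assumed format)."""
    message = {
        "sample_rate": sample_rate,
        "audio": base64.b64encode(pcm).decode("ascii"),
    }
    return json.dumps(message).encode("utf-8")


def decode_segment(body: bytes) -> Tuple[bytes, int]:
    """Inverse of encode_segment, as a worker would call it."""
    message = json.loads(body.decode("utf-8"))
    return base64.b64decode(message["audio"]), message["sample_rate"]


def publish_segment(channel, pcm: bytes, queue: str = QUEUE_NAME) -> None:
    """Publish one recorded segment on an open pika channel."""
    channel.queue_declare(queue=queue, durable=True)
    # The default exchange ("") routes by queue name.
    channel.basic_publish(exchange="", routing_key=queue,
                          body=encode_segment(pcm))


def main() -> None:
    """Send one silent 20 ms segment to a local broker."""
    import pika  # imported here so the helpers above work without pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    try:
        publish_segment(connection.channel(), b"\x00\x00" * 320)
    finally:
        connection.close()
```

Starting the worker and the monitor first, as described above, means consumers are already attached when the recorder begins publishing segments.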

hanifabd/voice-activity-detection-vad-realtime
