aXlireza/Python-Audio-command-recognizer
# Version 0.1.6

# Updates
New model layers (plus more), an improved dataset, and an enhanced realtime program with an activation key command.

# How to use
You can run realtime.py as is; it requires no other action. It displays a table of commands and how confident the model is that the audio contained each one (the highest is marked with an asterisk).

# Dataset
In this version, the main dataset is http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip, plus some custom command datasets (these have very few samples, so the model does not do well on them).

Download the original dataset into the data folder and copy the contents of data/custom_commands into the data/mini_speech_commands directory.

# Roadmap
- create_model.py is where I train the model and store it in a Keras file
- record_audio.py records a single one-second clip
- record_sequence.py records every second into a new file, stored in the path defined in the file, in the sequence 0.wav, 1.wav, 2.wav, ...
- test_model.py lets you set a couple of WAV files (1 second each) for testing
- Finally, realtime.py opens the mic, chops the input into 1-second sequences, runs a prediction on each second, and prints the likelihood of each command in the audio

# The problems to solve in near future
- Higher quality model

# TODO
- Voice activation command
- Pick up a specific voice frequency
- Recognize all the voice frequencies in the room and what they say (but don't run the command unless allowed by the master)
- Multi-step commands (like how it waits for the next command after the activation key command)

# Challenges
- In realtime.py, I try to record small chunks of data and put them back together to form a 1-second sound that is fed directly to the model (without saving it to a file)
  - Apparently this does not give the model the same audio quality, so it does not predict correctly
  - Making too many small chunks makes the prediction repeat many times while only one word was said
- In the dataset, having many samples for one command and only a few for another makes an unfair playing field for the commands with fewer samples and biases the prediction toward the command with the larger sample set
- When music or other sounds add noise, the model does not understand me and instead picks up the speech in the song, which is not intended. So I'm trying to filter out any frequencies other than my voice frequency, which requires understanding my frequency first
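
The chunk-assembly challenge in realtime.py can be pictured with a rolling buffer that always holds the most recent second of audio. This is only a minimal sketch, not the code in this repository: it assumes 16 kHz mono samples arriving as NumPy arrays, and `RollingWindow`, `push`, `ready`, and `window` are hypothetical names.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumption: 16 kHz mono audio


class RollingWindow:
    """Hypothetical helper: keeps the latest `window_size` samples."""

    def __init__(self, window_size=SAMPLE_RATE):
        self.window_size = window_size
        self.buffer = np.zeros(window_size, dtype=np.float32)
        self.filled = 0  # real samples received so far, capped at window_size

    def push(self, chunk):
        """Append a small chunk and drop the oldest samples."""
        chunk = np.asarray(chunk, dtype=np.float32).ravel()
        if len(chunk) >= self.window_size:
            # The chunk alone covers the whole window: keep only its tail.
            self.buffer = chunk[-self.window_size:].copy()
        else:
            self.buffer = np.concatenate([self.buffer[len(chunk):], chunk])
        self.filled = min(self.window_size, self.filled + len(chunk))

    def ready(self):
        """True once a full window of real audio has been collected."""
        return self.filled >= self.window_size

    def window(self):
        """Return a copy of the current window, oldest sample first."""
        return self.buffer.copy()
```

Each microphone chunk would go through `push`, and the model would only be called when `ready()` is true. Because the window slides, a word that straddles two chunks still ends up inside one contiguous second.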
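
The repeated-prediction problem (one spoken word triggering many times when the chunks are small) is often handled by debouncing the output. The sketch below is one possible approach, not what realtime.py does: `CommandDebouncer` is a hypothetical name, and the `threshold` and `cooldown` values are assumptions that would need tuning.

```python
class CommandDebouncer:
    """Hypothetical helper: report a command once per utterance."""

    def __init__(self, threshold=0.8, cooldown=1.0):
        self.threshold = threshold  # minimum confidence to accept (assumed)
        self.cooldown = cooldown    # seconds before the same label may repeat
        self.last_label = None
        self.last_time = float("-inf")

    def update(self, label, confidence, now):
        """Return `label` if it should fire at time `now`, else None."""
        if confidence < self.threshold:
            return None
        if label == self.last_label and now - self.last_time < self.cooldown:
            return None  # same word still inside its cooldown window
        self.last_label = label
        self.last_time = now
        return label
```

Passing the timestamp in explicitly (for example from `time.monotonic()`) keeps the helper easy to test without a microphone.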
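
For the dataset imbalance described above, one common mitigation is to weight each class inversely to its sample count. The sketch below uses the usual "balanced" formula, total / (number of classes × class count); the function name and the sample counts in the usage example are made up for illustration and are not measured from this dataset.

```python
def balanced_class_weights(counts):
    """Map each label to total / (n_classes * count).

    `counts` is a dict of label -> number of training samples.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    if any(c <= 0 for c in counts.values()):
        raise ValueError("every class needs at least one sample")
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * c) for label, c in counts.items()}


# Illustrative counts only (hypothetical, not taken from the real data).
example_counts = {"go": 1000, "stop": 1000, "custom": 100}
example_weights = balanced_class_weights(example_counts)
```

Keras `Model.fit` accepts a `class_weight` dictionary keyed by class index, so the labels would first need to be mapped to the indices used in training.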
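
The idea of filtering out frequencies other than the speaker's voice can be illustrated with a crude FFT mask. This is a rough sketch under stated assumptions, not the author's method: `fft_bandpass` is a hypothetical helper, the band limits in the example are arbitrary, and a hard mask like this will not separate one voice from singing in the same band.

```python
import numpy as np


def fft_bandpass(signal, sample_rate, low_hz, high_hz):
    """Zero every frequency bin outside [low_hz, high_hz]."""
    signal = np.asarray(signal, dtype=np.float64)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * mask, n=len(signal))


# Example: a 200 Hz tone (stand-in for a voice) plus a 3000 Hz tone (noise).
rate = 16000  # assumption: 16 kHz audio
t = np.arange(rate) / rate
voice_like = np.sin(2 * np.pi * 200 * t)
noise_like = np.sin(2 * np.pi * 3000 * t)
filtered = fft_bandpass(voice_like + noise_like, rate, 100, 1000)
```

In this toy case `filtered` matches the 200 Hz tone because both tones fall exactly on FFT bins; real speech and music overlap in frequency, which is why understanding the target voice first is the hard part.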