Multi-Modal Assistant: Speech Recognition and Speaker Identification

This repository contains the code for the speech recognition and speaker identification components of the multi-modal digital assistant developed for Project 2.2.

Technologies Used

  • Java: For basic audio processing tasks using TarsosDSP and the Java Sound API.
  • Python: For implementing speech recognition and speaker identification models.

Models

  • wav2vec 2.0: Used for high-accuracy transcription of raw audio. Developed by Meta (Facebook) AI, the model is pretrained with self-supervised learning on unlabeled speech and fine-tuned for speech recognition; a minimal transcription sketch follows this list.
  • Mozilla DeepSpeech: Considered as an alternative for its two-step pipeline, in which a deep neural network produces character probabilities from audio that are then decoded with an N-gram language model to yield text.
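
The snippet below is a minimal sketch of how raw audio could be transcribed with wav2vec 2.0 through the Hugging Face transformers library. The facebook/wav2vec2-base-960h checkpoint, the file name recording.wav, and the 16 kHz sampling rate are illustrative assumptions, not values taken from this repository's code.

```python
# Sketch: transcription with wav2vec 2.0 (assumed checkpoint and file name).
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

MODEL_NAME = "facebook/wav2vec2-base-960h"  # assumed English checkpoint

processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)

# wav2vec 2.0 expects 16 kHz mono audio.
speech, sr = librosa.load("recording.wav", sr=16000, mono=True)

inputs = processor(speech, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```

Greedy CTC decoding is used here for simplicity; beam-search decoding with an external language model, as in the DeepSpeech pipeline, can further improve accuracy.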

Speaker Identification

  • Mel-Frequency Cepstral Coefficients (MFCCs): Used to extract speaker-specific acoustic features from each utterance.
  • Gaussian Mixture Models (GMMs): Employed for probabilistic modeling: a GMM is trained on the MFCC features of each enrolled speaker, and an unknown utterance is attributed to the speaker whose model yields the highest likelihood score. Because each model captures the overall distribution of a speaker's features rather than specific phrases, identification is text-independent (see the sketch below).
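
As a rough illustration of this pipeline, the sketch below enrolls speakers by fitting one GMM per speaker on MFCC features and identifies an unknown utterance by comparing average log-likelihoods, using librosa and scikit-learn. The file names and hyperparameters (13 MFCCs, 16 diagonal-covariance components) are assumptions for demonstration, not values from this repository.

```python
# Sketch: text-independent speaker identification with MFCCs and GMMs.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def extract_mfcc(path, sr=16000, n_mfcc=13):
    """Load an utterance and return its MFCC matrix (frames x coefficients)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Enrollment: fit one GMM per known speaker on that speaker's recordings.
speakers = {
    "alice": ["alice_01.wav", "alice_02.wav"],  # hypothetical files
    "bob": ["bob_01.wav", "bob_02.wav"],
}
models = {}
for name, files in speakers.items():
    feats = np.vstack([extract_mfcc(f) for f in files])
    gmm = GaussianMixture(n_components=16, covariance_type="diag")
    models[name] = gmm.fit(feats)

# Identification: score the unknown utterance against every speaker model
# and pick the speaker with the highest average log-likelihood per frame.
test_feats = extract_mfcc("unknown.wav")
scores = {name: gmm.score(test_feats) for name, gmm in models.items()}
print(max(scores, key=scores.get))
```

Scoring every frame against each speaker's GMM, rather than matching a fixed passphrase, is what makes this approach text-independent.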

Authors

  • Pie de Boer
  • Loris Podevyn

Date

  • June 2023