A project that enables searching through audio and video files by identifying and displaying the segments where content similar to a query is spoken. It combines forced alignment, LexRank, speech recognition for auto-captioning, and text embeddings.
This project aims to enhance the accessibility and usability of audio and video content by allowing users to search for specific phrases or topics within the media. By leveraging advanced techniques such as forced alignment and text embeddings, the project provides an efficient way to locate relevant segments and auto-generate captions.
To set up the project, follow these steps:
```bash
# Clone the repository
git clone https://github.com/username/audio-video-search.git
cd audio-video-search

# Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows use venv\Scripts\activate

# Install required packages
pip install -r requirements.txt
```
To use the project, run the main script that processes audio/video files and allows for searching through them:
```bash
python main.py --input <path_to_audio_or_video> --query "<search_phrase>"
```
This command will output the segments where the specified phrase is spoken along with the corresponding timestamps.
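The exact output format depends on the implementation; a run might look roughly like this (the file name, timestamps, and scores below are illustrative, not real output):

```
$ python main.py --input lecture.mp4 --query "gradient descent"
[00:12:04 --> 00:12:19]  "...we update the weights using gradient descent..."   score=0.89
[00:31:42 --> 00:31:55]  "...stochastic gradient descent uses mini-batches..."  score=0.84
```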
- Forced Alignment: Aligns the spoken words with the audio/video timeline to accurately identify when phrases are spoken.
- Speech Recognition: Automatically generates captions for audio/video files using speech recognition technology.
- LexRank: Implements the LexRank algorithm to summarize and rank the most relevant segments based on the search query.
- Text Embeddings: Uses text embeddings so queries are matched on meaning rather than exact wording, enabling semantic search (see the sketch below).
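As a rough illustration of the embedding-based search, here is a minimal sketch using Sentence Transformers. The model name and the caption-segment structure are assumptions for the example, not taken from this repository:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Hypothetical caption segments, each with the start/end times
# that the forced-alignment step would assign.
segments = [
    {"start": 12.0, "end": 19.5, "text": "we update the weights using gradient descent"},
    {"start": 95.0, "end": 101.2, "text": "the cat sat quietly on the windowsill"},
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

query = "how are model parameters optimized?"
query_emb = model.encode(query, convert_to_tensor=True)
seg_embs = model.encode([s["text"] for s in segments], convert_to_tensor=True)

# Cosine similarity between the query and every segment; higher = more relevant.
scores = util.cos_sim(query_emb, seg_embs)[0]
for seg, score in sorted(zip(segments, scores), key=lambda p: float(p[1]), reverse=True):
    print(f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']} (score={float(score):.2f})")
```

Because the comparison happens in embedding space, a query like the one above can match "gradient descent" segments even though it shares no keywords with them.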
- Input Processing: Users provide an audio or video file along with a search query.
- Auto-Captioning: The audio is processed to generate captions using speech recognition.
- Forced Alignment: The generated captions are aligned with the audio to create a timeline.
- Search Execution: The system uses LexRank and text embeddings to find and rank relevant segments (a minimal LexRank sketch follows this list).
- Output: The relevant timestamps and segments are displayed to the user.
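Classic LexRank scores sentences by their centrality in a cosine-similarity graph. A minimal, self-contained version over caption segments might look like this; the threshold and damping values are common defaults, not this project's settings:

```python
# pip install numpy scikit-learn
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexrank(sentences, threshold=0.1, damping=0.85, iters=100):
    """Score sentences by centrality in a cosine-similarity graph (LexRank)."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)

    # Keep only sufficiently similar pairs, then row-normalize into a
    # stochastic matrix for the power iteration.
    adj = (sim >= threshold).astype(float)
    adj /= adj.sum(axis=1, keepdims=True)

    # PageRank-style power iteration until the scores converge.
    n = len(sentences)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = (1 - damping) / n + damping * adj.T @ scores
        if np.allclose(new, scores, atol=1e-6):
            break
        scores = new
    return scores

captions = [
    "gradient descent updates the model weights",
    "the weights are updated by gradient descent each step",
    "completely unrelated remark about the weather",
]
for text, score in sorted(zip(captions, lexrank(captions)), key=lambda p: -p[1]):
    print(f"{score:.3f}  {text}")
```

In the pipeline described above, these centrality scores would be one signal for ranking segments, alongside the embedding similarity to the query.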
- Forced Alignment: Gentle or similar tools for aligning audio with text.
- Speech Recognition: The SpeechRecognition library for generating captions (a combined captioning-and-alignment sketch follows this list).
- LexRank: Implementation of the LexRank algorithm for text summarization.
- Text Embeddings: Use of models like BERT or Sentence Transformers for semantic search.
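To show how the first two pieces could fit together, here is a sketch that transcribes a WAV file with the SpeechRecognition library and then asks a locally running Gentle server (its default HTTP API on port 8765) for word-level timestamps. The file name is a placeholder and error handling is omitted:

```python
# pip install SpeechRecognition requests
import requests
import speech_recognition as sr

AUDIO_PATH = "talk.wav"  # placeholder input file

# 1) Auto-captioning: transcribe the audio with SpeechRecognition.
recognizer = sr.Recognizer()
with sr.AudioFile(AUDIO_PATH) as source:
    audio = recognizer.record(source)
transcript = recognizer.recognize_google(audio)  # Google's free web API

# 2) Forced alignment: send audio + transcript to a local Gentle server.
#    Gentle must already be running (e.g. via its Docker image) on port 8765.
with open(AUDIO_PATH, "rb") as audio_file:
    resp = requests.post(
        "http://localhost:8765/transcriptions?async=false",
        files={"audio": audio_file, "transcript": transcript.encode("utf-8")},
    )
alignment = resp.json()

# Each successfully aligned word carries start/end offsets in seconds.
for word in alignment.get("words", []):
    if word.get("case") == "success":
        print(f"{word['start']:7.2f}s - {word['end']:7.2f}s  {word['word']}")
```

These word-level timestamps are what make it possible to map a matching caption segment back to an exact position in the original media.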
Contributions are welcome! If you would like to contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Make your changes and add new features or improvements.
- Commit your changes (`git commit -m 'Add new feature'`).
- Push to the branch (`git push origin feature-branch`).
- Open a pull request.