This project offers a privacy-focused solution for transcribing and summarizing audio recordings through fully local processing. Using OpenAI's Whisper for transcription and local LLMs via Ollama for summarization, it processes audio files (MP3/WAV) entirely on your machine, ensuring sensitive content never leaves your environment.
The tool automatically generates structured summaries including:
- Executive overview
- Detailed content breakdown
- Action items
- Meeting metadata
The project uses a configuration-based approach (config.yaml) for easy customization of output formats, model parameters, and summary structures, making it adaptable for various meeting types and organizational needs.
- I am using Python 3.10.11
NOTE: ffmpeg is required for Whisper to work and DOES NOT work from inside a virtual environment. A "[WinError 2] File not found" error appears when Whisper tries to invoke ffmpeg from within a virtual environment. Although it is not best practice, use your global environment instead!
- Follow the instructions HERE to install Chocolatey via an administrative PowerShell, then use it to install ffmpeg:
choco install ffmpeg
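Whisper invokes ffmpeg as an external process, so the binary must be discoverable on PATH. A quick standard-library check (a sketch; if it prints None, Whisper will fail with the "[WinError 2] File not found" error described above):

```python
# Verify Python can locate the ffmpeg binary on PATH.
import shutil

print(shutil.which("ffmpeg"))  # prints the full path, or None if not found
```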
python3.X -m pip install -r requirements.txt --no-warn-script-location
- From an administrative PowerShell, run the following to enable long path support:
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 -PropertyType DWORD -Force
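To confirm the registry change took effect, you can read the value back with the standard-library winreg module (Windows only; a sketch):

```python
# Read back the LongPathsEnabled value set by the PowerShell command above.
import winreg

key = winreg.OpenKey(
    winreg.HKEY_LOCAL_MACHINE,
    r"SYSTEM\CurrentControlSet\Control\FileSystem",
)
value, _ = winreg.QueryValueEx(key, "LongPathsEnabled")
print("LongPathsEnabled =", value)  # expect 1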
- If you have an NVIDIA GPU, determine which compute platform is present:
nvidia-smi.exe
- Identify "CUDA Version"
- Navigate to: https://pytorch.org/get-started/locally/
- Select the options specific to your environment and run the install command it provides!
- Once installation is complete, run:
python3.X pytorch_verify.py
Example successful output:
- True
- NVIDIA GeForce RTX 3080 Ti Laptop GPU
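The contents of pytorch_verify.py are not shown here, but a minimal script that produces the example output above would be:

```python
# pytorch_verify.py (sketch): confirm PyTorch sees the CUDA GPU.
import torch

print(torch.cuda.is_available())           # True when the CUDA build detects a GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. NVIDIA GeForce RTX 3080 Ti Laptop GPU
```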
python3.X transcribe.py
python3.X transcribe-args.py --mode single --input-file ./path/to/audio.mp3 --output-dir ./path/to/output --model large
python3.X transcribe-args.py --mode multiple --input-dir ./path/to/audio/files --output-dir ./path/to/output --model base
- `-h` or `--help`: See all available options and examples
- `--mode`: Choose between 'single' or 'multiple' (REQUIRED)
  - `single`: Transcribe one audio file
  - `multiple`: Transcribe all audio files in a directory
- `--input-file`: Path to the audio file (REQUIRED in single mode)
- `--input-dir`: Path to the directory containing audio files (REQUIRED in multiple mode)
- `--output-dir`: Directory where transcription files will be saved (REQUIRED)
- `--model`: Whisper model size (OPTIONAL; the script uses the 'base' model by default)
  - `tiny`: Fastest, lowest accuracy
  - `base`: Good balance of speed and accuracy
  - `small`: Better accuracy, slower than base
  - `medium`: High accuracy, slower
  - `large`: Highest accuracy, slowest
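For reference, here is a hedged sketch of what the core of transcribe-args.py might look like, assuming the openai-whisper package; everything beyond the documented flags is illustrative, not the project's actual implementation:

```python
# Sketch: argparse wiring plus Whisper transcription for single/multiple modes.
import argparse
from pathlib import Path

import whisper


def main() -> None:
    parser = argparse.ArgumentParser(description="Audio Transcription Tool")
    parser.add_argument("--mode", choices=["single", "multiple"], required=True)
    parser.add_argument("--input-file", help="Audio file (single mode)")
    parser.add_argument("--input-dir", help="Directory of audio files (multiple mode)")
    parser.add_argument("--output-dir", required=True)
    parser.add_argument("--model", default="base")  # tiny/base/small/medium/large
    args = parser.parse_args()

    model = whisper.load_model(args.model)
    out_dir = Path(args.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    if args.mode == "single":
        files = [Path(args.input_file)]
    else:
        files = sorted(p for p in Path(args.input_dir).iterdir()
                       if p.suffix.lower() in (".mp3", ".wav"))

    for audio in files:
        result = model.transcribe(str(audio))
        (out_dir / f"{audio.stem}.txt").write_text(result["text"], encoding="utf-8")


if __name__ == "__main__":
    main()
```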
- Download Ollama and run your chosen model, following the process HERE.
- Configure and update the REQUIRED settings in `config.yaml` (excerpt below), then run:
python3.X summarize.py
llm:
  model_name: "YOUR_MODEL_NAME"                  # REQUIRED
  max_retries: 3                                 # OPTIONAL
  retry_delay: 2                                 # OPTIONAL
  api_url: "http://localhost:11434/api/generate"
  options:
    temperature: 0.7                             # OPTIONAL
    top_p: 0.9                                   # OPTIONAL
    max_tokens: 8000                             # OPTIONAL
output:
  format: "md"                                   # OPTIONAL
  log_file: "./data/transcript_processor.log"    # REQUIRED
paths:
  input_transcript: "./data/transcriptions/"     # REQUIRED
  output_directory: "./data/meeting summaries"   # REQUIRED
  audio_file: "./data/converted-audio/"          # REQUIRED
Shortened for Space...
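For context, a minimal sketch of how summarize.py might load these settings, assuming PyYAML is installed:

```python
# Load config.yaml into a dict; keys mirror the snippet above.
import yaml

with open("config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

model_name = config["llm"]["model_name"]
api_url = config["llm"]["api_url"]
options = config["llm"].get("options", {})
```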
- `model_name`: Choose your Ollama model
- `max_retries`: Number of API call attempts
- `retry_delay`: Delay between retries (seconds)
- `temperature`: Controls response creativity; higher = more creative, lower = more deterministic (0.0-1.0)
- `top_p`: Controls nucleus sampling, i.e. how much of the probability mass is considered when generating a response (0.1-1.0)
- `max_tokens`: Maximum response length
- `input_transcript`: Path to your transcript file
- `output_directory`: Directory for generated summaries
- `format`: Output format ('md' or 'txt')
- `log_file`: Location of the log file
The generated summary is structured into the following sections:
- `executive_summary`
- `detailed_summary`
- `action_items`
- Navigate to http://localhost:11434 to ensure Ollama is running.
- From a terminal, run the following to verify that your model downloaded successfully:
ollama list
ollama run YOUR_MODEL
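For reference, a sketch of a non-streaming call to the generate endpoint configured above, using the requests package. Note that Ollama's own option for maximum response length is num_predict, so mapping max_tokens to it is an assumption about this project's internals:

```python
# Minimal non-streaming request to Ollama's /api/generate endpoint.
import requests

payload = {
    "model": "YOUR_MODEL_NAME",
    "prompt": "Summarize the following transcript:\n...",
    "stream": False,
    "options": {"temperature": 0.7, "top_p": 0.9, "num_predict": 8000},
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["response"])  # the generated summary text
```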
- Include audio-converter.py at the beginning of the pipeline to ensure audio is converted to either .mp3 or .wav format (see the sketch below).
- Separate audio-converter.py into its own project and have the current project reference it...
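A hedged sketch of what a standalone audio-converter.py could do, shelling out to ffmpeg; the function name and paths are illustrative:

```python
# Convert any ffmpeg-readable input to .mp3 for the transcription step.
import subprocess
from pathlib import Path


def convert_to_mp3(src: Path, out_dir: Path) -> Path:
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / f"{src.stem}.mp3"
    subprocess.run(["ffmpeg", "-y", "-i", str(src), str(dst)], check=True)
    return dst
```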
- Incorporate command-line overrides should manual config edits become a repetitive occurrence.
# Future implementation in summarize.py
import argparse

def parse_args():
    """Parse command-line arguments for path overrides."""
    parser = argparse.ArgumentParser(description='Transcript Summarization Tool')
    parser.add_argument(
        '--input-file',
        help='Override input transcript file path from config.yaml'
    )
    parser.add_argument(
        '--output-dir',
        help='Override output directory path from config.yaml'
    )
    return parser.parse_args()
python3.X summarize.py --input-file ./new/transcript.txt --output-dir ./new/output
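Once parsed, the overrides could be merged into the loaded config so the command line wins without editing config.yaml directly (a sketch, assuming config was loaded with yaml.safe_load as sketched earlier):

```python
# Command-line values take precedence over config.yaml entries.
args = parse_args()
if args.input_file:
    config["paths"]["input_transcript"] = args.input_file
if args.output_dir:
    config["paths"]["output_directory"] = args.output_dir
```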
- `config.yaml` should take command-line arguments rather than direct file manipulation.
- The user should be able to name the summary after the original audio file when it is placed inside "meeting summaries", for clarity.
- GOAL: Start with audio conversion (if needed) > transcription > summarization, all with one command...
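A rough sketch of what that single entry point might look like; convert_to_mp3, transcribe_file, and summarize_transcript are hypothetical helpers standing in for the existing scripts:

```python
# Hypothetical one-command pipeline: conversion -> transcription -> summarization.
from pathlib import Path


def run_pipeline(audio_path: str) -> None:
    audio = Path(audio_path)
    if audio.suffix.lower() not in (".mp3", ".wav"):
        audio = convert_to_mp3(audio, Path("./data/converted-audio"))  # audio-converter.py step
    transcript = transcribe_file(audio)    # transcribe.py / Whisper step
    summarize_transcript(transcript)       # summarize.py / Ollama step


if __name__ == "__main__":
    run_pipeline("./path/to/recording.m4a")
```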