Code Repository: https://github.com/gsu-library/whisper-scribe
Author: Matt Brooks mbrooks34@gsu.edu
Date Created: 2024-05-21
License: GPLv3
Version: 1.2.0
WhisperScribe is a Django-powered web application that simplifies audio analysis by using AI for speech recognition (Faster Whisper) and speaker diarization (Pyannote.Audio). Users can upload or link media, generate accurate transcripts with speaker identification, and easily edit the results. This project also leverages CUDA support for quicker processing.
- Python v3.10.12
- FFmpeg
- Web Server
- NVIDIA drivers (if using CUDA)
The following installation instructions are based on a Linux install. This mostly works in a Windows environment with some extra configuration.
- Install Python. We recommend using version 3.10.12, as that is what this repository is built on. If you need to manage multiple Python versions, we suggest using Pyenv.
- Install FFmpeg.
- Install and configure a web server to host static and media files. The same server can also act as a reverse proxy in front of Gunicorn. Apache or Nginx is recommended.
- Either clone the WhisperScribe git repository or download the source code from the latest release. Move or extract the files to a location that is not served by the web server.
- Create a Python virtual environment inside the WhisperScribe folder - venv is recommended.
- Activate the Python virtual environment (stay in the virtual environment for the remainder of the steps). Once it is activated, install the pip requirements:
pip install -r requirements-freeze.txt
- Copy the core/settings.sample.py file to core/settings.py and configure the settings file. If you want to use a database other than SQLite, configure it now (see Django's databases documentation).
- Run Django database migrations:
python manage.py migrate
- Create the cache table:
python manage.py createcachetable
- Collect static files into STATIC_ROOT:
python manage.py collectstatic
- Install NVIDIA drivers if using CUDA (optional).
- Create Django admin user (optional):
python manage.py createsuperuser
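The install steps that run inside the virtual environment can also be scripted. The sketch below is not part of WhisperScribe; it assumes it is run from the WhisperScribe folder with the virtual environment active and core/settings.py already configured, and the helper names setup_commands and run_setup are hypothetical. It only prints the commands unless dry_run=False is passed.

```python
import subprocess
import sys


def setup_commands(create_superuser=False):
    """Return the post-install commands in the order listed above."""
    python = sys.executable  # the virtual environment's interpreter when the venv is active
    commands = [
        [python, "-m", "pip", "install", "-r", "requirements-freeze.txt"],
        [python, "manage.py", "migrate"],
        [python, "manage.py", "createcachetable"],
        # --noinput skips collectstatic's confirmation prompt
        [python, "manage.py", "collectstatic", "--noinput"],
    ]
    if create_superuser:
        # createsuperuser prompts interactively for a username and password
        commands.append([python, "manage.py", "createsuperuser"])
    return commands


def run_setup(dry_run=True, create_superuser=False):
    """Print each command; also execute it when dry_run is False."""
    for command in setup_commands(create_superuser):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first command that exits non-zero
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_setup(dry_run=True)
```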
A web server will have to be configured to host static and media files used by WhisperScribe. Django has documentation on how to deploy static files.
The SECRET_KEY and ALLOWED_HOSTS settings must be configured before running WhisperScribe. It is also worth reviewing the rest of the settings file; see the Django settings reference for details. If you need to troubleshoot setup or configuration, DEBUG can be enabled temporarily. DO NOT LEAVE DEBUG ENABLED IN A PRODUCTION ENVIRONMENT!
SECRET_KEY - REQUIRED
Run the following command while within the WhisperScribe Python virtual environment to generate a secret key:
python -c 'from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())'
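A key of the same shape can also be generated with only the standard library. This is a sketch, not WhisperScribe code: as far as we know, Django's get_random_secret_key() draws 50 characters from the alphabet below, and Python's secrets module supplies a cryptographically secure random choice. The names SECRET_KEY_CHARS and make_secret_key are our own.

```python
import secrets

# Alphabet and length chosen to mirror Django's get_random_secret_key() (assumption)
SECRET_KEY_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)"


def make_secret_key(length=50):
    """Return a random key built from a cryptographically secure source."""
    return "".join(secrets.choice(SECRET_KEY_CHARS) for _ in range(length))


if __name__ == "__main__":
    print(make_secret_key())
```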
ALLOWED_HOSTS - REQUIRED
A list of strings representing the host/domain names that this Django site can serve.
CSRF_TRUSTED_ORIGINS
If Gunicorn is behind a reverse proxy, this will have to be set to Gunicorn's bind address. Since Django 4.0, each entry must include the scheme (for example, http://127.0.0.1:8000). See CSRF trusted origins for more information.
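A minimal excerpt of core/settings.py covering the settings above might look like the following. The key, host names, and bind address are placeholders rather than WhisperScribe defaults; CSRF_TRUSTED_ORIGINS assumes Gunicorn is bound to its default 127.0.0.1:8000 behind a reverse proxy.

```python
# core/settings.py (excerpt): placeholder values, replace with your own
SECRET_KEY = "paste-the-generated-key-here"

# Keep DEBUG off except while troubleshooting
DEBUG = False

# Host/domain names this site may serve (hypothetical examples)
ALLOWED_HOSTS = ["whisperscribe.example.edu", "localhost", "127.0.0.1"]

# Gunicorn's bind address, including the scheme (required since Django 4.0)
CSRF_TRUSTED_ORIGINS = ["http://127.0.0.1:8000"]
```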
HUGGING_FACE_TOKEN
This is required to use diarization. In order to create a token you must:
- Accept pyannote/segmentation-3.0 user conditions,
- accept pyannote/speaker-diarization-3.1 user conditions,
- and create an access token at hf.co/settings/tokens.
UPPERCASE_SPEAKER_NAMES
Whether speaker names should be uppercase in file downloads.
MAX_SEGMENT_LENGTH
The default max number of characters per segment.
MAX_SEGMENT_TIME
The default max length of segments in seconds.
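To illustrate how a character limit and a time limit can work together, the sketch below groups timestamped words into segments and starts a new segment whenever adding the next word would exceed either limit. This is a hypothetical illustration of what the two settings bound, not WhisperScribe's actual segmentation code; Word and split_into_segments are our own names.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def split_into_segments(words, max_chars, max_seconds):
    """Group words into (text, start, end) tuples bounded by max_chars and max_seconds."""
    groups, current = [], []
    for word in words:
        if current:
            text = " ".join(w.text for w in current + [word])
            over_chars = len(text) > max_chars
            over_time = word.end - current[0].start > max_seconds
            if over_chars or over_time:
                groups.append(current)  # close the segment before adding this word
                current = []
        current.append(word)  # a single over-long word still becomes its own segment
    if current:
        groups.append(current)
    return [(" ".join(w.text for w in g), g[0].start, g[-1].end) for g in groups]


words = [Word("hello", 0.0, 0.5), Word("world", 0.5, 1.0), Word("again", 1.0, 1.5)]
print(split_into_segments(words, max_chars=11, max_seconds=10))
# → [('hello world', 0.0, 1.0), ('again', 1.0, 1.5)]
```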
WHISPER_LANGUAGE
The default language spoken in the audio. Set to None or '' to use automatic language detection by default.
WHISPER_MODELS
The list of models available to Whisper (tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, distil-large-v2, distil-medium.en, distil-small.en, distil-large-v3). See https://huggingface.co/Systran.
WHISPER_MODEL_DEFAULT
The default Whisper model to show (must be one of the entries in WHISPER_MODELS).
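An example of the Whisper defaults in core/settings.py, with a check that the default model is one of the offered models. The trimmed model list and the language_argument helper are illustrative assumptions; the helper reflects faster-whisper's documented behavior of detecting the language itself when WhisperModel.transcribe() receives language=None.

```python
# core/settings.py (excerpt): example values only
WHISPER_LANGUAGE = None  # e.g. "en"; None or "" means auto-detect
WHISPER_MODELS = ["tiny.en", "base.en", "small.en", "medium.en", "large-v3", "distil-large-v3"]
WHISPER_MODEL_DEFAULT = "small.en"

# Fail fast at startup if the default is not one of the offered models
if WHISPER_MODEL_DEFAULT not in WHISPER_MODELS:
    raise ValueError("WHISPER_MODEL_DEFAULT must be listed in WHISPER_MODELS")


def language_argument(setting):
    """Hypothetical helper: map None or '' to None so faster-whisper auto-detects."""
    return setting or None
```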
USE_DJANGO_Q
Whether to use Django Q. This may cause issues in a Windows environment. If disabled, the WhisperScribe interface will hang while processing audio.
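The hang happens because slow work then runs inside the web request. The sketch below contrasts the two modes: with Django Q, the task is handed to async_task and the request returns immediately while the qcluster process does the work; without it, the request blocks until the task finishes. The dispatch function and its enqueue parameter are hypothetical and do not reflect WhisperScribe's actual code; enqueue is injectable only so the sketch runs without Django Q installed.

```python
def dispatch(task, *args, use_django_q, enqueue=None):
    """Run task(*args) in the background when use_django_q is True, else synchronously."""
    if use_django_q:
        if enqueue is None:
            # Django Q's async_task(func, *args) queues the call and returns a task id
            from django_q.tasks import async_task
            enqueue = async_task
        return enqueue(task, *args)
    # Synchronous path: the caller (e.g. a view) waits until the task finishes
    return task(*args)
```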
DATABASES
Configure what kind of database you want to use. The default is SQLite. See https://docs.djangoproject.com/en/5.1/ref/settings/#databases and https://docs.djangoproject.com/en/5.1/ref/databases/.
TIME_ZONE
Set to your local time zone.
MEDIA_URL
The URL that handles the media served from MEDIA_ROOT. This must end in a slash.
MEDIA_ROOT
The absolute filesystem path to the directory that will store the media files.
STATIC_URL
The URL to use when referring to the static files located in STATIC_ROOT. This must end in a slash.
STATIC_ROOT
The absolute filesystem path to the directory where the collectstatic command will move static files for deployment.
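The four file-related settings might look like this on Linux. The paths are hypothetical, and check_file_settings is an illustrative helper (not part of WhisperScribe or Django) that tests the two rules stated above: URLs end in a slash and roots are absolute filesystem paths. PurePosixPath is used because these instructions assume Linux paths.

```python
from pathlib import PurePosixPath

# core/settings.py (excerpt): hypothetical Linux paths
MEDIA_URL = "/media/"
MEDIA_ROOT = "/srv/whisperscribe/media"
STATIC_URL = "/static/"
STATIC_ROOT = "/srv/whisperscribe/static"


def check_file_settings():
    """Return a list of problems with the URL and path settings (empty if none)."""
    problems = []
    for name, url in (("MEDIA_URL", MEDIA_URL), ("STATIC_URL", STATIC_URL)):
        if not url.endswith("/"):
            problems.append(f"{name} must end in a slash")
    for name, root in (("MEDIA_ROOT", MEDIA_ROOT), ("STATIC_ROOT", STATIC_ROOT)):
        if not PurePosixPath(root).is_absolute():
            problems.append(f"{name} must be an absolute path")
    return problems
```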
The NVIDIA drivers available will depend on the OS and the video card installed. Ubuntu provides a helpful article on searching for and installing NVIDIA drivers. We have had success on our setup using the nvidia-driver-535-server package.
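After installing the drivers, a quick check can confirm that the GPU is visible. This sketch assumes nvidia-smi is installed alongside the NVIDIA driver (it usually is) and that PyTorch, which pyannote.audio depends on, is present in the virtual environment; both checks return False instead of failing if either is missing. faster-whisper runs on CTranslate2 rather than PyTorch, so a True result here is a good sign but not a guarantee for transcription.

```python
import shutil
import subprocess


def nvidia_smi_available():
    """True if the nvidia-smi tool is on PATH and exits successfully."""
    path = shutil.which("nvidia-smi")
    if path is None:
        return False
    result = subprocess.run([path], capture_output=True, text=True)
    return result.returncode == 0


def cuda_visible_to_torch():
    """True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("nvidia-smi:", nvidia_smi_available())
    print("PyTorch CUDA:", cuda_visible_to_torch())
```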
To connect WhisperScribe to a MySQL database a MySQL pip package, headers, and libraries will have to be installed. The mysqlclient pip package is recommended. The installation instructions can be found on the mysqlclient pypi.org page.
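Once mysqlclient is installed, the DATABASES setting in core/settings.py can point at MySQL. The database name, user, password, and host below are placeholders; the ENGINE string is Django's MySQL backend, and OPTIONS is passed through to the MySQL client library. The database and user are assumed to already exist in MySQL before running migrations.

```python
# core/settings.py (excerpt): placeholder credentials, replace with your own
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # backed by the mysqlclient package
        "NAME": "whisperscribe",
        "USER": "whisperscribe",
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "3306",
        # Passed to the MySQL client; utf8mb4 stores the full Unicode range
        "OPTIONS": {"charset": "utf8mb4"},
    }
}
```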
Use the following commands to start the Django application and to run Django Q (within the Python virtual environment). If Django Q is disabled in the settings file the qcluster command does not need to be included.
gunicorn core.wsgi
python manage.py qcluster
If you want to run Gunicorn on a port other than 8000, pass the -b flag to set the bind address and port.
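For example, gunicorn -b 127.0.0.1:8001 core.wsgi binds to port 8001. The same setting can also live in a Gunicorn configuration file, which is plain Python; by default Gunicorn reads ./gunicorn.conf.py from the directory it is started in. The port and worker count below are example values, not WhisperScribe recommendations.

```python
# gunicorn.conf.py: placed in the WhisperScribe folder (example values)
# Equivalent to passing: -b 127.0.0.1:8001
bind = "127.0.0.1:8001"

# Number of worker processes handling requests
workers = 2
```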
Systemd services can be used to run WhisperScribe on Linux operating systems. To set this up, first copy the whisperscribe.sample.service and whisperscribe-q.sample.service files to whisperscribe.service and whisperscribe-q.service respectively. Then edit both copied files to update the paths for WorkingDirectory, Environment, and ExecStart. For all three directives, use the absolute path to WhisperScribe; for Environment and ExecStart, also make sure the name of the virtual environment folder is correct. The path for Environment must also include the correct version of Python. Once configured, the files can be added to systemd with the following commands (edit them to use the path to your instance of WhisperScribe):
sudo systemctl enable /path/to/whisperscribe/whisperscribe.service
sudo systemctl enable /path/to/whisperscribe/whisperscribe-q.service
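When filling in WorkingDirectory, Environment, and ExecStart, it can help to print the absolute paths Python itself reports. Run this sketch from the WhisperScribe folder with the virtual environment active. It is an illustrative helper only: it does not read or write the service files, and which value each directive needs depends on the sample files themselves.

```python
import sys
import sysconfig
from pathlib import Path


def unit_file_hints(project_dir="."):
    """Collect absolute paths that are likely needed when editing the .service files."""
    return {
        "project_dir": str(Path(project_dir).resolve()),
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "venv_prefix": sys.prefix,
        "python_executable": sys.executable,
        "python_version": f"python{sys.version_info.major}.{sys.version_info.minor}",
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for key, value in unit_file_hints().items():
        print(f"{key}: {value}")
```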
Once both services are enabled, WhisperScribe will start automatically during a normal boot. WhisperScribe can also be started, stopped, and restarted with the following commands:
sudo systemctl start whisperscribe
sudo systemctl stop whisperscribe
sudo systemctl restart whisperscribe
Check the CHANGELOG and release notes to see whether there are any major changes to the core/settings.sample.py file, whether a migration is required, whether the requirements-freeze.txt pip packages file has been updated, or whether static files need to be collected again.
It never hurts to run the commands below after an update (while in the Python virtual environment).
pip install -r requirements-freeze.txt
python manage.py migrate
python manage.py collectstatic
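To see whether an update brought new migrations before applying them, Django's migrate command accepts --check, which exits with a non-zero status when unapplied migrations exist (Django 3.1 and later). The wrapper below is a hypothetical convenience, not part of WhisperScribe; its run parameter exists only so the function can be exercised without a Django project.

```python
import subprocess
import sys


def migrations_pending(manage_py="manage.py", run=subprocess.run):
    """Return True if `manage.py migrate --check` reports unapplied migrations."""
    result = run(
        [sys.executable, manage_py, "migrate", "--check"],
        capture_output=True,
        text=True,
    )
    # Non-zero exit means unapplied migrations (or another error, such as a broken settings file)
    return result.returncode != 0
```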
At some point you will want to put a web server in front of WhisperScribe as a reverse proxy in order to use SSL certificates. Apache and Nginx provide well-documented guides on setting up reverse proxies, and Gunicorn also provides a guide on setting up a reverse proxy with Nginx. Note that when using a reverse proxy server, some additional settings will need to be adjusted, such as the maximum request body (upload) size.
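On the Django side, a proxy that terminates SSL usually needs to tell Django which scheme the browser used. The excerpt below uses Django's SECURE_PROXY_SSL_HEADER, SESSION_COOKIE_SECURE, and CSRF_COOKIE_SECURE settings, but it assumes the proxy always sets X-Forwarded-Proto and discards any client-supplied copy; Django's documentation warns that SECURE_PROXY_SSL_HEADER is unsafe otherwise. The upload size limit itself is configured in the web server (for example, Nginx's client_max_body_size), not here.

```python
# core/settings.py (excerpt): only if the proxy sets X-Forwarded-Proto itself
# Treat a request as HTTPS when the proxy forwards "X-Forwarded-Proto: https"
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")

# Send session and CSRF cookies only over HTTPS once SSL is in place
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
```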
The Django project folder is 'core' and the application folder is 'webui'.