Transcribe your speech or the audio playing on your computer with Whisper in realtime, and show the captions on your screen.
demo.mp4
Install the required packages:
$ pip install -r requirement.txt
Run the following command to start transcribing your speech:
$ python transcribe.py --input-provider speech-recognition --model base --no-faster-whisper
A list of audio input devices will be displayed. Choose your microphone to start transcribing.
Some options:

- You can choose the `--input-provider` from "speech-recognition" and "pyaudio". The difference is that "speech-recognition" suppresses silent input (see the sketch after this list).
- To run with faster-whisper, omit the `--no-faster-whisper` option. Note that for CUDA 12.x you need to update your `LD_LIBRARY_PATH`; see Troubleshoot - 1.
- For better precision, use the `--language` option to specify the input language (an ISO 639-1 code such as "en").
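For intuition, here is a minimal sketch of what a "speech-recognition" style provider does: the `speech_recognition` library waits for a complete phrase and trims the surrounding silence before the audio reaches Whisper. This is an illustration under assumptions, not this repo's actual implementation.

```python
# Minimal sketch of the "speech-recognition" provider idea (illustrative,
# not this repo's actual code).
import numpy as np
import speech_recognition as sr
import whisper  # openai-whisper

model = whisper.load_model("base")
recognizer = sr.Recognizer()

with sr.Microphone(sample_rate=16000) as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate the silence threshold
    while True:
        audio = recognizer.listen(source)  # blocks until a full phrase is captured
        # Whisper expects float32 PCM in [-1, 1] at 16 kHz.
        samples = np.frombuffer(audio.get_raw_data(), np.int16).astype(np.float32) / 32768.0
        print(model.transcribe(samples, fp16=False)["text"].strip())
```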
To transcribe the audio playing on your computer:

1. (Optional) Set up a monitor device for your audio output.

   Set up a loopback device for your audio output. Skip this step if you have already set up a monitor device some other way.

   First, list the devices available for monitoring (this will also list your microphone):

   $ pactl list sources short

   Example output:

   2   alsa_input.microphone            module-alsa-card.c  s16le 1ch 44100Hz  SUSPENDED
   30  alsa_output.hdmi-stereo.monitor  module-alsa-card.c  s16le 2ch 44100Hz  RUNNING

   Then set the PulseAudio source to your chosen device, for example "alsa_output.hdmi-stereo.monitor":

   $ export PULSE_SOURCE=alsa_output.hdmi-stereo.monitor
2. Start transcribing.

   To transcribe from the device chosen in step 1, use "pulse" as the input (the sketch after these steps shows how to inspect the devices PyAudio sees):

   $ python transcribe.py --input pulse --input-provider speech-recognition --model base --no-faster-whisper
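The "pulse" entry is PulseAudio's route into PyAudio, and it honors the `PULSE_SOURCE` variable set above. If you want to double-check which input devices PyAudio can see, a small sketch like the following may help (illustrative, not part of this repo):

```python
# List PyAudio input devices; the "pulse" entry routes through PulseAudio,
# which follows the PULSE_SOURCE environment variable set above.
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info["maxInputChannels"] > 0:  # input-capable devices only
        print(f'{i}: {info["name"]}')
pa.terminate()
```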
Troubleshoot

1. To run with faster-whisper with CUDA 12.x, update your `LD_LIBRARY_PATH` as follows:

   $ export pyvenv_path=YOUR_VENV_PATH  # e.g. $HOME/.pyvenv/onscreen-transcription
   $ export pyvenv_py_version=YOUR_PYTHON_VERSION  # e.g. python3.10
   $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$pyvenv_path/lib64/$pyvenv_py_version/site-packages/nvidia/cublas/lib:$pyvenv_path/lib64/$pyvenv_py_version/site-packages/nvidia/cudnn/lib
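To verify the change took effect, a quick check like the one below can confirm the libraries now resolve. This is an illustrative snippet, not from the repo, and the sonames vary with your CUDA/cuDNN versions.

```python
# Check that the cuBLAS/cuDNN shared libraries resolve via LD_LIBRARY_PATH.
# Adjust the sonames to the versions actually installed in your venv
# (these names are assumptions for a typical CUDA 12.x install).
import ctypes

for lib in ("libcublas.so.12", "libcudnn.so.9"):
    try:
        ctypes.CDLL(lib)
        print(f"{lib}: OK")
    except OSError as err:
        print(f"{lib}: not found ({err})")
```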