
Commit

Merge branch 'master' into master
KoljaB authored Nov 15, 2024
2 parents b52be1a + d2fe7cf commit 691b072
Showing 22 changed files with 5,464 additions and 360 deletions.
168 changes: 163 additions & 5 deletions README.md
@@ -1,11 +1,17 @@

# RealtimeSTT
[![PyPI](https://img.shields.io/pypi/v/RealtimeSTT)](https://pypi.org/project/RealtimeSTT/)
[![Downloads](https://static.pepy.tech/badge/RealtimeSTT)](https://pepy.tech/project/RealtimeSTT)
[![GitHub release](https://img.shields.io/github/release/KoljaB/RealtimeSTT.svg)](https://GitHub.com/KoljaB/RealtimeSTT/releases/)
[![GitHub commits](https://badgen.net/github/commits/KoljaB/RealtimeSTT)](https://GitHub.com/KoljaB/RealtimeSTT/commit/)
[![GitHub forks](https://img.shields.io/github/forks/KoljaB/RealtimeSTT.svg?style=social&label=Fork&maxAge=2592000)](https://GitHub.com/KoljaB/RealtimeSTT/network/)
[![GitHub stars](https://img.shields.io/github/stars/KoljaB/RealtimeSTT.svg?style=social&label=Star&maxAge=2592000)](https://GitHub.com/KoljaB/RealtimeSTT/stargazers/)

*Easy-to-use, low-latency speech-to-text library for realtime applications*

## New

- Custom wake words with [OpenWakeWord](#openwakeword). Thanks to the [developers](https://github.com/dscripka/openWakeWord) of this!
- AudioToTextRecorderClient class, which automatically starts a server if none is running and connects to it. The class shares the same interface as AudioToTextRecorder, making it easy to upgrade or switch between the two (see the sketch after this list). Work in progress: most parameters and callbacks of AudioToTextRecorder are already implemented in AudioToTextRecorderClient, but not all, and the server cannot handle concurrent (parallel) requests yet.
- reworked CLI interface (`stt-server` to start the server, `stt` to start the client; see the `server` folder for more info)
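
A minimal sketch of switching to the new client class, assuming the default constructor starts or connects to a local server as described above; the interface mirrors `AudioToTextRecorder`, though not every parameter is implemented yet:

```python
from RealtimeSTT import AudioToTextRecorderClient

if __name__ == '__main__':
    # Starts a local server if none is running, then connects to it.
    # The usual text() loop from AudioToTextRecorder applies unchanged.
    recorder = AudioToTextRecorderClient()

    while True:
        print(recorder.text())
```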

## About the Project

@@ -18,16 +24,53 @@ It's ideal for:
- **Voice Assistants**
- Applications requiring **fast and precise** speech-to-text conversion

https://github.com/KoljaB/RealtimeSTT/assets/7604638/207cb9a2-4482-48e7-9d2b-0722c3ee6d14
https://github.com/user-attachments/assets/797e6552-27cd-41b1-a7f3-e5cbc72094f5

### Updates

Latest Version: v0.2.41
Latest Version: v0.3.7

See [release history](https://github.com/KoljaB/RealtimeSTT/releases).

> **Hint:** *Since we use the `multiprocessing` module now, ensure you include the `if __name__ == '__main__':` protection in your code to prevent unexpected behavior, especially on platforms like Windows. For a detailed explanation of why this is important, visit the [official Python documentation on `multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming).*
## Quick Examples

### Print everything being said:

```python
from RealtimeSTT import AudioToTextRecorder

def process_text(text):
    print(text)

if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()

    while True:
        recorder.text(process_text)
```

### Type everything being said:

```python
from RealtimeSTT import AudioToTextRecorder
import pyautogui

def process_text(text):
    pyautogui.typewrite(text + " ")

if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()

    while True:
        recorder.text(process_text)
```
*Will type everything being said into your selected text box*

### Features

- **Voice Activity Detection**: Automatically detects when you start and stop speaking.
@@ -158,6 +201,19 @@ recorder.stop()
print(recorder.text())
```

#### Standalone Example:

```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder()
    recorder.start()
    input("Press Enter to stop recording...")
    recorder.stop()
    print("Transcription: ", recorder.text())
```
### Automatic Recording
Recording based on voice activity detection.
@@ -167,8 +223,19 @@ with AudioToTextRecorder() as recorder:
print(recorder.text())
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    with AudioToTextRecorder() as recorder:
        print("Transcription: ", recorder.text())
```
When running `recorder.text` in a loop, it is recommended to use a callback, allowing the transcription to run asynchronously:
```python
def process_text(text):
    print(text)

while True:
    recorder.text(process_text)
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

def process_text(text):
    print(text)

if __name__ == '__main__':
    recorder = AudioToTextRecorder()

    while True:
        recorder.text(process_text)
```
### Wakewords
Keyword activation before detecting voice. Write the comma-separated list of your desired activation keywords into the `wake_words` parameter. You can choose wake words from this list: alexa, americano, blueberry, bumblebee, computer, grapefruit, grasshopper, hey google, hey siri, jarvis, ok google, picovoice, porcupine, terminator.
@@ -188,6 +270,18 @@ print('Say "Jarvis" then speak.')
print(recorder.text())
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder(wake_words="jarvis")

    print('Say "Jarvis" to start recording.')
    print(recorder.text())
```
### Callbacks
You can set callback functions to be executed on different events (see [Configuration](#configuration)):
@@ -203,6 +297,22 @@ recorder = AudioToTextRecorder(on_recording_start=my_start_callback,
on_recording_stop=my_stop_callback)
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

def start_callback():
    print("Recording started!")

def stop_callback():
    print("Recording stopped!")

if __name__ == '__main__':
    recorder = AudioToTextRecorder(on_recording_start=start_callback,
                                   on_recording_stop=stop_callback)
```
### Feed chunks
If you don't want to use the local microphone, set the `use_microphone` parameter to `False` and provide raw PCM audio chunks in 16-bit mono (sample rate 16000) with this method:
@@ -211,6 +321,20 @@ If you don't want to use the local microphone set use_microphone parameter to fa
recorder.feed_audio(audio_chunk)
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder(use_microphone=False)

    with open("audio_chunk.pcm", "rb") as f:
        audio_chunk = f.read()

    recorder.feed_audio(audio_chunk)
    print("Transcription: ", recorder.text())
```
### Shutdown
You can shutdown the recorder safely by using the context manager protocol:
@@ -220,12 +344,25 @@ with AudioToTextRecorder() as recorder:
[...]
```
Or you can call the shutdown method manually (if using "with" is not feasible):
```python
recorder.shutdown()
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    with AudioToTextRecorder() as recorder:
        [...]

    # or manually shutdown if "with" is not used
    recorder.shutdown()
```
## Testing the Library
The test subdirectory contains a set of scripts to help you evaluate and understand the capabilities of the RealtimeSTT library.
@@ -298,6 +435,8 @@ When you initialize the `AudioToTextRecorder` class, you have various options to
- **level** (int, default=logging.WARNING): Logging level.
- **init_logging** (bool, default=True): Whether to initialize the logging framework. Set to False to manage this yourself.
- **handle_buffer_overflow** (bool, default=True): If set, the system will log a warning when an input overflow occurs during recording and remove the data from the buffer.
- **beam_size** (int, default=5): The beam size to use for beam search decoding.
@@ -310,6 +449,14 @@ When you initialize the `AudioToTextRecorder` class, you have various options to
- **debug_mode** (bool, default=False): If set, the system prints additional debug information to the console.
- **print_transcription_time** (bool, default=False): Logs the processing time of the main model transcription. This can be useful for performance monitoring and debugging.
- **early_transcription_on_silence** (int, default=0): If set, the system will transcribe audio faster when silence is detected. Transcription will start after the specified milliseconds. Keep this value lower than `post_speech_silence_duration`, ideally around `post_speech_silence_duration` minus the estimated transcription time with the main model. If silence lasts longer than `post_speech_silence_duration`, the recording is stopped, and the transcription is submitted. If voice activity resumes within this period, the transcription is discarded. This results in faster final transcriptions at the cost of additional GPU load due to some unnecessary final transcriptions.
- **allowed_latency_limit** (int, default=100): Specifies the maximum number of unprocessed chunks in the queue before discarding chunks. This helps prevent the system from being overwhelmed and losing responsiveness in real-time applications.
- **no_log_file** (bool, default=False): If set, the system will skip writing the debug log file, reducing disk I/O. Useful if logging to a file is not needed and performance is a priority.
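
As an illustration, a configuration sketch combining a few of the parameters above; the values are illustrative defaults or examples, not recommendations:

```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    # Parameter names are taken from the list above; values are examples only.
    recorder = AudioToTextRecorder(
        beam_size=5,                         # beam search width for decoding
        handle_buffer_overflow=True,         # warn and drop data on input overflow
        print_transcription_time=True,       # log main-model transcription time
        early_transcription_on_silence=300,  # ms; keep below post_speech_silence_duration
        allowed_latency_limit=100,           # max unprocessed chunks before discarding
        no_log_file=True,                    # skip writing the debug log file
    )
    print(recorder.text())
```
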
#### Real-time Transcription Parameters
> **Note**: *When enabling realtime transcription, a GPU installation is strongly advised. Using realtime transcription may create high GPU loads.*
@@ -404,6 +551,17 @@ Suggested starting parameters for OpenWakeWord usage:
) as recorder:
```
## FAQ
### Q: I encountered the following error: "Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so} Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor." How do I fix this?
**A:** This issue arises from a mismatch between the version of `ctranslate2` and cuDNN. The `ctranslate2` library was updated to version 4.5.0, which uses cuDNN 9.2. There are two ways to resolve this issue:
1. **Downgrade `ctranslate2` to version 4.4.0**:
```bash
pip install ctranslate2==4.4.0
```
2. **Upgrade cuDNN** on your system to version 9.2 or above.
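
Either way, a minimal check of the installed `ctranslate2` version (assuming the package was installed with pip):

```python
# Expect 4.4.0 after the downgrade, or 4.5.0+ if cuDNN was upgraded instead.
import ctranslate2
print(ctranslate2.__version__)
```
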
## Contribution
Contributions are always welcome!
@@ -412,7 +570,7 @@ Shoutout to [Steven Linn](https://github.com/stevenlafl) for providing docker su
## License
MIT
[MIT](https://github.com/KoljaB/RealtimeSTT?tab=MIT-1-ov-file)
## Author
3 changes: 2 additions & 1 deletion RealtimeSTT/__init__.py
@@ -1 +1,2 @@
from .audio_recorder import AudioToTextRecorder
from .audio_recorder import AudioToTextRecorder
from .audio_recorder_client import AudioToTextRecorderClient
