
Commit

Merge branch 'master' into master
KoljaB authored Nov 15, 2024
2 parents b52be1a + d2fe7cf commit 691b072
Showing 22 changed files with 5,464 additions and 360 deletions.
168 changes: 163 additions & 5 deletions README.md
@@ -1,11 +1,17 @@

# RealtimeSTT
[![PyPI](https://img.shields.io/pypi/v/RealtimeSTT)](https://pypi.org/project/RealtimeSTT/)
[![Downloads](https://static.pepy.tech/badge/RealtimeSTT)](https://pepy.tech/project/RealtimeSTT)
[![GitHub release](https://img.shields.io/github/release/KoljaB/RealtimeSTT.svg)](https://GitHub.com/KoljaB/RealtimeSTT/releases/)
[![GitHub commits](https://badgen.net/github/commits/KoljaB/RealtimeSTT)](https://GitHub.com/KoljaB/RealtimeSTT/commit/)
[![GitHub forks](https://img.shields.io/github/forks/KoljaB/RealtimeSTT.svg?style=social&label=Fork&maxAge=2592000)](https://GitHub.com/KoljaB/RealtimeSTT/network/)
[![GitHub stars](https://img.shields.io/github/stars/KoljaB/RealtimeSTT.svg?style=social&label=Star&maxAge=2592000)](https://GitHub.com/KoljaB/RealtimeSTT/stargazers/)

*Easy-to-use, low-latency speech-to-text library for realtime applications*

## New

- Custom wake words with [OpenWakeWord](#openwakeword). Thanks to the [developers](https://github.com/dscripka/openWakeWord) of this!
- AudioToTextRecorderClient class, which automatically starts a server if none is running and connects to it. The class shares the same interface as AudioToTextRecorder, making it easy to upgrade or switch between the two (see the sketch after this list). Work in progress: most parameters and callbacks of AudioToTextRecorder are already implemented in AudioToTextRecorderClient, but not all, and the server cannot handle concurrent (parallel) requests yet.
- reworked CLI interface (`stt-server` to start the server, `stt` to start the client; see the `server` folder for more info)
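
A minimal sketch of switching to the new client class, assuming the default constructor starts or connects to a local server as described above; the interface mirrors `AudioToTextRecorder`, though not every parameter is implemented yet:

```python
from RealtimeSTT import AudioToTextRecorderClient

if __name__ == '__main__':
    # Starts a local server if none is running, then connects to it.
    # The usual text() loop from AudioToTextRecorder applies unchanged.
    recorder = AudioToTextRecorderClient()

    while True:
        print(recorder.text())
```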

## About the Project

@@ -18,16 +24,53 @@ It's ideal for:
- **Voice Assistants**
- Applications requiring **fast and precise** speech-to-text conversion

https://github.com/KoljaB/RealtimeSTT/assets/7604638/207cb9a2-4482-48e7-9d2b-0722c3ee6d14
https://github.com/user-attachments/assets/797e6552-27cd-41b1-a7f3-e5cbc72094f5

### Updates

Latest Version: v0.2.41
Latest Version: v0.3.7

See [release history](https://github.com/KoljaB/RealtimeSTT/releases).

> **Hint:** *Since we use the `multiprocessing` module now, ensure you include the `if __name__ == '__main__':` protection in your code to prevent unexpected behavior, especially on platforms like Windows. For a detailed explanation of why this is important, visit the [official Python documentation on `multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming).*
## Quick Examples

### Print everything being said:

```python
from RealtimeSTT import AudioToTextRecorder

def process_text(text):
    print(text)

if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()

    while True:
        recorder.text(process_text)
```

### Type everything being said:

```python
from RealtimeSTT import AudioToTextRecorder
import pyautogui

def process_text(text):
    pyautogui.typewrite(text + " ")

if __name__ == '__main__':
    print("Wait until it says 'speak now'")
    recorder = AudioToTextRecorder()

    while True:
        recorder.text(process_text)
```
*Will type everything being said into your selected text box*

### Features

- **Voice Activity Detection**: Automatically detects when you start and stop speaking.
@@ -158,6 +201,19 @@ recorder.stop()
print(recorder.text())
```

#### Standalone Example:

```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder()
    recorder.start()
    input("Press Enter to stop recording...")
    recorder.stop()
    print("Transcription: ", recorder.text())
```
### Automatic Recording
Recording based on voice activity detection.
@@ -167,8 +223,19 @@ with AudioToTextRecorder() as recorder:
print(recorder.text())
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    with AudioToTextRecorder() as recorder:
        print("Transcription: ", recorder.text())
```
When running `recorder.text` in a loop, it is recommended to use a callback, allowing the transcription to run asynchronously:
```python
def process_text(text):
    print(text)

while True:
    recorder.text(process_text)
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

def process_text(text):
    print(text)

if __name__ == '__main__':
    recorder = AudioToTextRecorder()

    while True:
        recorder.text(process_text)
```
### Wakewords
Keyword activation before detecting voice. Write the comma-separated list of your desired activation keywords into the `wake_words` parameter. You can choose wake words from this list: alexa, americano, blueberry, bumblebee, computer, grapefruit, grasshopper, hey google, hey siri, jarvis, ok google, picovoice, porcupine, terminator.
@@ -188,6 +270,18 @@ print('Say "Jarvis" then speak.')
print(recorder.text())
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder(wake_words="jarvis")

    print('Say "Jarvis" to start recording.')
    print(recorder.text())
```
### Callbacks
You can set callback functions to be executed on different events (see [Configuration](#configuration)):
@@ -203,6 +297,22 @@ recorder = AudioToTextRecorder(on_recording_start=my_start_callback,
on_recording_stop=my_stop_callback)
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

def start_callback():
    print("Recording started!")

def stop_callback():
    print("Recording stopped!")

if __name__ == '__main__':
    recorder = AudioToTextRecorder(on_recording_start=start_callback,
                                   on_recording_stop=stop_callback)
```
### Feed chunks
If you don't want to use the local microphone, set the `use_microphone` parameter to `False` and provide raw PCM audio chunks in 16-bit mono (sample rate 16000) with this method:
@@ -211,6 +321,20 @@ If you don't want to use the local microphone set use_microphone parameter to fa
recorder.feed_audio(audio_chunk)
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    recorder = AudioToTextRecorder(use_microphone=False)

    with open("audio_chunk.pcm", "rb") as f:
        audio_chunk = f.read()

    recorder.feed_audio(audio_chunk)
    print("Transcription: ", recorder.text())
```
### Shutdown
You can shutdown the recorder safely by using the context manager protocol:
@@ -220,12 +344,25 @@ with AudioToTextRecorder() as recorder:
[...]
```
Or you can call the shutdown method manually (if using "with" is not feasible):
```python
recorder.shutdown()
```
#### Standalone Example:
```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    with AudioToTextRecorder() as recorder:
        [...]

    # or manually shutdown if "with" is not used
    recorder.shutdown()
```
## Testing the Library
The test subdirectory contains a set of scripts to help you evaluate and understand the capabilities of the RealtimeSTT library.
@@ -298,6 +435,8 @@ When you initialize the `AudioToTextRecorder` class, you have various options to
- **level** (int, default=logging.WARNING): Logging level.
- **init_logging** (bool, default=True): Whether to initialize the logging framework. Set to False to manage this yourself.
- **handle_buffer_overflow** (bool, default=True): If set, the system will log a warning when an input overflow occurs during recording and remove the data from the buffer.
- **beam_size** (int, default=5): The beam size to use for beam search decoding.
@@ -310,6 +449,14 @@ When you initialize the `AudioToTextRecorder` class, you have various options to
- **debug_mode** (bool, default=False): If set, the system prints additional debug information to the console.
- **print_transcription_time** (bool, default=False): Logs the processing time of the main model transcription. This can be useful for performance monitoring and debugging.
- **early_transcription_on_silence** (int, default=0): If set, the system will transcribe audio faster when silence is detected. Transcription will start after the specified milliseconds. Keep this value lower than `post_speech_silence_duration`, ideally around `post_speech_silence_duration` minus the estimated transcription time with the main model. If silence lasts longer than `post_speech_silence_duration`, the recording is stopped, and the transcription is submitted. If voice activity resumes within this period, the transcription is discarded. This results in faster final transcriptions at the cost of additional GPU load due to some unnecessary final transcriptions.
- **allowed_latency_limit** (int, default=100): Specifies the maximum number of unprocessed chunks in the queue before discarding chunks. This helps prevent the system from being overwhelmed and losing responsiveness in real-time applications.
- **no_log_file** (bool, default=False): If set, the system will skip writing the debug log file, reducing disk I/O. Useful if logging to a file is not needed and performance is a priority.
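
As an illustration, a configuration sketch combining a few of the parameters above; the values are illustrative defaults or examples, not recommendations:

```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == '__main__':
    # Parameter names are taken from the list above; values are examples only.
    recorder = AudioToTextRecorder(
        beam_size=5,                         # beam search width for decoding
        handle_buffer_overflow=True,         # warn and drop data on input overflow
        print_transcription_time=True,       # log main-model transcription time
        early_transcription_on_silence=300,  # ms; keep below post_speech_silence_duration
        allowed_latency_limit=100,           # max unprocessed chunks before discarding
        no_log_file=True,                    # skip writing the debug log file
    )
    print(recorder.text())
```
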
#### Real-time Transcription Parameters
> **Note**: *When enabling realtime transcription, a GPU installation is strongly advised. Using realtime transcription may create high GPU loads.*
@@ -404,6 +551,17 @@ Suggested starting parameters for OpenWakeWord usage:
) as recorder:
```
## FAQ
### Q: I encountered the following error: "Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so} Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor." How do I fix this?
**A:** This issue arises from a mismatch between the version of `ctranslate2` and cuDNN. The `ctranslate2` library was updated to version 4.5.0, which uses cuDNN 9.2. There are two ways to resolve this issue:
1. **Downgrade `ctranslate2` to version 4.4.0**:
```bash
pip install ctranslate2==4.4.0
```
2. **Upgrade cuDNN** on your system to version 9.2 or above.
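
Either way, a minimal check of the installed `ctranslate2` version (assuming the package was installed with pip):

```python
# Expect 4.4.0 after the downgrade, or 4.5.0+ if cuDNN was upgraded instead.
import ctranslate2
print(ctranslate2.__version__)
```
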
## Contribution
Contributions are always welcome!
@@ -412,7 +570,7 @@ Shoutout to [Steven Linn](https://github.com/stevenlafl) for providing docker su
## License
MIT
[MIT](https://github.com/KoljaB/RealtimeSTT?tab=MIT-1-ov-file)
## Author
3 changes: 2 additions & 1 deletion RealtimeSTT/__init__.py
@@ -1 +1,2 @@
from .audio_recorder import AudioToTextRecorder
from .audio_recorder import AudioToTextRecorder
from .audio_recorder_client import AudioToTextRecorderClient
