Releases: KoljaB/RealtimeTTS
v0.4.21
RealtimeTTS v0.4.21 Release Notes
🚀 New Features
- Updated to the latest versions of dependencies (stream2sentence, coqui-tts, elevenlabs, openai, edge-tts)
StyleTTS Engine
- Added a seed parameter. Fixed a StyleTTS2 problem that caused noise to be generated with very short texts, especially when using embedding_scale values > 1
🛠 Bug Fixes
- Fixed a problem in stream2sentence that caused minimum_sentence_length not to be respected
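A minimal sketch of how these changes might show up in user code; passing seed to the StyleTTSEngine constructor and minimum_sentence_length to play() are assumptions based on these notes, and the model paths are placeholders:

```python
# Hypothetical sketch; keyword names and paths are assumptions, not confirmed API.
from RealtimeTTS import TextToAudioStream, StyleTTSEngine

engine = StyleTTSEngine(
    model_config_path="Models/LJSpeech/config.yml",            # placeholder path
    model_checkpoint_path="Models/LJSpeech/epoch_2nd_00100.pth",  # placeholder path
    seed=42,  # assumed keyword for the new reproducible-generation seed
)

stream = TextToAudioStream(engine)
stream.feed("Short text.")
# minimum_sentence_length is the stream2sentence setting the fix above refers to
stream.play(minimum_sentence_length=10)
```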
v0.4.20 🌿
RealtimeTTS v0.4.20 Release Notes
🚀 New Features
Azure Engine
- Added support for 48 kHz audio output in the Azure TTS engine for improved audio quality and more flexibility in audio formats.
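A hedged sketch of selecting the higher-quality output; the audio_format keyword and the "riff-48khz-16bit-mono-pcm" value are assumptions inferred from these notes and Azure's format naming, not confirmed API:

```python
# Sketch only; audio_format and its value are assumptions, check the engine docs.
import os
from RealtimeTTS import TextToAudioStream, AzureEngine

engine = AzureEngine(
    os.environ["AZURE_SPEECH_KEY"],      # your Azure speech key
    os.environ["AZURE_SPEECH_REGION"],   # e.g. "eastus"
    audio_format="riff-48khz-16bit-mono-pcm",  # assumed name for the new 48 kHz output
)

stream = TextToAudioStream(engine)
stream.feed("Testing 48 kHz output.")
stream.play()
```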
StyleTTS Engine
- Introduced StyleTTSVoice for dynamic voice switching, allowing transitions between multiple voice models
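A rough sketch of switching voices at runtime; the StyleTTSVoice constructor arguments, the voice keyword, and set_voice accepting a voice object are assumptions based on this note:

```python
# Sketch of dynamic voice switching; argument names are assumptions.
from RealtimeTTS import TextToAudioStream, StyleTTSEngine, StyleTTSVoice

voice_a = StyleTTSVoice(
    model_config_path="Models/VoiceA/config.yml",         # placeholder paths
    model_checkpoint_path="Models/VoiceA/checkpoint.pth",
)
voice_b = StyleTTSVoice(
    model_config_path="Models/VoiceB/config.yml",
    model_checkpoint_path="Models/VoiceB/checkpoint.pth",
)

engine = StyleTTSEngine(voice=voice_a)   # assumed: engine can start from a voice object
stream = TextToAudioStream(engine)

stream.feed("Spoken with the first voice.")
stream.play()

engine.set_voice(voice_b)                # assumed: set_voice accepts a StyleTTSVoice
stream.feed("Spoken with the second voice.")
stream.play()
```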
🛠 Bug Fixes
- Fixed incorrect voice initialization when switching between models in the StyleTTS engine.
- Fixed model configuration path issues during runtime when updating voice parameters.
v0.4.19
v0.4.17
v0.4.14
Fixes #223
Enhancements to Sentence Processing
- Improved buffer handling by ensuring it starts with an alphanumeric character to prevent TTS confusion caused by initial non-phonetic characters.
- Bug Fix: Resolved an issue where the word counter wasn't reset after triggering force_first_fragment_after_words, causing processing errors.
- Increased the default force_first_fragment_after_words threshold from 15 to 30 for better fragment control.
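For context, a hedged example of where this threshold is typically passed; treating force_first_fragment_after_words as a play() keyword is an assumption based on these notes:

```python
# Sketch; passing force_first_fragment_after_words to play() is an assumption.
from RealtimeTTS import TextToAudioStream, SystemEngine

stream = TextToAudioStream(SystemEngine())
stream.feed("A long first sentence that should not delay audio output for too long ...")

# Force the first audio fragment after at most 30 words (the new default),
# even if no sentence boundary has been detected yet.
stream.play(force_first_fragment_after_words=30)
```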
v0.4.13
RealtimeTTS v0.4.13 Release Notes
🚀 New Features
EdgeEngine
- Introducing EdgeEngine, a free, extremely lightweight, and beginner-friendly engine.
- Designed for simplicity with no complex dependencies, making it ideal for lightweight projects or newcomers to TTS.
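A minimal usage sketch; the no-argument EdgeEngine() constructor shown here is an assumption, so check the docs for voice selection and other options:

```python
# Minimal sketch; constructor defaults are assumed.
from RealtimeTTS import TextToAudioStream, EdgeEngine

engine = EdgeEngine()                 # uses Microsoft Edge's online TTS voices
stream = TextToAudioStream(engine)
stream.feed("EdgeEngine needs no local model and no heavy dependencies.")
stream.play()
```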
🛠 Bug Fixes
v0.4.11
v0.4.10
- New stream2sentence version 0.2.7:
  - Bugfix for #5 (sometimes caused a whitespace between words to get lost)
  - Upgrade to the latest NLTK and Stanza versions, including the new "punkt-tab" model
  - Allows an offline environment for Stanza
  - Adds support for async streams (preparation for async in RealtimeTTS)
- Dependency upgrades to the latest versions (coqui-tts 0.24.2 ➡️ 0.24.3, elevenlabs 1.11.0 ➡️ 1.12.1, openai 1.52.2 ➡️ 1.54.3)
- Added a load_balancing parameter to the Coqui engine (see the sketch below):
  - If you have a fast machine with a realtime factor well below 1, inference runs much faster than needed.
  - This parameter lets inference run at a realtime factor closer to 1, so you still get streaming voice inference, but GPU load drops to the minimum needed to produce chunks in realtime.
  - If you run LLM inference in parallel, it will now be faster because TTS takes less load.
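A hedged sketch of enabling this; treating load_balancing as a boolean constructor argument is an assumption based on the description above:

```python
# Sketch; load_balancing as a boolean constructor argument is an assumption.
from RealtimeTTS import TextToAudioStream, CoquiEngine

# Throttle inference toward a realtime factor of ~1 so the GPU only does
# as much work as is needed to keep the audio stream fed.
engine = CoquiEngine(load_balancing=True)

stream = TextToAudioStream(engine)
stream.feed("Streaming with reduced GPU load.")
stream.play()
```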
v0.4.9
v0.4.8
- Added ParlerEngine. Needs flash attention, and even then it barely runs fast enough for realtime inference on a 4090.
Parler Installation for Windows (after installing RealtimeTTS):
pip install git+https://github.com/huggingface/parler-tts.git
pip install torch==2.3.1+cu121 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
pip install https://github.com/oobabooga/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.3.1cxx11abiFALSE-cp310-cp310-win_amd64.whl
pip install "numpy<2"
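After the install, a minimal usage sketch; the no-argument ParlerEngine() constructor is an assumption and model selection options may differ:

```python
# Sketch; ParlerEngine constructor defaults are assumed.
from RealtimeTTS import TextToAudioStream, ParlerEngine

engine = ParlerEngine()               # loads the Parler-TTS model (needs flash attention)
stream = TextToAudioStream(engine)
stream.feed("Testing Parler-TTS realtime streaming.")
stream.play()
```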