Releases: KoljaB/RealtimeTTS

v0.4.21

14 Dec 17:44

RealtimeTTS v0.4.21 Release Notes

🚀 New Features

  • Updated dependencies to their latest versions (stream2sentence, coqui-tts, elevenlabs, openai, edge-tts).

StyleTTS Engine

  • Added seed support. Fixed a StyleTTS2 problem that produced noise for very short texts, especially with embedding_scale values > 1.

🛠 Bug Fixes

  • Fixed a problem in stream2sentence that caused minimum_sentence_length to be ignored.

v0.4.20 🌿

10 Dec 22:05

RealtimeTTS v0.4.20 Release Notes

🚀 New Features

Azure Engine

  • Added support for 48 kHz audio output in the Azure TTS engine, improving audio quality and adding flexibility in the choice of audio formats.

StyleTTS Engine

  • Introduced StyleTTSVoice for dynamic voice switching, allowing transitions between multiple voice models.

🛠 Bug Fixes

  • Fixed incorrect voice initialization when switching between models in the StyleTTS engine.
  • Fixed model configuration path issues during runtime when updating voice parameters.

v0.4.19

07 Dec 07:47
905f1fb
  • Added support for the StyleTTS2 engine.
  • Updated Coqui-TTS to version 0.25.0, which includes a fix for issue #227
  • Upgraded all dependent libraries to their latest versions

v0.4.17

30 Nov 21:42
  • Performance improvements, bug fixes, and an improved edge_test.py for Edge TTS.
EdgeTTSDemo.mp4

v0.4.14

29 Nov 21:47

Fixes #223

Enhancements to Sentence Processing

  • Improved buffer handling by ensuring it starts with an alphanumeric character to prevent TTS confusion caused by initial non-phonetic characters.
  • Bug Fix: Resolved an issue where the word counter wasn’t reset after triggering force_first_fragment_after_words, causing processing errors.
  • Increased the default force_first_fragment_after_words threshold from 15 to 30 for better fragment control.
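The buffer rules above can be illustrated with a small sketch. It is not taken from the RealtimeTTS or stream2sentence source: `clean_buffer_start` and `forced_fragments` are hypothetical helper names, and the forcing logic is a simplified reading of what force_first_fragment_after_words suggests. Only the default threshold of 30 comes from the notes above.

```python
def clean_buffer_start(buffer: str) -> str:
    """Drop leading characters until the buffer starts with a letter or digit."""
    for index, char in enumerate(buffer):
        if char.isalnum():
            return buffer[index:]
    return ""  # nothing pronounceable has arrived yet


def forced_fragments(words, force_after_words=30):
    """Yield a fragment whenever `force_after_words` words have accumulated.

    Hypothetical simplification: the pending list doubles as the word
    counter and is reset after every forced fragment, which is the kind
    of reset the bug fix above describes.
    """
    pending = []
    for word in words:
        pending.append(word)
        if len(pending) >= force_after_words:
            yield " ".join(pending)
            pending = []  # reset the word counter
    if pending:
        yield " ".join(pending)  # flush whatever is left at the end
```

Calling `clean_buffer_start('... "Hello')` returns `Hello`, so the engine never receives leading punctuation as its first input.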

v0.4.13

28 Nov 14:28

RealtimeTTS v0.4.13 Release Notes

🚀 New Features

EdgeEngine

  • Introducing EdgeEngine, a free, extremely lightweight, and beginner-friendly engine.
  • Designed for simplicity with no complex dependencies, making it ideal for lightweight projects or newcomers to TTS.

🛠 Bug Fixes

  • Resolved ValueError: ('Sample format not supported', -9994) (#221).
  • Fixed RecursionError: maximum recursion depth exceeded (#222).
  • Addressed the requirement to manually install resampy after installing RealtimeTTS.

v0.4.11

16 Nov 22:22
  • Optimizations for Linux
    • The multiprocessing start method is now set to spawn on Linux.
    • If the sound card does not support the TTS engine's output sample rate, the audio chunks are now resampled.
    • Added a mechanism to prevent potential stream buffer overflows.
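To make the resampling step concrete, here is a minimal sketch of converting a chunk from the engine's sample rate to one the sound card accepts. It uses plain linear interpolation on mono float samples and is not the RealtimeTTS implementation; `resample_linear` is a hypothetical name, and a production pipeline would more likely use a dedicated resampling library with proper anti-aliasing.

```python
def resample_linear(samples, source_rate, target_rate):
    """Resample mono float samples from source_rate to target_rate.

    Linear interpolation between neighbouring samples; the first and
    last output samples coincide with the first and last input samples.
    """
    if source_rate == target_rate or len(samples) < 2:
        return list(samples)

    target_len = max(1, round(len(samples) * target_rate / source_rate))
    # Distance in input samples between two consecutive output samples.
    step = (len(samples) - 1) / max(1, target_len - 1)

    out = []
    for i in range(target_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, a 240-sample chunk at 24000 Hz becomes a 480-sample chunk at 48000 Hz, covering the same 10 ms of audio.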

v0.4.10

07 Nov 14:17
  • New stream2sentence version 0.2.7
    • Bug fix for #5 (a whitespace between words was sometimes lost)
    • Upgrade to the latest NLTK and Stanza versions, including the new "punkt-tab" model
    • Allows Stanza to be used in offline environments
    • Adds support for async streams (in preparation for async in RealtimeTTS)
  • Dependency upgrades to the latest versions (coqui tts 0.24.2 ➡️ 0.24.3, elevenlabs 1.11.0 ➡️ 1.12.1, openai 1.52.2 ➡️ 1.54.3)
  • Added load_balancing parameter to the Coqui engine
    • On a fast machine with a realtime factor well below 1, inference runs much faster than necessary.
    • This parameter lets you infer with a realtime factor closer to 1: voice inference still streams, but GPU load drops to the minimum needed to produce chunks in realtime.
    • LLM inference running in parallel becomes faster because TTS puts less load on the GPU.
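The arithmetic behind the realtime factor can be sketched as follows. This is only an illustration of the idea, not how the Coqui engine implements load_balancing; `realtime_factor`, `throttle_delay`, and `target_factor` are hypothetical names introduced here.

```python
def realtime_factor(synthesis_seconds, audio_seconds):
    """Time spent synthesizing divided by the duration of the audio produced.

    Values below 1 mean audio is generated faster than it plays back.
    """
    return synthesis_seconds / audio_seconds


def throttle_delay(synthesis_seconds, audio_seconds, target_factor=0.8):
    """Seconds to idle so a chunk's effective realtime factor reaches target_factor.

    Returns 0.0 when synthesis is already at or above the target, so the
    stream is never slowed beyond what the target allows.
    """
    wanted_seconds = audio_seconds * target_factor
    return max(0.0, wanted_seconds - synthesis_seconds)
```

With 1 second of audio synthesized in 0.25 seconds, the realtime factor is 0.25; aiming for 0.75 would leave 0.5 seconds during which the GPU is free for other work such as LLM inference.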

v0.4.9

01 Nov 16:17
  • Added print_realtime_factor to CoquiEngine
  • Removed a debug message that somehow made it to PyPI

v0.4.8

29 Oct 17:00
  • Added ParlerEngine. It requires flash attention and, even with it, barely runs fast enough for realtime inference on a 4090.

    Parler Installation for Windows (after installing RealtimeTTS):

    pip install git+https://github.com/huggingface/parler-tts.git
    pip install torch==2.3.1+cu121 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
    pip install https://github.com/oobabooga/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu122torch2.3.1cxx11abiFALSE-cp310-cp310-win_amd64.whl
    pip install "numpy<2"