FAQ, Quirks & General Questions

erew123 edited this page Nov 16, 2024 · 9 revisions

This page addresses common questions, quirks, and general information about AllTalk V2.

Table of Contents

  1. Configuration Files and Environment
  2. TTS Engines and Models
  3. Language Support
  4. Interface Navigation
  5. Additional Information

Configuration Files and Environment

Q: Which are the main configuration files changed by AllTalk TTS?

A: The two main configuration files that are changed are:

  1. \alltalk_tts\confignew.json: This file stores almost all the configuration settings.
  2. \alltalk_tts\system\tts_engines\tts_engines.json: This file stores the currently loaded TTS engine & model, as well as a list of available TTS engines and models.
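
To see what these two files currently hold, a small read-only script can list their top-level keys. This is a minimal sketch, assuming it is run from inside the alltalk_tts folder and that both files are plain JSON objects; the helper names load_json_config and summarize_config are made up for this example and are not part of AllTalk.

```python
import json
from pathlib import Path


def load_json_config(path):
    """Return the parsed contents of a JSON config file."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_config(path):
    """Return the sorted top-level keys, or an empty list if the file is not a JSON object."""
    data = load_json_config(path)
    return sorted(data) if isinstance(data, dict) else []


if __name__ == "__main__":
    # Assumption: the current working directory is the alltalk_tts folder.
    base = Path(".")
    for relative in ("confignew.json", "system/tts_engines/tts_engines.json"):
        target = base / relative
        if target.exists():
            print(relative, "->", summarize_config(target))
        else:
            print(relative, "not found")
```

The script only reads the files, so it is safe to run while checking what a setting change in the interface actually wrote to disk.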

Q: Where is the Python environment for AllTalk stored?

A: The Python environment for AllTalk is built in the alltalk_environment folder. This folder contains all the necessary Python packages and dependencies for running AllTalk.

Q: What are the "start_xxxx" files?

A: The installation process creates several "start_xxxx" batch (for Windows) or shell (for Unix-based systems) files:

  • start_alltalk: Starts the main AllTalk application
  • start_environment: Activates the AllTalk Python environment
  • start_finetune: Starts the finetuning process
  • start_diagnostics: Generates a diagnostics file

These files are created to make it easier to run different aspects of AllTalk without having to manually activate the Python environment each time.

Q: Can I transfer the alltalk_environment folder or "start_xxxx" files between different installations?

A: No, it's not recommended to transfer the alltalk_environment folder or the "start_xxxx" files between different installations or disk locations. Doing so can cause issues because these files and folders contain absolute paths specific to their original installation location. If you need to set up AllTalk in a new location, it's best to perform a fresh installation.
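
If you are unsure whether a set of "start_xxxx" files was copied from somewhere else, one rough check is to look for the current folder's path inside each script. This is a heuristic sketch only: it assumes the scripts embed the installation path as plain text, and the function name scripts_not_mentioning_location is hypothetical. A script that is flagged here is not proven to be broken, and a fresh installation remains the reliable fix.

```python
from pathlib import Path


def scripts_not_mentioning_location(install_dir):
    """Return names of start_* files whose text never mentions install_dir.

    Heuristic: a start script generated in this folder is assumed to contain
    this folder's absolute path somewhere in its text.
    """
    root = Path(install_dir).resolve()
    needle = str(root).lower()
    suspects = []
    for script in sorted(root.glob("start_*")):
        if not script.is_file():
            continue
        text = script.read_text(encoding="utf-8", errors="replace").lower()
        if needle not in text:
            suspects.append(script.name)
    return suspects


if __name__ == "__main__":
    # Assumption: run from the folder that holds the start_* files.
    print(scripts_not_mentioning_location("."))
```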

Q: How can I completely rebuild the Python environment from scratch?

A: To rebuild the Python environment from scratch:

  1. Delete the alltalk_environment folder entirely.
  2. Delete all the "start_xxxx" files.
  3. Run the installation process again (typically by running atsetup.bat or the equivalent for your system).

This will create a fresh Python environment and new startup scripts tailored to your current installation location.
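
Steps 1 and 2 above can be sketched as a small cleanup script. This is an illustration, not an official AllTalk tool: the function names are invented for this example, it assumes the start files all match the pattern start_* in the installation folder, and it defaults to a dry run so nothing is deleted until you pass dry_run=False. Step 3 (running the installer again) is still done by hand.

```python
import shutil
from pathlib import Path


def find_rebuild_targets(install_dir):
    """List the environment folder and start_* files that exist in install_dir."""
    root = Path(install_dir)
    targets = []
    env_dir = root / "alltalk_environment"
    if env_dir.is_dir():
        targets.append(env_dir)
    # Matches start_alltalk, start_environment, etc., whatever the extension.
    targets.extend(sorted(p for p in root.glob("start_*") if p.is_file()))
    return targets


def clean_for_rebuild(install_dir, dry_run=True):
    """Delete the rebuild targets, or only report them when dry_run is True."""
    targets = find_rebuild_targets(install_dir)
    if not dry_run:
        for target in targets:
            if target.is_dir():
                shutil.rmtree(target)
            else:
                target.unlink()
    return targets


if __name__ == "__main__":
    # Dry run: prints what would be removed from the current folder.
    for path in clean_for_rebuild("."):
        print("would remove:", path)
```

Configuration files such as confignew.json are left untouched, since only the environment folder and the start files are tied to the installation location.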

TTS Engines and Models

Q: What are the base features of each TTS engine?

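| Model | DeepSpeed | Pitch | Speed | RepPen | MultiLang | Streaming | Low VRAM | Temp | Multi Model | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| F5-TTS | No | No | Yes | No | *Yes | No | Yes | No | Yes | * |
| Parler-TTS | No | No | No | No | No | No | Yes | No | Yes | ** |
| Piper | No | No | Yes | No | *No | No | No | No | Yes | *** |
| Coqui VITS | No | No | No | No | *No | No | Yes | No | Yes | *** |
| Coqui XTTS | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | **** |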

Notes

  • (*) F5-TTS: Supports voice cloning in Chinese and English only.
  • (**) Parler-TTS: Likely supports English TTS generation only.
  • (***) Piper and Coqui VITS: Language support depends on the model file loaded.
  • (****) Coqui XTTS: Supports multiple languages and voice cloning.
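
When choosing an engine for a particular feature, it can help to treat the table above as data. The sketch below copies the Yes/No values into a Python dictionary and filters on them. The feature keys (for example "rep_pen" and "low_vram") and the function engines_with are names chosen for this example, and entries marked "*Yes" or "*No" are stored as plain True/False, so the notes still apply.

```python
FEATURES = (
    "deepspeed", "pitch", "speed", "rep_pen", "multi_lang",
    "streaming", "low_vram", "temp", "multi_model",
)

# Values copied from the feature table, in the same column order as FEATURES.
ENGINE_FEATURES = {
    "F5-TTS":     (False, False, True,  False, True,  False, True,  False, True),
    "Parler-TTS": (False, False, False, False, False, False, True,  False, True),
    "Piper":      (False, False, True,  False, False, False, False, False, True),
    "Coqui VITS": (False, False, False, False, False, False, True,  False, True),
    "Coqui XTTS": (True,  False, True,  True,  True,  True,  True,  True,  True),
}


def engines_with(*required):
    """Return the engines (in table order) that support every named feature."""
    unknown = set(required) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return [
        name
        for name, flags in ENGINE_FEATURES.items()
        if all(flags[FEATURES.index(feature)] for feature in required)
    ]


if __name__ == "__main__":
    print(engines_with("streaming"))          # ['Coqui XTTS']
    print(engines_with("low_vram", "speed"))  # ['F5-TTS', 'Coqui XTTS']
```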

Q: How do I change/set the TTS Engine to XTTS, Piper, VITS, etc.?

A: You can change the TTS Engine in the Gradio interface:

  1. Go to the "Generate TTS" > "Generate" tab.
  2. Look for the "Swap TTS Engine" option to change the engine.
  3. Use the "Load Different Model" option to change the model for the selected engine.

Note: There is also a "Generate Help" tab that provides detailed explanations for this portion of the interface.

Q: How can I find more information about each TTS Engine?

A: The Gradio interface provides detailed information for each TTS Engine:

  1. Go to the "TTS Engine Settings" tab.
  2. Select the engine you're interested in.
  3. For each engine, you'll find:
    • Available settings you can configure
    • An "Engine Information" tab with details about the engine, including the developer's website
    • A "Models Download" area where you can download models for that TTS Engine
    • An "Engine Help" tab with specific information about the engine, including:
      • Where its models are stored
      • How to create voices (if available)
      • Any other relevant information or quirks

Q: What are the basic differences between the TTS Engines?

| Engine | Type | Voice Cloning | Resource Usage | Generation Speed | Key Features |
|---|---|---|---|---|---|
| Coqui XTTS | Neural TTS (VITS-based) | Yes | High | Medium-Fast | Zero-shot voice cloning with just 3s audio; Supports 17 languages; Streaming capable (<200ms latency); Cross-language voice cloning |
| Piper | Neural TTS (VITS-based) | No* | Low | Fast | Optimized for Raspberry Pi; ONNX runtime for efficiency; Wide language support (30+ languages); Local/offline use; Streaming capable |
| Parler | Neural TTS | No | Medium | Medium | Natural language controlled voice style; 34 built-in voices; High quality audiobook-style speech; Strong prosody control via punctuation |
| VITS | Neural TTS | No* | Medium-High | Fast | End-to-end architecture; Multi-speaker capabilities (with training); Can use external speaker embeddings; HiFiGAN vocoder based |
| F5-TTS | Neural TTS (Flow Matching) | Yes | Medium | Medium-Fast | Flow matching technique; Multi-style/Multi-speaker; Chunk inference support; Voice chat capabilities |

* Note: VITS and Piper can support multiple speakers, but this requires full training or fine-tuning rather than zero-shot voice cloning. Please see the developers' websites for details on how to do this.

Language Support

Q: Can AllTalk TTS support Hindi?

A: Yes, but with some limitations:

  • The Coqui XTTS engine can process Hindi, but only with the XTTS 2.0.3 model loaded as apitts.
  • Idiap (which maintains the Coqui TTS engine) is working on updating the tokenizer to improve Hindi support. However, this update is not yet available.
  • You can track the progress of this update in this GitHub commit.

Additional Information

Q: Where can I find help on using the Generate TTS interface?

A: In the Gradio interface, navigate to the "Generate TTS" > "Generate" tab. There is a "Generate Help" tab that provides comprehensive explanations for all the options and features available in this section of the interface.

Q: Are there any known quirks or limitations I should be aware of?

A: Yes, here are a few:

  • Some languages may have limited support depending on the TTS engine and model you're using.
  • Certain features or settings may only be available with specific engines or models.
  • Always check the "Engine Help" tab for any engine-specific quirks or limitations.