
Global Settings API Defaults Explanation

erew123 edited this page Oct 3, 2024 · 1 revision

This document explains the purpose and functionality of the API Defaults in AllTalk TTS.

Table of Contents

  1. Interaction Between API Defaults and TTS Engine Settings
  2. API Version Settings
  3. API Default Settings
  4. API Allowed Text Filtering/Passthrough Settings

Interaction Between API Defaults and TTS Engine Settings

How Minimal Requests Work

A TTS generation request sent to the AllTalk API does not need to specify every parameter. AllTalk automatically populates any settings missing from the request using a combination of:

  1. API Default Settings
  2. TTS Engine Default Settings

This allows for very concise API calls while still maintaining full control over the TTS generation process.

For more information, see the Standard TTS Generation API page. The TTS Request Flowchart page shows this process as a flowchart.

API Default Settings vs. TTS Engine Settings

  • API Default Settings: These are general settings that apply to all requests, regardless of the TTS engine being used. They are set in the Gradio interface under "Global Settings" > "AllTalk API Defaults".

  • TTS Engine Settings: These are specific to the currently loaded TTS engine and can be found in the Gradio interface under "TTS Engine Settings" > [Your Chosen TTS Engine] > "Default Settings".

Examples of TTS Engine-Specific Settings

The TTS Engine Settings include, but are not limited to:

  • DeepSpeed (on/off)
  • Low VRAM mode (on/off)
  • Temperature
  • Repetition penalty
  • Pitch
  • Speed
  • OpenAI voice mappings
  • Default Character voice
  • Default Narrator voice

How Settings are Populated

When you send a minimal TTS request, AllTalk follows this process to determine the final settings for TTS generation:

  1. Start with the API Default Settings as a baseline.
  2. Apply any relevant TTS Engine-specific default settings.
  3. Finally, apply any settings explicitly provided in the API request.

This hierarchical approach ensures that:

  • You always have a consistent baseline (API Defaults).
  • TTS engine-specific optimizations are applied automatically.
  • You maintain the flexibility to override any setting on a per-request basis.
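
The layering described above can be sketched as three dictionary merges, where each later layer overrides the earlier ones. This is a minimal illustration, not AllTalk's actual code: the function name `resolve_settings`, the default values, and every key other than `text_input`, `language`, and `speed` are assumptions made for the example.

```python
# Hypothetical values for illustration only; the real defaults are set in the
# Gradio interface and depend on the loaded TTS engine.
API_DEFAULTS = {
    "language": "en",
    "text_filtering": "standard",
    "output_file_name": "myoutputfile",
}
ENGINE_DEFAULTS = {
    "speed": 1.0,
    "pitch": 0,
}


def resolve_settings(request, api_defaults=API_DEFAULTS, engine_defaults=ENGINE_DEFAULTS):
    """Layer the three sources; later layers override earlier ones."""
    settings = dict(api_defaults)      # 1. API Default Settings as the baseline
    settings.update(engine_defaults)   # 2. TTS Engine default settings
    settings.update(request)           # 3. values sent explicitly in the request
    return settings


minimal = resolve_settings({"text_input": "Hello, world!"})
override = resolve_settings(
    {"text_input": "Hello, world!", "language": "fr", "speed": 1.5}
)
print(minimal["language"], minimal["speed"])    # en 1.0
print(override["language"], override["speed"])  # fr 1.5
```

Copying the baseline before merging keeps the stored defaults untouched, so one request's overrides never leak into the next.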

Example Scenario

Let's say you send a minimal request with just the text to be spoken:

{
    "text_input": "Hello, world!"
}

AllTalk will:

  1. Use the API Default Settings for language, text filtering, narrator settings, etc.
  2. Apply the TTS Engine-specific settings for the voice, speed, pitch, etc.
  3. Generate the audio using these combined settings.

If you want to override any of these settings, you can include them in your request:

{
    "text_input": "Hello, world!",
    "language": "fr",
    "speed": 1.5
}

This request will use French as the language and increase the speed, while still using other default settings from both the API and the TTS Engine.

API Version Settings

AllTalk API Version

These settings provide default values for API requests and ensure compatibility between AllTalk v1 and v2. Typically you would leave v1 compatibility turned off. You can turn it on if you need to use AllTalk v2 with code you wrote for AllTalk v1; however, doing so will affect anything that is designed to work with AllTalk v2.

  • Purpose: Determines which API version to use for responses.
  • Options:
    • AllTalk v2 API (Default)
    • AllTalk v1 API (Legacy)
  • Explanation: AllTalk v2 made changes to the API response format. This setting allows users to maintain compatibility with code written for v1 if needed.

AllTalk v1 API IP Address

  • Purpose: Specifies the IP address included in responses when using the legacy API.
  • Default: 127.0.0.1
  • Explanation: AllTalk v1 bound to 127.0.0.1, which caused issues for network access. AllTalk v2 binds to 0.0.0.0 and removes the IP address from responses. This setting allows v1-style responses for easier migration.

Response Format Comparison

V1 Response (with IP address):

{
    "status": "generate-success",
    "output_file_path": "C:\\text-generation-webui\\extensions\\alltalk_tts\\outputs\\myoutputfile_1704141936.wav",
    "output_file_url": "http://127.0.0.1:7851/audio/myoutputfile_1704141936.wav",
    "output_cache_url": "http://127.0.0.1:7851/audiocache/myoutputfile_1704141936.wav"
}

V2 Response (without IP address):

{
    "status": "generate-success",
    "output_file_path": "C:\\text-generation-webui\\extensions\\alltalk_tts\\outputs\\myoutputfile_1704141936.wav",
    "output_file_url": "/audio/myoutputfile_1704141936.wav",
    "output_cache_url": "/audiocache/myoutputfile_1704141936.wav"
}
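
A client that has to cope with both response styles can resolve `output_file_url` against the address it used to reach the server. The sketch below uses Python's standard `urllib.parse.urljoin`; the helper name `resolve_audio_url` and the `192.168.1.50` address are assumptions chosen for illustration.

```python
from urllib.parse import urljoin


def resolve_audio_url(server_base, output_file_url):
    """Return a fetchable URL for either response style.

    urljoin leaves an absolute URL (v1 style) unchanged and resolves a
    relative path (v2 style) against server_base.
    """
    return urljoin(server_base, output_file_url)


# Hypothetical address the client uses to reach the AllTalk server.
base = "http://192.168.1.50:7851"

v2_url = resolve_audio_url(base, "/audio/myoutputfile_1704141936.wav")
v1_url = resolve_audio_url(
    base, "http://127.0.0.1:7851/audio/myoutputfile_1704141936.wav"
)
print(v2_url)  # http://192.168.1.50:7851/audio/myoutputfile_1704141936.wav
print(v1_url)  # http://127.0.0.1:7851/audio/myoutputfile_1704141936.wav
```

Note that the v1-style URL still points at 127.0.0.1, so it is only usable from the machine running AllTalk unless the v1 API IP address setting is changed.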

API Default Settings

These settings provide default values for API requests when specific parameters are not included. They work in conjunction with the Default Settings of the currently loaded TTS engine to populate minimal requests.

Strip sentences shorter than

  • Purpose: Defines the minimum length of a sentence (in characters) that will be processed for text-to-speech.
  • Explanation: Sentences shorter than this value will be filtered out by the Narrator to remove unwanted text characters.
  • Default: 3 characters
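
As a rough illustration of this kind of length filter (not AllTalk's actual implementation), the sketch below splits text on sentence-ending punctuation and drops any fragment shorter than the threshold. The splitting rule and the function name `strip_short_sentences` are assumptions.

```python
import re


def strip_short_sentences(text, min_length=3):
    """Drop sentences whose stripped length is below min_length characters."""
    # Naive split after '.', '!' or '?' followed by whitespace; this rule is
    # an assumption made for the sketch.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.strip()) >= min_length]


kept = strip_short_sentences("Hello there. ?! How are you today?")
print(kept)  # ['Hello there.', 'How are you today?']
```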

Maximum amount of characters

  • Purpose: Sets the maximum number of characters allowed in a single text-to-speech generation request.
  • Explanation: Requests exceeding this limit will be rejected.
  • Default: 2000 characters
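
Because over-length requests are rejected, a client may want to check the text length before sending. The following is a minimal client-side sketch; the function name `check_request_length` and the message wording are hypothetical, and the limit should match whatever is configured in the Gradio interface.

```python
def check_request_length(text, max_characters=2000):
    """Return (accepted, message) for a proposed text_input value."""
    length = len(text)
    if length > max_characters:
        return False, f"Text is {length} characters; the limit is {max_characters}."
    return True, "ok"


print(check_request_length("Hello, world!"))
print(check_request_length("a" * 2001))
```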

Text filtering

  • Purpose: Determines the text filtering method applied to the input text before processing.
  • Options:
    • none (no filtering)
    • standard (basic filtering)
    • html (HTML-specific filtering)
  • Default: standard

Language

  • Purpose: Sets the default language for text-to-speech if no language is explicitly provided in the request.
  • Default: en (English)

Narrator Enabled/Disabled/Silent

  • Purpose: Determines whether the narrator functionality is enabled by default when not explicitly specified in the request.
  • Options:
    • Enabled
    • Disabled
    • Enabled (Silent)
  • Default: Disabled
  • Note: If set to Enabled or Enabled (Silent), all text goes through the narrator function unless the TTS generation request explicitly disables it. This can result in silenced TTS.

Narrator Text-not-inside

  • Purpose: Defines how narrated text is split and processed when not explicitly specified in the request.
  • Options:
    • character (text is associated with the character)
    • narrator (text is associated with the narrator)
    • silent (text is not spoken)
  • Default: narrator
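
The effect of this setting can be pictured with a small routing sketch. It assumes that narrator text is wrapped in asterisks and character text in double quotes, so that "text not inside" means anything outside both; this page does not spell out those delimiters, so treat them, and the function name `route_segments`, as assumptions rather than AllTalk's actual parsing logic.

```python
import re

# Assumed delimiters: *narrated text* and "character text".
SEGMENT = re.compile(r'\*([^*]+)\*|"([^"]+)"')


def route_segments(text, text_not_inside="narrator"):
    """Return (speaker, text) pairs for each piece of the input."""
    routed, pos = [], 0
    for match in SEGMENT.finditer(text):
        loose = text[pos:match.start()].strip()
        if loose:
            # Text outside any delimiter follows the text-not-inside setting.
            routed.append((text_not_inside, loose))
        if match.group(1) is not None:
            routed.append(("narrator", match.group(1).strip()))
        else:
            routed.append(("character", match.group(2).strip()))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        routed.append((text_not_inside, tail))
    return routed


sample = '*She smiled.* "Hello!" and then she left.'
for speaker, piece in route_segments(sample, "silent"):
    print(speaker, "->", piece)
```

With `text_not_inside="silent"`, the trailing "and then she left." is tagged as silent and would not be spoken; with the default `narrator` it would be read in the narrator voice.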

Output file name

  • Purpose: Specifies the default name for the output file when no filename is provided in the request.
  • Default: myoutputfile

Include Timestamp

  • Purpose: Determines whether a unique identifier (UUID) timestamp is appended to the file name of the generated text-to-speech output.
  • Options:
    • Timestamp files
    • Don't Timestamp (Over-write)
  • Default: Timestamp files
  • Explanation: When enabled, each output file will have a unique timestamp, preventing overwriting of files. When disabled, files with the same name will be overwritten.
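
The sample responses earlier on this page show names such as `myoutputfile_1704141936.wav`. The sketch below reproduces that shape under the assumption that the suffix is a Unix timestamp in whole seconds; the function name `build_output_name` and the `.wav` extension default are illustrative choices, not AllTalk's implementation.

```python
import time


def build_output_name(base_name="myoutputfile", timestamp=True, now=None, extension="wav"):
    """Build an output file name, optionally suffixed with a timestamp."""
    if timestamp:
        # Assumption: the suffix is the Unix time in whole seconds.
        stamp = int(time.time() if now is None else now)
        return f"{base_name}_{stamp}.{extension}"
    # Without a timestamp the same name is reused, so the file is overwritten.
    return f"{base_name}.{extension}"


print(build_output_name(now=1704141936))    # myoutputfile_1704141936.wav
print(build_output_name(timestamp=False))   # myoutputfile.wav
```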

Play Locally or Remotely

  • Purpose: Specifies whether the generated audio should be played locally on the client-side or remotely on the server-side console/terminal.
  • Options:
    • Play locally
    • Play remotely
  • Default: Play locally

Remote play volume

  • Purpose: Adjusts the volume level for audio playback when the 'Play Remotely' option is selected.
  • Range: 0.1 (lowest) to 0.9 (highest)
  • Default: 0.9
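
A client that lets users choose a volume may want to keep the value inside the documented range before sending it. This is a small client-side sketch; whether AllTalk itself clamps or rejects out-of-range values is not stated here, and `clamp_volume` is a hypothetical helper.

```python
def clamp_volume(volume, low=0.1, high=0.9):
    """Keep a requested volume inside the documented 0.1 to 0.9 range."""
    return max(low, min(high, float(volume)))


print(clamp_volume(1.5))  # 0.9
print(clamp_volume(0))    # 0.1
print(clamp_volume(0.5))  # 0.5
```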

API Allowed Text Filtering/Passthrough Settings

  • Purpose: Defines the set of characters and Unicode ranges that are permitted to be processed by the AllTalk TTS system.
  • Explanation: This filter ensures that only valid and supported characters are passed to the TTS engine or AI model for generation. It helps prevent unwanted sounds or issues that certain characters might cause in the loaded TTS engine.

The filter includes:

  • ASCII letters and digits
  • Punctuation characters
  • Single and double quotes
  • Whitespace characters
  • Hyphens/dashes
  • Dollar signs
  • Various Unicode ranges for different languages and scripts (e.g., Latin characters with diacritics, Cyrillic, Devanagari, Chinese, Arabic, Japanese, Korean, Hungarian)
  • Special quotation marks and punctuation

For a full list of allowed characters and Unicode ranges, please refer to the AllTalk API Defaults settings in the Gradio interface.
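
To show how an allow-list of this kind works in principle, the sketch below removes every character that is not in a small set of permitted classes. The ranges are an illustrative subset chosen for the example and are not the list AllTalk ships with; the names `DISALLOWED` and `filter_text` are hypothetical.

```python
import re

# Matches any character NOT in the allow-list. Illustrative subset only.
DISALLOWED = re.compile(
    r"[^"
    r"a-zA-Z0-9"                 # ASCII letters and digits
    r"\s"                        # whitespace
    r".,;:!?\-'\"$"              # punctuation, quotes, hyphen, dollar sign
    r"\u00C0-\u017F"             # Latin letters with diacritics
    r"\u0400-\u04FF"             # Cyrillic
    r"\u0900-\u097F"             # Devanagari
    r"\u4E00-\u9FFF"             # CJK Unified Ideographs
    r"\u2018\u2019\u201C\u201D"  # curly single and double quotes
    r"]"
)


def filter_text(text):
    """Remove characters outside the allow-list before TTS generation."""
    return DISALLOWED.sub("", text)


sample = "Caf\u00e9 \u2603 costs $5"   # contains a snowman symbol (U+2603)
print(filter_text(sample))             # the snowman is removed, the rest is kept
```

Removing a character leaves its surrounding whitespace in place, so a real filter might also collapse repeated spaces afterwards.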
