
API ‐ Standard TTS Generation API

erew123 edited this page Oct 2, 2024 · 4 revisions

This endpoint allows you to generate Text-to-Speech (TTS) audio based on text input. It supports both character and narrator speech generation.

To understand how TTS requests to this endpoint flow through AllTalk V2, see the TTS request flowchart in the AllTalk V2 wiki.

Endpoint Details

  • URL: http://{ipaddress}:{port}/api/tts-generate
  • Method: POST
  • Content-Type: application/x-www-form-urlencoded
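The sketch below shows what these endpoint details mean on the wire. It uses only the Python standard library to build, but not send, a form-encoded POST request. The helper name build_tts_request is hypothetical. The host and port default to the 127.0.0.1:7851 values used in the examples on this page, so replace them with your own server's address.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tts_request(params, host="127.0.0.1", port=7851):
    """Build (but do not send) a form-encoded POST to /api/tts-generate.

    build_tts_request is a hypothetical helper, not part of AllTalk.
    """
    url = f"http://{host}:{port}/api/tts-generate"
    # Encode the fields as an application/x-www-form-urlencoded body
    body = urlencode(params).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_tts_request({"text_input": "Hello world", "language": "en"})
print(req.get_method(), req.full_url)
print(req.data)
# To actually send it: urllib.request.urlopen(req)
```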

Request Parameters

| Parameter | Type | Description |
|---|---|---|
| text_input | string | The text you want the TTS engine to speak. |
| text_filtering | string | How the input text is filtered before generation. Options: none, standard, html |
| character_voice_gen | string | The name of the character's voice file (WAV format). |
| rvccharacter_voice_gen | string | The name of the RVC voice file for the character. Format: folder\file.pth or Disabled |
| rvccharacter_pitch | integer | The pitch of the character's RVC voice. Range: -24 to 24 |
| narrator_enabled | boolean | Enable or disable the narrator. |
| narrator_voice_gen | string | The name of the narrator's voice file (WAV format). |
| rvcnarrator_voice_gen | string | The name of the RVC voice file for the narrator. Format: folder\file.pth or Disabled |
| rvcnarrator_pitch | integer | The pitch of the narrator's RVC voice. Range: -24 to 24 |
| text_not_inside | string | How to voice text that is not inside quotes or asterisks. Options: character, narrator, silent |
| language | string | The language for TTS (see Supported Languages below). |
| output_file_name | string | The name of the output file, without the .wav extension. |
| output_file_timestamp | boolean | Append a timestamp to the output file name. |
| autoplay | boolean | Play the generated audio through your standard sound output device. |
| autoplay_volume | float | The autoplay volume. Range: 0.1 to 1.0 |
| speed | float | The speed of the generated audio. Range: 0.25 to 2.0 |
| pitch | integer | The pitch of the generated audio. Range: -10 to 10 |
| temperature | float | The temperature for the TTS engine. Range: 0.1 to 1.0 |
| repetition_penalty | float | The repetition penalty for the TTS engine. Range: 1.0 to 20.0 |
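A client can catch out-of-range values before sending a request. This is a minimal client-side sketch that checks a payload against the ranges and options listed in the table above. The helper validate_tts_params is hypothetical, and the server may still apply its own checks or clamping.

```python
# Documented ranges and options from the parameter table.
# validate_tts_params is a hypothetical client-side helper, not part of AllTalk.
RANGES = {
    "rvccharacter_pitch": (-24, 24),
    "rvcnarrator_pitch": (-24, 24),
    "autoplay_volume": (0.1, 1.0),
    "speed": (0.25, 2.0),
    "pitch": (-10, 10),
    "temperature": (0.1, 1.0),
    "repetition_penalty": (1.0, 20.0),
}
CHOICES = {
    "text_filtering": {"none", "standard", "html"},
    "text_not_inside": {"character", "narrator", "silent"},
}


def validate_tts_params(params):
    """Return a list of problems; an empty list means every checked value is valid."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params:
            value = float(params[name])  # form values may arrive as strings
            if not low <= value <= high:
                problems.append(f"{name}={params[name]} outside {low}..{high}")
    for name, allowed in CHOICES.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name}={params[name]!r} not one of {sorted(allowed)}")
    return problems


print(validate_tts_params({"speed": 1.5, "pitch": 12, "text_filtering": "markdown"}))
```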

Supported Languages

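| Code | Language |
|---|---|
| ar | Arabic |
| zh-cn | Chinese (Simplified) |
| cs | Czech |
| nl | Dutch |
| en | English |
| fr | French |
| de | German |
| hi | Hindi (limited support) |
| hu | Hungarian |
| it | Italian |
| ja | Japanese |
| ko | Korean |
| pl | Polish |
| pt | Portuguese |
| ru | Russian |
| es | Spanish |
| tr | Turkish |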
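The language codes above can be kept in a lookup table on the client side. The sketch below mirrors the documented list and rejects unknown codes early. Lower-casing the input is an assumption made for convenience. Whether a language is actually spoken still depends on the loaded TTS engine. The name check_language is hypothetical.

```python
# Language codes from the Supported Languages table. What the loaded model can
# actually speak depends on the TTS engine; this only mirrors the documented list.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "zh-cn": "Chinese (Simplified)", "cs": "Czech", "nl": "Dutch",
    "en": "English", "fr": "French", "de": "German", "hi": "Hindi (limited support)",
    "hu": "Hungarian", "it": "Italian", "ja": "Japanese", "ko": "Korean",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "es": "Spanish",
    "tr": "Turkish",
}


def check_language(code):
    """Hypothetical helper: lower-case a code and confirm it is in the documented list."""
    normalised = code.strip().lower()
    if normalised not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalised


print(check_language("ZH-CN"))  # zh-cn
```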

Example Requests

Standard TTS Speech Example

Generate a time-stamped file from standard text by running this at the command prompt/terminal. Because autoplay=false, the audio is saved to a file but not played:

curl -X POST "http://127.0.0.1:7851/api/tts-generate" \
     -d "text_input=All of this is text spoken by the character. This is text not inside quotes, though that doesn't matter in the slightest" \
     -d "text_filtering=standard" \
     -d "character_voice_gen=female_01.wav" \
     -d "narrator_enabled=false" \
     -d "narrator_voice_gen=male_01.wav" \
     -d "text_not_inside=character" \
     -d "language=en" \
     -d "output_file_name=myoutputfile" \
     -d "output_file_timestamp=true" \
     -d "autoplay=false" \
     -d "autoplay_volume=0.8"

Narrator Example

Generate a time-stamped file from text that mixes narrator and character speech, again from the command prompt/terminal with autoplay disabled:

curl -X POST "http://127.0.0.1:7851/api/tts-generate" \
     -d "text_input=*This is text spoken by the narrator* \"This is text spoken by the character\". This is text not inside quotes." \
     -d "text_filtering=standard" \
     -d "character_voice_gen=female_01.wav" \
     -d "narrator_enabled=true" \
     -d "narrator_voice_gen=male_01.wav" \
     -d "text_not_inside=character" \
     -d "language=en" \
     -d "output_file_name=myoutputfile" \
     -d "output_file_timestamp=true" \
     -d "autoplay=false" \
     -d "autoplay_volume=0.8"
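The sketch below shows how the narrator example's text could be divided between voices. It splits text into *asterisk* (narrator), "quoted" (character), and remaining segments, and assigns the remaining text to whatever role text_not_inside names. This is a hypothetical illustration of the rules in the parameter table, not AllTalk's actual parser, which may handle nested or unbalanced markers and punctuation differently.

```python
import re

# Hypothetical illustration only, not AllTalk's parser:
#   *asterisks*     -> narrator
#   "double quotes" -> character
#   anything else   -> the role chosen by text_not_inside
SEGMENT_RE = re.compile(r'\*([^*]+)\*|"([^"]+)"')


def split_segments(text, text_not_inside="character"):
    segments = []
    pos = 0

    def add_leftover(chunk):
        chunk = chunk.strip(" .,;:")  # drop stray spaces/punctuation between markers
        if chunk:
            segments.append((text_not_inside, chunk))

    for match in SEGMENT_RE.finditer(text):
        add_leftover(text[pos:match.start()])
        if match.group(1) is not None:
            segments.append(("narrator", match.group(1).strip()))
        else:
            segments.append(("character", match.group(2).strip()))
        pos = match.end()
    add_leftover(text[pos:])
    return segments


example = ('*This is text spoken by the narrator* '
           '"This is text spoken by the character". This is text not inside quotes.')
for role, chunk in split_segments(example):
    print(f"{role:10} {chunk}")
```

With text_not_inside=character, the trailing sentence is voiced by the character, as in the curl request above.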

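Note: If your text contains double quotes, escape them with \" so the shell does not end the string early (see the narrator example). curl's -d option sends the data as-is, so if your text contains characters with special meaning in form data, such as & or +, send text_input with --data-urlencode instead of -d.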
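The \" escaping is a shell concern. When you build the request body in Python instead, urllib.parse.urlencode percent-encodes quotes, ampersands, and other reserved characters for you, so no manual escaping is needed. A minimal sketch:

```python
from urllib.parse import parse_qs, urlencode

text = '*The narrator says* "Salt & pepper, please."'
body = urlencode({"text_input": text})
print(body)

# A standard form parser recovers the original text unchanged
assert parse_qs(body)["text_input"] == [text]
```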

Minimal Request Example

You can send any combination of parameters. Any field you leave out is filled in from the default API Global settings and the loaded TTS engine's default settings. The smallest request sends only text_input:

curl -X POST "http://127.0.0.1:7851/api/tts-generate" \
     -d "text_input=All of this is text spoken by the character. This is text not inside quotes, though that doesn't matter in the slightest"
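Because omitted fields fall back to the configured defaults, a client only needs to send the values it actually sets. In the sketch below, the hypothetical helper build_payload drops any parameter left as None, so the server's API Global settings and engine defaults apply to it.

```python
def build_payload(text_input, **overrides):
    """Hypothetical helper: include only the parameters the caller actually set.

    Anything left as None is omitted so the server fills it from its defaults.
    """
    payload = {"text_input": text_input}
    payload.update({key: value for key, value in overrides.items() if value is not None})
    return payload


print(build_payload("Hello there", language="en", speed=None, narrator_voice_gen=None))
# {'text_input': 'Hello there', 'language': 'en'}
```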

Response

The API returns a JSON object with the following properties:

| Property | Description |
|---|---|
| status | Indicates whether the generation was successful (generate-success) or failed (generate-failure). |
| output_file_path | The on-disk location of the generated WAV file. |
| output_file_url | The HTTP location for accessing the generated WAV file for browser playback. |
| output_cache_url | The HTTP location for accessing the generated WAV file as a pushed download. |

Example response:

{
    "status": "generate-success",
    "output_file_path": "C:\\text-generation-webui\\extensions\\alltalk_tts\\outputs\\myoutputfile_1704141936.wav",
    "output_file_url": "/audio/myoutputfile_1704141936.wav",
    "output_cache_url": "/audiocache/myoutputfile_1704141936.wav"
}
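Client code can wrap the documented response fields in a small typed object. In the sketch below, the TTSResult class and parse_tts_response function are hypothetical. Using .get() assumes that a failure response may omit the file fields. That is an assumption, since only a success response is shown above.

```python
import json
from dataclasses import dataclass


@dataclass
class TTSResult:
    """Hypothetical container for the documented response fields."""
    status: str
    output_file_path: str
    output_file_url: str
    output_cache_url: str

    @property
    def ok(self):
        return self.status == "generate-success"


def parse_tts_response(raw):
    data = json.loads(raw)
    # .get() tolerates a failure response that omits the file fields (an assumption)
    return TTSResult(
        status=data.get("status", ""),
        output_file_path=data.get("output_file_path", ""),
        output_file_url=data.get("output_file_url", ""),
        output_cache_url=data.get("output_cache_url", ""),
    )


raw = r'''{
    "status": "generate-success",
    "output_file_path": "C:\\text-generation-webui\\extensions\\alltalk_tts\\outputs\\myoutputfile_1704141936.wav",
    "output_file_url": "/audio/myoutputfile_1704141936.wav",
    "output_cache_url": "/audiocache/myoutputfile_1704141936.wav"
}'''
result = parse_tts_response(raw)
print(result.ok, result.output_file_url)
```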

Note: output_file_url and output_cache_url are relative paths; the response no longer includes the IP address and port number. Your own software or extension must prepend http://{ipaddress}:{port} to them.
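Because the returned URLs are relative, a client has to join them with the server address itself. Here is a minimal sketch using urllib.parse.urljoin. The helper name absolute_audio_url is hypothetical, and the default host and port are assumptions taken from the examples on this page.

```python
from urllib.parse import urljoin


def absolute_audio_url(relative_url, host="127.0.0.1", port=7851):
    """Hypothetical helper: prefix output_file_url/output_cache_url with the server address.

    host and port are assumptions; use the address your AllTalk instance listens on.
    """
    return urljoin(f"http://{host}:{port}/", relative_url)


print(absolute_audio_url("/audio/myoutputfile_1704141936.wav"))
# http://127.0.0.1:7851/audio/myoutputfile_1704141936.wav
```

The resulting URL can then be opened in a browser or fetched with urllib.request.urlretrieve to save a local copy.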

Additional Notes

  • All global settings for the API endpoint can be configured within the AllTalk interface under Global Settings > AllTalk API Defaults.
  • TTS engine-specific settings, such as which voices to use or engine parameters, are set per engine under TTS Engine Settings > (your chosen TTS engine).
  • You can send every parameter, but the loaded TTS engine applies only those it supports. For example, you can request a TTS generation in Russian, but if the loaded TTS model only supports English, the output will sound like English.
  • Voices named in the request must match voices available to the loaded TTS engine. If they don't, nothing is generated and an error message may be returned.

Code Examples

Python Example

import requests

# API endpoint
API_URL = "http://127.0.0.1:7851/api/tts-generate"

# Function to generate TTS; returns the parsed JSON response on success, otherwise None
def generate_tts(text, character_voice, narrator_voice=None, language="en", output_file="output", autoplay=False):
    # Prepare the payload (form fields are strings; booleans are sent as "true"/"false")
    payload = {
        "text_input": text,
        "text_filtering": "standard",
        "character_voice_gen": character_voice,
        "narrator_enabled": "true" if narrator_voice else "false",
        "narrator_voice_gen": narrator_voice if narrator_voice else "",
        "text_not_inside": "character",
        "language": language,
        "output_file_name": output_file,
        "output_file_timestamp": "true",
        "autoplay": str(autoplay).lower(),
        "autoplay_volume": "0.8"
    }

    # Send POST request to the API; the timeout (in seconds) stops a stalled
    # server from hanging the script, so raise it for very long texts
    response = requests.post(API_URL, data=payload, timeout=120)

    # Check if the request was successful
    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        return None

    result = response.json()
    if result.get("status") == "generate-success":
        print("TTS generated successfully!")
        print(f"File path: {result['output_file_path']}")
        print(f"File URL: {result['output_file_url']}")
        print(f"Cache URL: {result['output_cache_url']}")
        return result

    print("TTS generation failed.")
    return None

# Example usage
if __name__ == "__main__":
    text = "Hello, this is a test of the TTS API. *This part is narrated.* \"And this is spoken by a character.\""
    character_voice = "female_01.wav"
    narrator_voice = "male_01.wav"

    generate_tts(text, character_voice, narrator_voice)

# Note: Make sure to replace the API_URL with the correct IP address and port if different from the default
# You can customize the payload further by adding more parameters as needed (e.g., pitch, speed, temperature)
# Error handling can be improved for production use
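Following the note above about customizing the payload, here is one way to pass extra parameters such as speed, pitch, or temperature. The sketch converts Python booleans to the lowercase "true"/"false" strings used throughout this page, and converts other values with str(). The helper name to_form_fields is hypothetical.

```python
def to_form_fields(**params):
    """Hypothetical helper: convert Python values into form-field strings.

    Booleans become "true"/"false" (as in the curl examples); everything else uses str().
    """
    fields = {}
    for name, value in params.items():
        # Check bool first: bool is a subclass of int, and str(True) would give "True"
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        else:
            fields[name] = str(value)
    return fields


extra = to_form_fields(speed=1.25, pitch=-2, temperature=0.75, autoplay=True)
print(extra)
# {'speed': '1.25', 'pitch': '-2', 'temperature': '0.75', 'autoplay': 'true'}
```

The result can be merged into the payload inside generate_tts with payload.update(extra) before the request is sent.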

JavaScript Example

// API endpoint
const API_URL = "http://127.0.0.1:7851/api/tts-generate";

// Function to generate TTS
async function generateTTS(text, characterVoice, narratorVoice = null, language = "en", outputFile = "output", autoplay = false) {
    // Prepare the payload
    const payload = new URLSearchParams({
        text_input: text,
        text_filtering: "standard",
        character_voice_gen: characterVoice,
        narrator_enabled: narratorVoice ? "true" : "false",
        narrator_voice_gen: narratorVoice || "",
        text_not_inside: "character",
        language: language,
        output_file_name: outputFile,
        output_file_timestamp: "true",
        autoplay: autoplay.toString(),
        autoplay_volume: "0.8"
    });

    try {
        // Send POST request to the API
        const response = await fetch(API_URL, {
            method: 'POST',
            body: payload,
            headers: {
                'Content-Type': 'application/x-www-form-urlencoded',
            },
        });

        if (!response.ok) {
            throw new Error(`HTTP error! status: ${response.status}`);
        }

        const result = await response.json();

        if (result.status === "generate-success") {
            console.log("TTS generated successfully!");
            console.log(`File path: ${result.output_file_path}`);
            console.log(`File URL: ${result.output_file_url}`);
            console.log(`Cache URL: ${result.output_cache_url}`);
            return result;
        } else {
            console.error("TTS generation failed.");
            return null;
        }
    } catch (error) {
        console.error("Error:", error);
        return null;
    }
}

// Example usage
const text = "Hello, this is a test of the TTS API. *This part is narrated.* \"And this is spoken by a character.\"";
const characterVoice = "female_01.wav";
const narratorVoice = "male_01.wav";

generateTTS(text, characterVoice, narratorVoice)
    .then(result => {
        if (result) {
            // Handle successful generation, e.g., play audio or update UI
        }
    });

// Note: Make sure to replace the API_URL with the correct IP address and port if different from the default
// You can customize the payload further by adding more parameters as needed (e.g., pitch, speed, temperature)
// This example uses async/await for better readability, but you can also use .then() chains if preferred
// Error handling can be improved for production use
// For browser usage, ensure CORS is properly configured on the server side