AllTalk is an updated version of the Coqui_tts extension for Text Generation web UI. Features include:
- Custom Start-up Settings: Adjust your default start-up settings.
- Narrator: Use different voices for the main character and narration.
- Low VRAM mode: Great for people with low GPU memory, or if your VRAM is filled by your LLM.
- DeepSpeed: A 3-4x performance boost when generating TTS. DeepSpeed Windows/Linux instructions are below.
- Local/Custom models: Use any of the XTTSv2 models (API Local and XTTSv2 Local).
- Optional WAV file maintenance: Configurable deletion of old output WAV files.
- Documentation: Fully documented with a built-in web page.
- Console output: Clear command-line output for any warnings or issues.
- Standalone/3rd party support: Can be used with third-party applications via JSON calls.
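Since the real API fields are documented on AllTalk's built-in documentation page rather than here, the following is a purely illustrative sketch (the field names are hypothetical) of how a third-party application might build the JSON body it would POST to AllTalk's local server:

```python
import json

# Hypothetical request body - the actual field names are listed on
# AllTalk's built-in documentation page, not invented here.
payload = {
    "text": "Hello from a third-party application.",
    "voice": "female_01.wav",
    "language": "en",
}
body = json.dumps(payload)  # this JSON string would be sent to AllTalk's local server
print(body)
```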
- 🟩 Installation
- 🟪 Updating & problems with updating
- 🟫 Screenshots
- 🟨 Help with problems
- 🔵🟢🟡 DeepSpeed Installation (Windows & Linux)
- 🔴 Future to-do list & Upcoming updates
The latest build (13 Dec 2023) has had the entire text filtering engine and narration engine rebuilt from scratch. How it actually works is highly complicated, but the end result is much clearer TTS output and much better control over the narrator option and over how text that isn't within quotes or asterisks is handled. It does, however, mean you need to ensure your character card is set up correctly if using the narrator function. Details are below in the installation notes.
DeepSpeed v11.2 can be installed within the default text-generation-webui Python 3.11 environment. Installs in custom Python environments are possible, but can be more complicated. Instructions here (or scroll down).
This has been tested on the current Dec 2023 release of Text generation webUI. If you have not updated it for a while, you may wish to update Text generation webUI, instructions here
- In a command prompt/terminal window, move into your Text generation webUI folder:
  `cd text-generation-webui`
- Start the Text generation webUI Python environment for your OS:
  `cmd_windows.bat`, `./cmd_linux.sh`, `cmd_macos.sh` or `cmd_wsl.bat`
- Move into your extensions folder:
  `cd extensions`
- Once there, git clone this repository:
  `git clone https://github.com/erew123/alltalk_tts`
- Move into the `alltalk_tts` folder:
  `cd alltalk_tts`
- Install the requirements:
  Nvidia graphics card machines: `pip install -r requirements_nvidia.txt`
  Other machines (Mac, AMD etc.): `pip install -r requirements_other.txt`
- (Optional DeepSpeed) If you have an Nvidia graphics card on a system running Linux or Windows and wish to use DeepSpeed, please follow the instructions here. However, I would highly recommend that before you install DeepSpeed you start text-generation-webui and confirm AllTalk starts correctly and everything is working, as DeepSpeed can add another layer of complication when troubleshooting any potential start-up issues. If necessary, you can `pip uninstall deepspeed`.
- You can now move back to the main Text generation webUI folder with `cd ..` (a few times), start Text generation webUI (`start_windows.bat`, `./start_linux.sh`, `start_macos.sh` or `start_wsl.bat`) and load the AllTalk extension in the Text generation webUI session tab.
- Please read the note below about start-up times and also the note about ensuring your character cards are set up correctly.
-
Some extra voices are downloadable here.
On first start-up, AllTalk will download the Coqui XTTSv2 2.0.2 model to its models folder (1.8GB of space required). Check the command prompt/terminal window if you want to know what it's doing. After it says "Model Loaded", the Text generation webUI is usually available on its IP address a few seconds later for you to connect to in your browser.
Once the extension is loaded, please find all documentation and settings on the link provided in the interface (as shown in the screenshot below).
Where to find voices: https://aiartes.com/voiceai, https://commons.wikimedia.org/, interviews on YouTube, etc. Instructions on how to cut down and prepare a voice sample are within the built-in documentation.
Narrator function specific: in an RP chat with your AI, on your character card (parameters menu > character tab > greeting), make sure that anything that is narration is in asterisks and anything spoken is in double quotes, then hit the save (💾) button. Greeting paragraphs/sentences are handled differently from how the AI sends text, so it's difficult to account for both. I could force a delimiter in at this stage, but I know it would/may affect things further down the line in the chat, and I need a good think about that before just making a change. This issue only affects the greeting card/start of conversation, and the "example" card that comes with text-generation-webui suffers from this issue (if you want to try it for yourself). So you would put double quotes around the text like this (from the example card):

"Hey! I'm so excited to finally meet you. I've heard so many great things about you and I'm eager to pick your brain about computers. I'm sure you have a wealth of knowledge that I can learn from."
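To illustrate why the quoting matters, here is a minimal sketch (not AllTalk's actual filtering engine) of how text can be split into narrator, character, and unmarked fragments based on asterisks and double quotes:

```python
import re

def split_roles(text):
    """Split chat text into (role, fragment) pairs: 'narrator' for
    *asterisked* text, 'character' for "quoted" text, and 'other'
    for anything outside both markers."""
    pattern = re.compile(r'\*([^*]+)\*|"([^"]+)"')
    parts, last = [], 0
    for m in pattern.finditer(text):
        gap = text[last:m.start()].strip()
        if gap:  # unmarked text between matches
            parts.append(("other", gap))
        if m.group(1) is not None:
            parts.append(("narrator", m.group(1).strip()))
        else:
            parts.append(("character", m.group(2).strip()))
        last = m.end()
    tail = text[last:].strip()
    if tail:
        parts.append(("other", tail))
    return parts

print(split_roles('*She smiles.* "Hello there." He waves.'))
# → [('narrator', 'She smiles.'), ('character', 'Hello there.'), ('other', 'He waves.')]
```

Unmarked ("other") text is exactly the case the rebuilt engine gives you control over; a character card with its greeting correctly quoted leaves nothing ambiguous to classify.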
This is pretty much a repeat of the installation process.
- In a command prompt/terminal window, move into your Text generation webUI folder:
  `cd text-generation-webui`
- Move into your extensions and alltalk_tts folders:
  `cd extensions` then `cd alltalk_tts`
- At the command prompt/terminal, type:
  `git pull`
- Install the requirements:
  Nvidia graphics card machines: `pip install -r requirements_nvidia.txt`
  Other machines (Mac, AMD etc.): `pip install -r requirements_other.txt`
I did leave a mistake in the `/extensions/alltalk_tts/.gitignore` file at one point. If your `git pull` doesn't work, you can either follow the Problems Updating section below, or edit the `.gitignore` file, replace its entire contents with the below, save the file, then re-try the `git pull`:
voices/*.*
models/*.*
outputs/*.*
config.json
confignew.json
models.json
diagnostics.log
If you do experience any problems, the simplest method to resolve this will be:

- Rename the existing `alltalk_tts` folder to something like `alltalk_tts.old`
- Start a console/terminal, then `cd text-generation-webui` and start your Python environment: `cmd_windows.bat`, `./cmd_linux.sh`, `cmd_macos.sh` or `cmd_wsl.bat`
- Move into the extensions folder, same as if you were doing a fresh installation:
  `cd extensions` then `git clone https://github.com/erew123/alltalk_tts`
  This will download a fresh installation.
- Move into the `alltalk_tts` folder:
  `cd alltalk_tts`
- Install the requirements:
  Nvidia graphics card machines: `pip install -r requirements_nvidia.txt`
  Other machines (Mac, AMD etc.): `pip install -r requirements_other.txt`
- Before starting it up, copy/merge the `models`, `voices` and `outputs` folders over from the `alltalk_tts.old` folder to the newly created `alltalk_tts` folder. This will keep your voice history and also stop it re-downloading the model again.

You can now start text-generation-webui or AllTalk (standalone) and it should start up fine. You will need to re-set any saved configuration changes on the configuration page.
Assuming it's all working fine and you are happy, you can delete the old `alltalk_tts.old` folder.
- Open a command prompt window and move into your text-generation-webui folder, then start the Python environment for text-generation-webui:
  `cmd_windows.bat`, `./cmd_linux.sh`, `cmd_macos.sh` or `cmd_wsl.bat`
- Move into the `alltalk_tts` folder:
  `cd extensions` and then `cd alltalk_tts`
- Run the diagnostics and select the requirements file name you installed AllTalk with:
  `python diagnostics.py`
- You will get an on-screen output showing your environment settings, file versions requested vs. what's installed, and details of your graphics card (if Nvidia). This will also create a file called `diagnostics.log` in the `alltalk_tts` folder, which you can upload if you need to create a support ticket on here.
🟨 [AllTalk Startup] Warning TTS Subprocess has NOT started up yet, Will keep trying for 60 seconds maximum. Please wait. It times out after 60 seconds.
When the subprocess is starting, two things are occurring:
A) It's trying to load the voice model into your graphics card's VRAM (assuming you have an Nvidia graphics card; otherwise it uses your system RAM).
B) It's trying to start up the mini-webserver and send the "ready" signal back to the main process.
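The hand-off in B) can be sketched roughly like this (a simplified illustration of the pattern, not AllTalk's actual code): the parent launches a child process and waits up to a timeout for it to print a "ready" line on stdout.

```python
import subprocess
import sys
import time

def wait_for_ready(cmd, timeout=60):
    """Launch a subprocess and wait until it prints 'ready' on stdout,
    giving up after `timeout` seconds (mirroring the 60-second limit)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    deadline = time.time() + timeout
    while time.time() < deadline:
        line = proc.stdout.readline()
        if line.strip() == "ready":
            return proc  # subprocess signalled that start-up is complete
        if line == "" and proc.poll() is not None:
            break  # child exited without ever signalling
    proc.kill()
    raise TimeoutError("subprocess did not signal 'ready' in time")

# Example: a child that "loads" for a moment, then signals readiness
child = [sys.executable, "-c",
         "import time; time.sleep(0.5); print('ready', flush=True)"]
proc = wait_for_ready(child, timeout=10)
proc.wait()
```

If the model load or the mini-webserver start-up hangs, the parent never sees the "ready" line and times out, which is exactly the warning shown above.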
Note: If you need to create a support ticket, please create a `diagnostics.log` report file to submit with a support request. Details on doing this are above.
Possibilities for this issue are:

- You are starting AllTalk in both your `CMD FLAG.txt` and `settings.yaml` files. The `CMD FLAG.txt` you would have manually edited, whereas `settings.yaml` is the one you change and save in the `session` tab of text-generation-webui via `Save UI defaults to settings.yaml`. Please only have one of those two starting up AllTalk.
- You are not starting text-generation-webui with its normal Python environment. Please start it with start_{your OS version} as detailed here (`start_windows.bat`, `./start_linux.sh`, `start_macos.sh` or `start_wsl.bat`) OR (`cmd_windows.bat`, `./cmd_linux.sh`, `cmd_macos.sh` or `cmd_wsl.bat` and then `python server.py`).
- You have installed the wrong version of DeepSpeed on your system, for the wrong version of Python/text-generation-webui. You can go to your text-generation-webui folder in a terminal/command prompt, run the correct cmd version for your OS (`cmd_windows.bat`, `./cmd_linux.sh`, `cmd_macos.sh` or `cmd_wsl.bat`), then type `pip uninstall deepspeed` and try loading AllTalk again. If that works, please see here for the correct instructions for installing DeepSpeed.
- You have an old version of text-generation-webui (pre Dec 2023). I have not tested on older versions of text-generation-webui, so cannot confirm viability on them. For instructions on updating text-generation-webui, please look here (`update_linux.sh`, `update_windows.bat`, `update_macos.sh`, or `update_wsl.bat`).
- You already have something running on port 7851 on your computer, so the mini-webserver can't start on that port. You can change this port number by editing the `confignew.json` file and changing `"port_number": "7851"` to `"port_number": "7602"`, or any other port number that isn't reserved. Only change the number and save the file; do not change the formatting of the document. This will at least rule out something else clashing on the same port number.
- You have antivirus/firewall software that is blocking that port from being accessed. If you had to do something to allow text-generation-webui through your antivirus/firewall, you will have to do the same for this.
- You have quite old graphics drivers and may need to update them.
- Something within text-generation-webui is not playing nicely for some reason. You can go to your text-generation-webui folder in a terminal/command prompt, run the correct cmd version for your OS (`cmd_windows.bat`, `./cmd_linux.sh`, `cmd_macos.sh` or `cmd_wsl.bat`), then type `python extensions\alltalk_tts\script.py` and see if AllTalk starts up correctly. If it does, then something else is interfering.
- Something else is already loaded into your VRAM, or there is a crashed Python process. Either check your task manager for erroneous Python processes or restart your machine and try again.
- You are running DeepSpeed on a Linux machine and, although you are starting with `./start_linux.sh`, AllTalk is failing on start-up. This is because text-generation-webui will overwrite some environment variables when it loads its Python environment. To see if this is the problem, from a terminal go into your text-generation-webui folder, run `./cmd_linux.sh`, then set your environment variable again, e.g. `export CUDA_HOME=/usr/local/cuda` (this may vary depending on your OS, but this is the standard one for Linux, assuming you have installed the CUDA toolkit), then `python server.py` and see if it starts up. If you want to edit the environment permanently you can do so; I have not managed to write full instructions yet, but the conda guide is here.
- You have built yourself a custom Python environment and something is funky with it. This is very hard to diagnose as it's not a standard environment. You may want to update text-generation-webui and reinstall its requirements file (whichever one you use that comes down with text-generation-webui).
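For the port-clash case above, if you prefer not to hand-edit the JSON, a small helper can make the change for you. This is a hypothetical sketch (the `confignew.json` location and the string-typed `"port_number"` key are as described above); note that it rewrites the file with standard 4-space JSON indentation:

```python
import json
from pathlib import Path

def set_port(config_path, new_port):
    """Rewrite "port_number" in AllTalk's confignew.json, leaving every
    other setting in the file untouched."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["port_number"] = str(new_port)  # the file stores the port as a string
    path.write_text(json.dumps(config, indent=4))
    return config["port_number"]
```

For example, from your text-generation-webui folder: `set_port("extensions/alltalk_tts/confignew.json", 7602)`.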
🟨 I activated DeepSpeed in the settings page, but I didn't install DeepSpeed yet and now I have issues starting up
You can either follow the Problems Updating section and fresh install your config, or you can edit the `confignew.json` file within the `alltalk_tts` folder. Look for `"deepspeed_activate": true,` and change the word true to false, i.e. `"deepspeed_activate": false,`, then save the file and try starting again.
If you want to use DeepSpeed, you need an Nvidia Graphics card and to install DeepSpeed on your system. Instructions are here
Please see Problems Updating. If that doesn't help, you can raise a ticket here. It would be handy to have any log files from the console where your error is being shown. I can only loosely support custom-built Python environments and give general pointers. Please create a `diagnostics.log` report file to submit with a support request.
Also, is your text-generation-webui up to date? Instructions here.
🟨 I am having problems getting AllTalk to start after changing settings or making a custom setup/model setup.
I would suggest following Problems Updating and if you still have issues after that, you can raise an issue here
As far as I am aware, these are to do with the Chrome browser and the Gradio interface of text-generation-webui in some way. I raised an issue about this on text-generation-webui here, where you can see that AllTalk is not loaded and the messages persist. Either way, this is more a warning than an actual issue, so it shouldn't affect any functionality of either AllTalk or text-generation-webui; the messages are more just an annoyance.
Linux DeepSpeed installation
➡️DeepSpeed requires an Nvidia Graphics card!⬅️
- Preferably use your built-in package manager to install CUDA tools. Alternatively, download and install the Nvidia CUDA Toolkit for Linux (Nvidia CUDA Toolkit 11.8 or 12.1).
- Open a terminal console.
- Install libaio-dev (however your Linux version installs things), e.g. `sudo apt install libaio-dev`
- Move into your Text generation webUI folder, e.g. `cd text-generation-webui`
- Start the Text generation webUI Python environment: `./cmd_linux.sh`
- Text generation webUI overwrites the CUDA_HOME environment variable each time you `./cmd_linux.sh` or `./start_linux.sh`, so you will need to either permanently change it within the Python environment OR set CUDA_HOME each time you `./cmd_linux.sh`. Details on setting it each time are in the next step. Below is a link to Conda's manual on changing environment variables permanently, though it's possible that changing it permanently could affect other extensions; you would have to test.
  Conda manual - Environment variables
- You can temporarily set the CUDA_HOME environment variable with (standard paths on Ubuntu, but they could vary on other Linux flavours): `export CUDA_HOME=/etc/alternatives/cuda` every time you run `./cmd_linux.sh`.
  If you try to start DeepSpeed with the CUDA_HOME path set incorrectly, expect an error similar to `[Errno 2] No such file or directory: /home/yourname/text-generation-webui/installer_files/env/bin/nvcc`
- Now install DeepSpeed with `pip install deepspeed`
- You can now start Text generation webUI with `python server.py`, ensuring to activate your extensions.
  Just to reiterate: starting Text-generation-webUI with `./start_linux.sh` will overwrite the CUDA_HOME variable unless you have permanently changed it, hence always start with `./cmd_linux.sh`, set the environment variable manually (as shown above), and then `python server.py`. That is how you would need to run it each time, unless you permanently set the CUDA_HOME environment variable within Text-generation-webUI's standard Python environment.

Removal - If it becomes necessary to uninstall DeepSpeed, you can do so with `./cmd_linux.sh` and then `pip uninstall deepspeed`
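Before running `pip install deepspeed`, a quick sanity check of CUDA_HOME can save a failed build. This is a small illustrative helper (the path shown is the Ubuntu default mentioned above), checking exactly what the `nvcc` error message above complains about:

```python
import os
from pathlib import Path

def cuda_home_ok(cuda_home=None):
    """Return True if CUDA_HOME (or the given path) points at a CUDA
    toolkit containing the nvcc compiler that DeepSpeed's build needs."""
    home = cuda_home or os.environ.get("CUDA_HOME", "")
    return bool(home) and (Path(home) / "bin" / "nvcc").is_file()

if not cuda_home_ok():
    print("CUDA_HOME is unset or has no bin/nvcc - try:")
    print("  export CUDA_HOME=/etc/alternatives/cuda")
```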
DeepSpeed v11.2 will work in the current default text-generation-webui Python 3.11 environment! You have two options for how to set up DeepSpeed on Windows: a quick way (🟢Option 1) and a long way (🟡Option 2).
Thanks to @S95Sedan - They managed to get DeepSpeed 11.2 working on Windows via making some edits to the original Microsoft DeepSpeed v11.2 installation. The original post is here.
🟢 Option 1 - Pre-Compiled Wheel DeepSpeed v11.2 (Python 3.11 and 3.10)
➡️DeepSpeed requires an Nvidia Graphics card!⬅️

- Download the correct wheel version for your Python/CUDA from here and save the file inside your text-generation-webui folder.
- Open a command prompt window, move into your text-generation-webui folder, and start the Python environment for text-generation-webui: `cmd_windows.bat`
- With the file that you saved in the text-generation-webui folder, type the following, replacing YOUR-VERSION with the name of the file you have: `pip install "deepspeed-0.11.2+YOUR-VERSION-win_amd64.whl"`
- This should install cleanly, and you should now have DeepSpeed v11.2 installed within the Python 3.11/3.10 environment of text-generation-webui.
- When you start up text-generation-webui and AllTalk starts, you should see `[AllTalk Startup] DeepSpeed Detected`
- Within AllTalk, you will now have a checkbox for Activate DeepSpeed, though remember you can only change one setting every 15 or so seconds, so don't try to activate DeepSpeed and Low VRAM/change your model simultaneously. Do one of those, wait 15-20 seconds until the change is confirmed in the terminal/command prompt, then you can change the other. When you are happy it works, you can set the default start-up settings in the settings page.

Removal - If it becomes necessary to uninstall DeepSpeed, you can do so with `cmd_windows.bat` and then `pip uninstall deepspeed`
🟡 Option 2 - Manual Build DeepSpeed v11.2 (Python 3.11 and 3.10)
➡️DeepSpeed requires an Nvidia Graphics card!⬅️
This will take about 1 hour to complete and use about 6GB of disk space.
- Download the 11.2 release of DeepSpeed and extract it to a folder.
- Install Visual C++ build tools, such as VS2019 C++ x64/x86 build tools.
- Download and install the Nvidia CUDA Toolkit 11.8 or 12.1.
- OPTIONAL: If you do not have a Python environment already created and you are not going to use Text-generation-webui's environment, you can install Miniconda, then at a command prompt create and activate your environment with:
  `conda create -n pythonenv python=3.11`
  `activate pythonenv`
- Launch the Command Prompt (cmd) with Administrator privileges, as admin rights are required to create symlink folders.
- If you are using the Text-generation-webui Python environment, then in the `text-generation-webui` folder run `cmd_windows.bat` to start the Python environment.
  Otherwise, install PyTorch 2.1.0 with CUDA 11.8 or 12.1 into your Python 3.1x.x environment, e.g.:
  `activate pythonenv` (activate your Python environment)
  `conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia`
  or
  `conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia`
- Set your CUDA Windows environment variables in the command prompt to ensure that CUDA_HOME and CUDA_PATH are set to your Nvidia CUDA Toolkit path (the folder above the bin folder that nvcc.exe is installed in). Examples:
  `set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1`
  `set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1`
  or
  `set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8`
  `set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8`
- Navigate to wherever you extracted the DeepSpeed folder in the Command Prompt:
  `cd c:\DeepSpeed-0.11.2` (wherever you extracted it to)
- Modify the following files:
  (These modified files are included in the git-pull of AllTalk, in the DeepSpeed Windows folder, and so can just be copied over the top of the existing folders/files, but if you want to modify them yourself, please follow the below.)
deepspeed-0.11.2/build_win.bat - at the top of the file, add:
set DS_BUILD_EVOFORMER_ATTN=0
deepspeed-0.11.2/csrc/quantization/pt_binding.cpp - lines 244-250 - change to:
std::vector<int64_t> sz_vector(input_vals.sizes().begin(), input_vals.sizes().end());
sz_vector[sz_vector.size() - 1] = sz_vector.back() / devices_per_node; // num of GPU per nodes
at::IntArrayRef sz(sz_vector);
auto output = torch::empty(sz, output_options);
const int elems_per_in_tensor = at::numel(input_vals) / devices_per_node;
const int elems_per_in_group = elems_per_in_tensor / (in_groups / devices_per_node);
const int elems_per_out_group = elems_per_in_tensor / out_groups;
deepspeed-0.11.2/csrc/transformer/inference/csrc/pt_binding.cpp lines 541-542 - change to:
{static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),
lines 550-551 - change to:
{static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),
line 1581 - change to:
at::from_blob(intermediate_ptr, {input.size(0), input.size(1), static_cast<int64_t>(mlp_1_out_neurons)}, options);
deepspeed-0.11.2/deepspeed/env_report.py line 10 - add:
import psutil
line 83 - 100 - change to:
def get_shm_size():
try:
temp_dir = os.getenv('TEMP') or os.getenv('TMP') or os.path.join(os.path.expanduser('~'), 'tmp')
shm_stats = psutil.disk_usage(temp_dir)
shm_size = shm_stats.total
shm_hbytes = human_readable_size(shm_size)
warn = []
if shm_size < 512 * 1024**2:
warn.append(
f" {YELLOW} [WARNING] Shared memory size might be too small, consider increasing it. {END}"
)
# Add additional warnings specific to your use case if needed.
return shm_hbytes, warn
except Exception as e:
return "UNKNOWN", [f"Error getting shared memory size: {e}"]
- While still in your command line with the Python environment enabled, run `build_win.bat` and wait 10-20 minutes.
- Now `cd dist` to go into your dist folder, where you can `pip install deepspeed-YOURFILENAME.whl` (or whatever your WHL file is called).

Removal - If it becomes necessary to uninstall DeepSpeed, you can do so with `cmd_windows.bat` and then `pip uninstall deepspeed`
- Complete & document the new/full standalone mode API.
- Voice output within the command prompt/terminal (TBD).
- Correct an issue on incorrect output folder path when running as a standalone app.
- Correct a few spelling mistakes in the documentation.
- Possibly add some additional TTS engines (TBD).
- Have a break!