Creating the World’s Best Open-Source Speech Model on Bittensor
Check out the beta version of our Front-End!
- Introduction
- Roadmap
- Overview of Miner and Validator Functionality
- Running Miners and Validators
- Contributing
- License
Note: The following documentation assumes you are familiar with basic Bittensor concepts: Miners, Validators, and incentives. If you need a primer, please check out https://docs.bittensor.com/learn/bittensor-building-blocks.
Dippy is one of the world's leading AI companion apps with 1M+ users. The app has ranked #3 on the App Store in countries like Germany, has been covered by publications such as Wired magazine, and the average Dippy user spends over an hour on the app.
The Dippy team is also behind Bittensor's Subnet 11, which exists to create the world's best open-source roleplay LLM. Open-source miner models created on Subnet 11 are used to power the Dippy app. We also plan to integrate the models created from this speech subnet within the Dippy app.
The Dippy Empathetic Speech Subnet on Bittensor is dedicated to developing the world’s most advanced open-source Speech model for immersive, lifelike interactions. By leveraging the collaborative strength of the open-source community, this subnet meets the growing demand for genuine companionship through a speech-first approach. Our objective is to create a model that delivers personalized, empathetic speech interactions beyond the capabilities of traditional assistants and closed-source models.
Unlike existing models that depend on reference speech recordings, which limit creative flexibility, we use natural language prompting to manage speaker identity and style. This intuitive approach enables more dynamic and personalized roleplay experiences, fostering deeper and more engaging interactions.
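For a concrete sense of what prompt-based control looks like, here is a minimal generation sketch using the Parler TTS Mini v1 base model referenced later in this README; the description text, prompt, and output path are illustrative placeholders, not part of the subnet's pipeline:

import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the base model and its tokenizer from Hugging Face.
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

# Speaker identity and style are controlled purely through a natural language description.
description = "A young female speaker with a warm, playful tone, speaking quickly with high energy."
prompt = "Hey! I missed you today. How was your afternoon?"

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and write it to disk.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio = generation.cpu().numpy().squeeze()
sf.write("roleplay_sample.wav", audio, model.config.sampling_rate)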
Given the complexity of creating a state-of-the-art speech model, we plan to divide the process into three distinct phases.
Phase 1:
- Launch a subnet with a robust pipeline for roleplay-specific TTS models, capable of interpreting prompts for speaker identity and stylistic speech description.
- Launch an infinitely scaling synthetic speech data pipeline.
- Implement a public model leaderboard ranked on the core evaluation metric.
- Introduce Human Likeness Score and Word Error Rate as live evaluation criteria for ongoing model assessment.
Phase 2:
- Refine TTS models toward producing more creatively expressive, highly human-like speech outputs.
- Showcase the highest-scoring models and make them accessible to the public through the front-end interface.
Phase 3:
- Advance toward an end-to-end Speech model that seamlessly generates and processes high-quality roleplay audio.
- Establish a comprehensive pipeline for evaluating new Speech model submissions against real-time performance benchmarks.
- Integrate the Speech model into the Dippy app.
- Drive the state of the art in Speech roleplay through iterative enhancements and ongoing data collection.
Miners would use existing frameworks to fine-tune models that improve upon the current state-of-the-art open-source TTS model. The fine-tuned weights would be submitted to a shared Hugging Face pool.
Validators would evaluate model performance via our protocol and rank submissions on various metrics (e.g., naturalness, emotion matching, and clarity). We will provide a suite of testing and benchmarking protocols with state-of-the-art datasets.
- Python 3.8+
- GPU with at least 24 GB of VRAM
To start, clone the repository and cd into it:
git clone https://github.com/impel-intelligence/dippy-speech-subnet.git
cd dippy-speech-subnet
pip install -e .
As a miner, you're responsible for leveraging all methods at your disposal to fine-tune the provided base model.
We outline the following criteria for Phase 1:
- Models should be a fine-tune of the 880M Parler-TTS model.
- Models MUST be in Safetensors format!
- Model: We currently use Parler TTS Mini v1 on Hugging Face as our base model.
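When you're ready to submit (see the next step), uploading the fine-tuned checkpoint might look like the following sketch; the local path and repository name are placeholders, and push_to_hub writes Safetensors weights by default in recent transformers releases, but do verify the uploaded repository contains .safetensors files:

from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

# Load your fine-tuned checkpoint from a local directory (placeholder path).
model = ParlerTTSForConditionalGeneration.from_pretrained("./my-finetuned-parler-tts")
tokenizer = AutoTokenizer.from_pretrained("./my-finetuned-parler-tts")

# Upload both the weights and the tokenizer to your Hugging Face repository.
model.push_to_hub("YOUR_NAMESPACE/YOUR_REPO_NAME")
tokenizer.push_to_hub("YOUR_NAMESPACE/YOUR_REPO_NAME")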
Once you're happy with the model's performance for the roleplay use case, you can simply submit it to Hugging Face 🤗 and then use the following command:
git clone https://github.com/impel-intelligence/dippy-speech-subnet.git
cd dippy-speech-subnet
uv venv .miner
source .miner/bin/activate
uv pip install -r requirements.miner.txt
uv pip install -e .
python neurons/miner.py \
--repo_namespace REPO_NAMESPACE \ # Replace with the namespace of your repository (e.g., parler-tts)
--repo_name REPO_NAME \ # Replace with the name of your repository (e.g., parler-tts-mini-v1)
--config_template CONFIG_TEMPLATE \ # Replace with the miner configuration template (e.g., default)
--netuid NETUID \ # Replace with the unique network identifier (e.g., 231)
--subtensor.network NETWORK \ # Replace with the network (e.g., test or finney)
--online ONLINE \ # Set to True to enable mining
--model_hash MODEL_HASH \ # Replace with the hash of your model
--wallet.name WALLET_NAME \ # Replace with your wallet coldkey name
--wallet.hotkey HOTKEY \ # Replace with your wallet hotkey name
--wallet.path WALLET_PATH \ # Replace with the path to your wallet directory (e.g., "~/.bittensor/wallets/" )
--logging.debug DEBUG # Set to True for debug logging (or False for production)
python neurons/miner.py \
--repo_namespace parler-tts \
--repo_name parler-tts-mini-v1 \
--config_template default \
--netuid 231 \
--subtensor.network test \
--online True \
--model_hash 555 \
--wallet.name coldkey2 \
--wallet.hotkey hotkey2 \
--wallet.path "~/.bittensor/wallets/" \
--logging.debug True
- Use Python 3.11.5
- UV python package manager
git clone https://github.com/impel-intelligence/dippy-speech-subnet.git
cd dippy-speech-subnet/validator_updater
Request the log token from the moderator or relevant team member via the Discord channel.
Locate the fluent.conf file in the project structure:
.
├── Dockerfile.fluent
├── Dockerfile.validator
├── build.sh
├── docker-compose.yml
└── fluentd
└── fluent.conf
Update the following line in the fluent.conf file with the provided token:
source_token <ADD LOG TOKEN HERE>
Execute the auto-update validator script with your wallet keys and organization name:
bash build.sh \
--wallet.name WALLET_NAME \ # Replace with your cold wallet name
--wallet.hotkey HOTKEY \ # Replace with your hotkey name
--org.name ORGNAME # Replace with your organization name
bash build.sh --wallet.name Examplekey4 --wallet.hotkey Examplekey4 --org.name Dippy_EXAMPLE
To start, clone the repository and cd into it:
git clone https://github.com/impel-intelligence/dippy-speech-subnet.git
cd dippy-speech-subnet
uv venv .validator
source .validator/bin/activate
uv pip install -r requirements.validator.txt
uv pip install -e .
To run the evaluation, simply use the following command:
python neurons/validator.py \
--wallet.name WALLET_NAME \ # Replace with the name of your wallet coldkey (e.g., coldkey4)
--wallet.hotkey HOTKEY \ # Replace with your wallet hotkey name (e.g., hotkey4)
--device DEVICE \ # Replace with the device to use (e.g., cpu or cuda)
--netuid NETUID \ # Replace with the unique network identifier (e.g., 231)
--subtensor.network NETWORK \ # Replace with the network name (e.g., test or finney)
--wallet.path WALLET_PATH # Replace with the path to your wallet directory (e.g., "~/.bittensor/wallets/")
python neurons/validator.py \
--wallet.name coldkey4 \
--wallet.hotkey hotkey4 \
--device cuda \
--netuid 231 \
--subtensor.network finney \
--wallet.path "~/.bittensor/wallets/"
Please note that this validator will call the model validation service hosted by the Dippy subnet owners. If you wish to run the model validation service locally, please follow the instructions below.
Note: Currently (November 22, 2024) this is experimental. We recommend using the remote validation API for now.
Starting a validator with your local validator API requires starting the validator with the --use-local-validation-api flag.
Additionally, a "model_queue" and "worker" are required to push models to the validation API.
Note: The validator API needs to be installed in a separate venv from the validator due to a pydantic version conflict.
- Python 3.9+
- Linux
- UV python package manager
Install Git LFS if it is not already installed.
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
If you are running on RunPod, you might also need to install netstat.
apt-get install net-tools
Build Evaluator Image
cd dippy-speech-subnet
docker build -f evaluator.Dockerfile -t speech .
Build Worker Image
docker build -f worker.Dockerfile -t worker-image .
Run Worker Image
docker run -d --name worker-container -v /var/run/docker.sock:/var/run/docker.sock worker-image
Stream Logs:
docker logs -f worker-container
Build Model Queue Image:
docker build -f modelq.Dockerfile -t modelq-image .
Run Model Queue Image:
docker run -d --name modelq-container modelq-image
Stream Logs:
docker logs -f modelq-container
Set the .env variable:
POSTGRES_URL=xxxxxxxxxx
python voice_validation_api/validation_api.py
# Make a separate venv for the validator because of pydantic version conflict
uv venv .validator
source .validator/bin/activate
uv pip install -e .
python neurons/validator.py --wallet.name WALLET_NAME --wallet.hotkey WALLET_HOT_NAME --use-local-validation-api
# Run model queue to push models to validation api to be evaluated
python neurons/model_queue.py --use-local-validation-api
python voice_validation_api/worker_queue.py
Ensure you have a .env file in the root directory of your project. This file should include the necessary environment variables for the application to function correctly. Below is an example .env file:
# Admin credentials
ADMIN_KEY=example_admin_key
# Supabase credentials
SUPABASE_KEY=example_supabase_key
SUPABASE_URL=https://example.supabase.co
# Hugging Face credentials
HF_ACCESS_TOKEN=hf_example_access_token
HF_USER=ExampleUser
DIPPY_KEY=example_dippy_key
# OpenAI API Key
OPENAI_API_KEY=sk-example_openai_api_key
# Dataset API Key
DATASET_API_KEY=example_dataset_api_key
POSTGRES_URL=postgresql://vapi:vapi@localhost:5432/vapi # For local dev db spun up by docker
To quickly spin up the model_queue, worker_queue, and validation_api, use the local-compose script.
docker compose -f local-compose.yml up -d --build
To monitor logs for a specific container, use the following command, replacing "Container name to see logs" with the desired container's name:
docker logs -f "<Container name to see logs>"
If the database runs on a remote server, forward the Postgres port to your local machine first:
ssh -L 5432:localhost:5432 <remote_server_ip>
If you prefer a graphical interface for inspecting the database (for example, to browse table entries):
- Open DBeaver (or any other database client of your choice).
- Configure a new connection:
- Host: localhost
- Port: 5432
- Database name, username, and password as per the local configuration.
- Access and query the database contents as needed.
This is only required if you need a tool like DBeaver to examine database contents interactively.
- The setup is configured to test netuid 231 and uses a local database provided by the Docker Compose file.
- Ensure Docker Compose is installed and running on your system before executing the commands.
Models are evaluated on how closely their vocal outputs resemble natural human speech, considering factors such as emotional expression, intonation, pauses, and excitation levels. The more natural and convincingly human the voice sounds, the higher the score.
Models that produce human-like speech with high clarity and coherence will achieve the highest scores. Lower word error rates indicate clearer, more accurate speech output, enhancing the model's overall evaluation.
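As a rough, illustrative sketch only (not the subnet's actual scoring code), a word error rate check can be computed with the open-source jiwer package, assuming the generated audio has already been transcribed by an ASR system; the combination weights below are hypothetical:

from jiwer import wer

def wer_score(reference_text: str, transcribed_text: str) -> float:
    # jiwer.wer returns the fraction of word-level errors (0.0 means a perfect match),
    # so we invert it: lower error rates yield higher scores.
    return max(0.0, 1.0 - wer(reference_text, transcribed_text))

def combined_score(human_likeness: float, clarity: float) -> float:
    # Hypothetical weighting of the two criteria described above;
    # the real evaluation protocol may weight them differently.
    return 0.7 * human_likeness + 0.3 * clarity

# Example: a transcript matching its reference exactly gives a perfect clarity score.
example = combined_score(human_likeness=0.9, clarity=wer_score("hello there", "hello there"))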
Our codebase is built upon Nous Research's and MyShell's Subnets.
The Dippy Bittensor subnet is released under the MIT License.