A microservice for running Suno's Bark model for text-to-speech generation. Built to interface with the acai.so AI toolkit, powering text-to-speech capabilities from your own compute.
You can also call it via its API from any UI you'd like to build out.
You will need the following tools:
- Python 3.9
- pip (Python package installer)
- Miniconda (Recommended)
```sh
# Clone the repository
git clone <repository-url>

# Navigate to the project directory
cd bark-service
```
We recommend using Miniconda to create an isolated Python environment for this project. If you haven't installed Miniconda yet, you can download it from the Miniconda website.
```sh
# Create a new conda environment with Python 3.9
conda create -n myenv python=3.9

# Activate the conda environment
conda activate myenv

# Install the required packages using pip
pip install --no-cache-dir -r requirements.txt
```
This project includes a `docker-compose.yml` file that can be used to create a Docker container for the application. Here are the steps to use Docker Compose:
You will need the following tools:
- Docker
- Docker Compose
```sh
# Clone the repository
git clone <repository-url>

# Navigate to the project directory
cd /path/to/project

# Build and run the Docker container using Docker Compose
docker-compose up --build
```
The application will be available at `http://0.0.0.0:5000`.
To stop the application, press `Ctrl+C`. To remove the Docker container, run `docker-compose down`.
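The repository ships its own `docker-compose.yml`, so you don't need to write one. Purely for orientation, a minimal compose file for a service like this might look like the following sketch — the service name, port mapping, and GPU settings here are assumptions, not the repo's actual file:

```yaml
# Hypothetical docker-compose.yml sketch; the repository's real file may differ.
version: "3.8"
services:
  bark-service:
    build: .
    ports:
      - "5000:5000"   # expose the API on the host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]   # Bark inference needs GPU VRAM
```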
Run the application with the following command:
```sh
python -m uvicorn main:app --host 0.0.0.0 --port 5000
```
The application will be available at `http://0.0.0.0:5000`.
The Suno Bark model weights will begin downloading from Hugging Face when the server starts if they are not already cached. Once downloaded, they are loaded into GPU memory. The `suno/bark` model uses ~5.5 GB of VRAM, while `suno/bark-small` uses ~2.25 GB.
We recommend the standard model for best quality, but if you'd like to try the small model, you can adjust the model name in `main.py` and restart the server.
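How the model name is set depends on the internals of `main.py`, which aren't shown here. As a hypothetical sketch (the variable name is an assumption), the switch might be as small as changing one constant:

```python
# Hypothetical excerpt from main.py -- the actual variable name may differ.
# "suno/bark" (~5.5 GB VRAM) gives the best quality; "suno/bark-small"
# (~2.25 GB VRAM) trades some quality for a smaller memory footprint.
MODEL_NAME = "suno/bark-small"
```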
This application provides an API endpoint that you can use to run inference and get an audio response. Here is how you can call it:
This endpoint accepts a JSON object with the following properties:
- `text` (string): The text to be converted to speech.
- `voice` (string, optional): The voice preset to use for speech synthesis.
The endpoint returns an audio file in WAV format.
Here is an example of how to call this endpoint using Python's `requests` library:
```python
import requests

url = "http://0.0.0.0:5000/bark-inference"
data = {
    "text": "Hello, world!",
    "voice": "en-US"
}

# requests serializes the payload and sets the Content-Type header for us.
response = requests.post(url, json=data)
response.raise_for_status()

# The response body is an audio file in WAV format.
with open("output.wav", "wb") as out_file:
    out_file.write(response.content)
```
You can then play the `output.wav` file to hear the synthesized speech.
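Bark generates audio at a 24 kHz sample rate, so you can sanity-check the file you saved with Python's standard `wave` module. The snippet below writes a one-second silent WAV first so it runs standalone; against the real service, skip that part and just open the `output.wav` you downloaded:

```python
import wave

# Write a one-second silent mono WAV so this snippet is self-contained.
# With the real service, skip this block and open your saved output.wav.
with wave.open("output.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 16-bit samples
    w.setframerate(24000)      # Bark's output sample rate
    w.writeframes(b"\x00\x00" * 24000)

# Inspect the file the way you would inspect a real Bark response.
with wave.open("output.wav", "rb") as w:
    sample_rate = w.getframerate()
    duration = w.getnframes() / sample_rate
    print(f"{sample_rate} Hz, {duration:.1f} s")
```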
You can use the `axios` package in Node.js to make a POST request to the API:
```javascript
const axios = require('axios');
const fs = require('fs');

const data = {
  text: 'Hello, world!',
  voice: 'en-US'
};

axios.post('http://0.0.0.0:5000/bark-inference', data, { responseType: 'arraybuffer' })
  .then((response) => {
    fs.writeFileSync('output.wav', response.data);
  })
  .catch((error) => {
    console.error(error);
  });
```
You can use `curl` on the command line to make a POST request to the API:
```sh
curl -X POST -H "Content-Type: application/json" \
  -d '{"text":"Hello, world!","voice":"en-US"}' \
  http://0.0.0.0:5000/bark-inference --output output.wav
```
This will save the response as a WAV file named `output.wav`.
This project is licensed under the MIT License.