This project develops an application that generates images and music from user input, using pre-trained deep learning models accessed through an Application Programming Interface (API). Users enter text prompts, and the models generate a corresponding image and music track.
The system is divided into two parts: getting the input from the user, and displaying the generated output to the user.
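As a rough sketch, these two parts could be wired together in a single Flask route like the one below; the route, form-field, and template names are illustrative assumptions, not necessarily the names used in the actual aigen application.

```python
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Part 1: get the text prompts from the user.
        image_prompt = request.form["image_prompt"]
        music_prompt = request.form["music_prompt"]
        # (the pre-trained models would be called via the Inference API here)
        # Part 2: display the generated output to the user.
        return render_template("result.html",
                               image_prompt=image_prompt,
                               music_prompt=music_prompt)
    return render_template("index.html")
```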
The music and image are generated using publicly available pre-trained models accessed through the Hugging Face Inference API.
Image Generating Model: Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.
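As an illustration, the Inference API can be queried with a plain HTTP POST. The sketch below assumes the runwayml/stable-diffusion-v1-5 checkpoint and a placeholder API token, both of which may differ from what the application actually uses.

```python
import requests

# Model ID and token are assumptions for illustration; any hosted
# Stable Diffusion checkpoint is queried the same way.
API_URL = "https://api-inference.huggingface.co/models/runwayml/stable-diffusion-v1-5"
HEADERS = {"Authorization": "Bearer <YOUR_HF_API_TOKEN>"}

def generate_image(prompt: str) -> bytes:
    # The Inference API returns the generated image as raw bytes on success.
    response = requests.post(API_URL, headers=HEADERS, json={"inputs": prompt})
    response.raise_for_status()
    return response.content

if __name__ == "__main__":
    with open("output.png", "wb") as f:
        f.write(generate_image("Boat with sunrise"))
```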
Music Generating Model: Riffusion is a latent text-to-image diffusion model capable of generating spectrogram images given any text input. These spectrograms can be converted into audio clips.
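The same request pattern works for Riffusion, with the extra step of turning the returned spectrogram image into audio. The sketch below is only an approximation: it assumes the riffusion/riffusion-model-v1 checkpoint and inverts the spectrogram with Griffin-Lim, whereas Riffusion's reference implementation uses its own amplitude scaling, so the resulting audio quality will differ.

```python
import io

import numpy as np
import requests
import torch
import torchaudio
from PIL import Image

# Model ID and token are assumptions for illustration.
API_URL = "https://api-inference.huggingface.co/models/riffusion/riffusion-model-v1"
HEADERS = {"Authorization": "Bearer <YOUR_HF_API_TOKEN>"}

def generate_spectrogram(prompt: str) -> Image.Image:
    # The Inference API returns the spectrogram as raw image bytes.
    response = requests.post(API_URL, headers=HEADERS, json={"inputs": prompt})
    response.raise_for_status()
    return Image.open(io.BytesIO(response.content)).convert("L")

def spectrogram_to_audio(image: Image.Image) -> torch.Tensor:
    # Flip so low frequencies sit in row 0, treat pixel brightness as
    # magnitude, and invert with Griffin-Lim as a rough stand-in for
    # Riffusion's own reconstruction pipeline.
    pixels = np.flipud(np.array(image)).copy()
    magnitudes = torch.from_numpy(pixels).float() / 255.0
    n_fft = (magnitudes.shape[0] - 1) * 2  # freq bins = n_fft // 2 + 1
    griffin_lim = torchaudio.transforms.GriffinLim(n_fft=n_fft, power=1.0)
    return griffin_lim(magnitudes)

if __name__ == "__main__":
    waveform = spectrogram_to_audio(generate_spectrogram("fun disco"))
    torchaudio.save("output.wav", waveform.unsqueeze(0), 44100)
```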
SAMPLE 1: Input ‘Boat with sunrise’ and ‘Pleasant heavy metal’
SAMPLE 2: Input ‘Sparrow in tree abstract’ and ‘fun disco’
- Download the repository or clone it locally
- Create a Python virtual environment using
python -m venv venv
- Once created, activate the virtual environment using
source venv/bin/activate (macOS/Linux) or venv\Scripts\activate (Windows)
- Install the packages from requirements.txt using
pip install -r requirements.txt
- Run the application using
flask --app aigen run