
Text-to-Video 3D


Text-to-Video Synthesis with VQGAN and CLIP

This project combines VQGAN (Vector Quantized Generative Adversarial Networks) and CLIP (Contrastive Language-Image Pre-training) to generate video frames from textual descriptions. Between frames, the system applies a series of image transformations to create dynamic, artistic representations of the prompts.
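
At its core, a VQGAN+CLIP pipeline optimizes a VQGAN latent so that the decoded image matches the text prompt under CLIP's similarity score. The sketch below illustrates that loop; it is not this repository's exact code, and the stand-in decoder, latent shape, prompt, and learning rate are all assumptions (CLIP's input normalization is also omitted for brevity).

    import torch
    import clip  # pip install git+https://github.com/openai/CLIP.git

    device = "cuda" if torch.cuda.is_available() else "cpu"
    perceptor, _ = clip.load("ViT-B/32", device=device, jit=False)
    perceptor = perceptor.eval().float()

    # Encode the prompt once; it stays fixed during optimization.
    tokens = clip.tokenize(["a watercolor painting of a forest"]).to(device)
    with torch.no_grad():
        text_features = perceptor.encode_text(tokens)

    # Stand-in for the VQGAN decoder (the real script decodes a quantized
    # latent with a pretrained model); it maps a 16x16x256 latent to RGB.
    decode = torch.nn.Sequential(
        torch.nn.ConvTranspose2d(256, 3, kernel_size=16, stride=16),
        torch.nn.Sigmoid(),
    ).to(device)

    z = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=0.05)

    for step in range(300):
        image = decode(z)  # latent -> RGB image in [0, 1]
        image = torch.nn.functional.interpolate(
            image, size=224, mode="bilinear", align_corners=False)
        image_features = perceptor.encode_image(image)
        # Maximize cosine similarity between image and text embeddings.
        loss = -torch.cosine_similarity(image_features, text_features).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()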

Features

  • Text-to-image synthesis using VQGAN+CLIP.
  • Image transformations (zoom, rotate, translate) to enhance the visual dynamics (see the sketch after this list).
  • Output frames saved as images that can be compiled into videos.
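
The per-frame transformation can be as simple as a small affine warp applied between generations, with the warped frame fed back as the starting point for the next one. Here is a minimal sketch using torchvision; the function name and parameter values are illustrative, not taken from this repository.

    from PIL import Image
    import torchvision.transforms.functional as TF

    def animate(frame: Image.Image) -> Image.Image:
        """Nudge the frame between generations: zoom in by ~2%,
        rotate half a degree, and shift one pixel to the right."""
        return TF.affine(frame, angle=0.5, translate=[1, 0],
                         scale=1.02, shear=[0.0])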

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • A CUDA-compatible GPU for faster generation (optional but recommended)

Installation

  1. Clone the repository:
    git clone https://github.com/your-repository/Text-to-Video-VQGAN-CLIP.git
    cd Text-to-Video-VQGAN-CLIP
  2. Install dependencies:
    pip install -r requirements.txt
  3. Download the model files: fetch the VQGAN model configuration and checkpoint files and place them in the directories specified in the main script. A hedged example follows below.
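
As an illustration, the files could be fetched with a few lines of Python. The URLs and the checkpoints/ directory below are placeholders, not links published by this repository; substitute the actual config/checkpoint links and the paths referenced in src/main.py.

    import os
    import urllib.request

    # Placeholder URLs -- substitute the real VQGAN config/checkpoint links.
    FILES = {
        "checkpoints/vqgan.yaml": "<config-url>",
        "checkpoints/vqgan.ckpt": "<checkpoint-url>",
    }

    os.makedirs("checkpoints", exist_ok=True)
    for path, url in FILES.items():
        urllib.request.urlretrieve(url, path)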

Usage

Run the main script to generate frames based on predefined text prompts. Each frame is saved as an image in the output directory.

python src/main.py

You can modify the text prompts directly in the main.py file to create different images.
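
Once the frames are on disk, they can be stitched into a video. Here is a minimal sketch using imageio; the output directory, file pattern, and frame rate are assumptions, so match them to what main.py actually writes.

    import glob
    import imageio.v2 as imageio  # pip install imageio imageio-ffmpeg

    frames = sorted(glob.glob("output/*.png"))  # assumed output location
    with imageio.get_writer("result.mp4", fps=24) as writer:
        for path in frames:
            writer.append_data(imageio.imread(path))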

Extending the Project

Feel free to add more functionalities, such as:

  • Real-time text input for generating images on the fly.
  • Integration with web frameworks for an interactive user interface.
  • More complex transformations and effects.

About

This project leverages VQGAN (Vector Quantized Generative Adversarial Networks) and CLIP (Contrastive Language-Image Pre-training) to create video frames from textual descriptions. By applying various image transformations, the system generates dynamic and artistic visual representations of text prompts, which can be compiled into engaging videos.
