This repository contains a Python script for an AI assistant capable of generating text, speech, image, and video responses from user input. The assistant uses OpenAI's GPT-3.5 model for text generation, the Google Text-to-Speech (gTTS) library for speech synthesis, the ClipDrop API for image generation, and the Hugging Face Transformers library for generating video from text prompts.
Features:
- Text response: textual replies generated by the GPT-3.5 model from the user's input query.
- Speech response: spoken replies synthesized from the generated text using gTTS.
- Image response: images generated from the input prompt via the ClipDrop API.
- Video response: videos generated from the input prompt using the Hugging Face Transformers library.

Dependencies:
- openai: interacts with the GPT-3.5 model for text generation.
- gTTS: Google Text-to-Speech library for converting text responses into speech.
- requests: makes HTTP requests to the ClipDrop API for image generation.
- Hugging Face Transformers: generates video responses from text prompts.
- IPython: displays audio files within Jupyter notebooks.
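The text and speech features above can be sketched as follows. This is an illustrative outline, not the script itself: it assumes an `OPENAI_API_KEY` environment variable, and the helper names (`build_messages`, `text_response`, `speech_response`) are hypothetical. The API calls match openai==0.27.x (`openai.ChatCompletion.create`) and gTTS.

```python
import os

def build_messages(prompt: str):
    """Build the chat message list expected by openai==0.27's ChatCompletion API."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]

def text_response(prompt: str) -> str:
    """Generate a textual reply with GPT-3.5 (requires OPENAI_API_KEY)."""
    import openai  # openai==0.27.x, as pinned in the install instructions
    openai.api_key = os.environ["OPENAI_API_KEY"]
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=build_messages(prompt),
    )
    return completion.choices[0].message.content

def speech_response(text: str, out_path: str = "response.mp3") -> str:
    """Synthesize the reply as speech with gTTS (makes a network call to Google)."""
    from gtts import gTTS
    gTTS(text=text, lang="en").save(out_path)
    return out_path
```

In a notebook, the saved MP3 can then be played back with `IPython.display.Audio(out_path)`.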
Ensure the following dependencies are installed before running the script:
pip install openai==0.27.0 gtts
pip install requests
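The requests library is what the script uses to talk to ClipDrop. A minimal sketch of an image request is below, assuming ClipDrop's documented text-to-image endpoint and `x-api-key` header; the helper names and the default output path are hypothetical.

```python
import os

# ClipDrop's text-to-image endpoint (see the ClipDrop API documentation).
CLIPDROP_URL = "https://clipdrop-api.co/text-to-image/v1"

def build_image_request(prompt: str, api_key: str):
    """Assemble the URL, headers, and multipart body for a ClipDrop request."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    headers = {"x-api-key": api_key}
    # ClipDrop expects the prompt as a multipart form field.
    files = {"prompt": (None, prompt, "text/plain")}
    return CLIPDROP_URL, headers, files

def generate_image(prompt: str, api_key: str, out_path: str = "image.png") -> str:
    """POST the prompt to ClipDrop and save the returned image bytes."""
    import requests  # network call; requires the requests package and an API key
    url, headers, files = build_image_request(prompt, api_key)
    resp = requests.post(url, headers=headers, files=files)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path

if __name__ == "__main__":
    key = os.environ.get("CLIPDROP_API_KEY")  # hypothetical variable name
    if key:
        print(generate_image("a watercolor fox", key))
```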
For speech-related functionalities, additional installations may be required:
pip install pydub SpeechRecognition
apt-get install -y python3-pyaudio
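With SpeechRecognition installed, spoken input can be transcribed along these lines. This is a sketch (the helper names are hypothetical); `recognize_google` is the library's free Google Web Speech API backend, and `sr.AudioFile` reads WAV, AIFF, and FLAC files.

```python
def is_supported_audio(path: str) -> bool:
    """sr.AudioFile can read WAV, AIFF, and FLAC files directly."""
    return path.lower().endswith((".wav", ".aiff", ".flac"))

def transcribe(audio_path: str) -> str:
    """Transcribe an audio file using the free Google Web Speech API."""
    import speech_recognition as sr  # provided by the SpeechRecognition package
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)  # read the whole file
    return recognizer.recognize_google(audio)
```

For other formats (e.g. MP3), pydub can convert the file to WAV first.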
For video generation, install the following (the commands use Colab/Jupyter "!" syntax):
!apt -y install -qq aria2
!pip install -q torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 torchtext==0.14.1 torchdata==0.5.1 --extra-index-url https://download.pytorch.org/whl/cu116 -U
!pip install -q pandas-gbq==0.18.1 open_clip_torch pytorch_lightning
!pip install -q git+https://github.com/camenduru/modelscope
Download required models:
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/kabachuha/modelscope-damo-text2video-pruned-weights/resolve/main/VQGAN_autoencoder.pth -d /content/models -o VQGAN_autoencoder.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/kabachuha/modelscope-damo-text2video-pruned-weights/resolve/main/open_clip_pytorch_model.bin -d /content/models -o open_clip_pytorch_model.bin
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/kabachuha/modelscope-damo-text2video-pruned-weights/resolve/main/text2video_pytorch_model.pth -d /content/models -o text2video_pytorch_model.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/kabachuha/modelscope-damo-text2video-pruned-weights/raw/main/configuration.json -d /content/models -o configuration.json
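Before running video generation, it is worth confirming that all four downloads landed in /content/models. A small check along these lines (the helper name is hypothetical) avoids a confusing failure later:

```python
import os

MODEL_DIR = "/content/models"
MODEL_FILES = [
    "VQGAN_autoencoder.pth",
    "open_clip_pytorch_model.bin",
    "text2video_pytorch_model.pth",
    "configuration.json",
]

def missing_models(model_dir: str = MODEL_DIR):
    """Return the expected model files that are not present in model_dir."""
    return [
        name for name in MODEL_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]
```

If `missing_models()` returns a non-empty list, re-run the corresponding aria2c command.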
Finally, set the downloaded configuration for GPU mode before running video generation:
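A minimal sketch of that step is below. It assumes the downloaded configuration.json accepts a top-level field selecting the device; the field name "device" used here is hypothetical, so check the file's actual schema before relying on it.

```python
import json

def set_gpu_mode(config_path: str = "/content/models/configuration.json") -> None:
    """Patch the model configuration file to select GPU execution."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["device"] = "gpu"  # hypothetical field name; adjust to the file's schema
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
```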
Contributions to this project are welcome! If you have ideas for improving the chatbot's functionality, adding new features, or enhancing its performance, feel free to submit a pull request.
This project is licensed under the MIT License.