
Open-source Multimodal AI

This README walks you through setting up, running, and extending this multimodal AI project.


MVP Chatbot is an AI-powered chatbot capable of image recognition and text generation through the Replicate API, which runs the LLaVA v1.6 Mistral 7B model under the hood. The chatbot interacts with users, processes uploaded images, and generates text responses based on user input.

Features

  • Image recognition
  • Text generation
  • Image generation (planned; see Future Scope)
  • Interactive chat experience
  • Easy to set up and extend

Table of Contents

  1. Installation
  2. Configuration
  3. Usage
  4. Project Structure
  5. Future Scope
  6. Contributing

Installation

  1. Clone the repository:

    git clone https://github.com/Kaif9999/Multimodal-AI
    cd Multimodal-AI
  2. Create a virtual environment:

    python -m venv venv
  3. Activate the virtual environment:

    • On Windows:
      .\venv\Scripts\activate
    • On macOS/Linux:
      source venv/bin/activate
  4. Install dependencies:

    pip install -r requirements.txt
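
The exact dependency list lives in requirements.txt; judging from the imports used in main1.py, it should contain at least the following (versions unpinned here, pin as needed):

    chainlit
    replicate
    requests
    python-decouple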

Configuration

  1. Set up environment variables: Create a .env file in the root directory and add the following variables (no quotes, and no spaces around =):
    REPLICATE_API_KEY=<your-replicate-api-key>
    REPLICATE_TEXT_MODEL=yorickvp/llava-v1.6-mistral-7b
    REPLICATE_TEXT_MODEL_VERSION=19be067b589d0c46689ffa7cc3ff321447a441986a7694c01225973c2eafc874
    REPLICATE_IMAGE_MODEL=stability-ai/sdxl
    REPLICATE_IMAGE_MODEL_VERSION=7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc
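
To verify the variables load before launching the app, you can run a quick check from the project root (this snippet is just a sanity test, not part of main1.py):

    from decouple import config
    import replicate

    # Raises decouple.UndefinedValueError if a key is missing from .env
    client = replicate.Client(api_token=config("REPLICATE_API_KEY"))
    print("Using model:", config("REPLICATE_TEXT_MODEL"))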

Usage

  1. Run the chatbot (Chainlit apps are launched through the chainlit CLI rather than with python main1.py directly):

    chainlit run main1.py

    The app is served at http://localhost:8000 by default.
  2. Interacting with the chatbot:

    • Start a chat and send messages.
    • Upload images for recognition.

Project Structure

Multimodal-AI/
├── venv/               # Virtual environment directory
├── .env                # Environment variables
├── .gitignore          # Files excluded from version control
├── main1.py            # Main script to run the chatbot
├── requirements.txt    # Python dependencies
├── README.md           # Project documentation
├── chainlit.md         # Text displayed on the chatbot's frontend
└── test_main1.py       # Unit and mock tests for main1.py
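
chainlit.md is plain Markdown that Chainlit renders as the app's welcome screen; a minimal example (the wording here is just a placeholder):

    # Welcome to MVP Chatbot

    Send a message to chat, or upload an image to have it described.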

Code Explanation

main1.py

The main script initializes the chatbot, sets up the Replicate client, processes user messages, and handles image uploads.

  • Imports:

    import asyncio
    import chainlit as cl
    import replicate
    import requests
    from chainlit import user_session
    from decouple import config
  • On chat start: Initializes message history and Replicate client.

    @cl.on_chat_start
    async def on_chat_start():
        # Start each session with an empty conversation history
        message_history = []
        user_session.set("MESSAGE_HISTORY", message_history)
        # Create one authenticated Replicate client and reuse it for the session
        api_token = config("REPLICATE_API_KEY")
        client = replicate.Client(api_token=api_token)
        user_session.set("REPLICATE_CLIENT", client)
  • Upload image: Uploads a local image to Replicate's experimental upload endpoint and returns a public serving URL; a short usage example follows the function.

    def upload_image(image_path):
        # Request a pre-signed upload URL from Replicate's upload endpoint
        upload_response = requests.post(
            "https://dreambooth-api-experimental.replicate.com/v1/upload/filename.png",
            headers={"Authorization": f"Token {config('REPLICATE_API_KEY')}"}
        ).json()
        # PUT the raw image bytes, closing the file handle when done
        with open(image_path, "rb") as f:
            requests.put(upload_response["upload_url"], headers={"Content-Type": "image/png"}, data=f.read())
        return upload_response["serving_url"]
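
    Called with a local file path, it returns a URL the vision model can fetch (the path below is hypothetical):

    serving_url = upload_image("samples/photo.png")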
  • On message: Processes user messages, uploads images, and generates responses.

    @cl.on_message
    async def main(message: cl.Message):
        # Send an empty message first so the response can stream into it
        msg = cl.Message(content="", author="mvp assistant")
        await msg.send()

        images = [file for file in message.elements if "image" in file.mime]
        prompt = f"You are a helpful Assistant that can help me with image recognition and text generation.\n\nPrompt: {message.content}"

        message_history = user_session.get("MESSAGE_HISTORY")
        client = user_session.get("REPLICATE_CLIENT")

        if images:
            # Vision request: reset the history and attach the uploaded image
            message_history = []
            url = upload_image(images[0].path)
            input_vision = {"image": url, "top_p": 1, "prompt": prompt, "max_tokens": 1024, "temperature": 0.6}
        else:
            # Text-only request: pass the running conversation history
            input_vision = {"top_p": 1, "prompt": prompt, "max_tokens": 1024, "temperature": 0.5, "history": message_history}

        # The model and version names must match the .env entries above
        output = client.run(f"{config('REPLICATE_TEXT_MODEL')}:{config('REPLICATE_TEXT_MODEL_VERSION')}", input=input_vision)

        ai_message = ""
        for item in output:
            await msg.stream_token(item)
            await asyncio.sleep(0.1)  # non-blocking pacing; time.sleep would stall the event loop
            ai_message += item
        await msg.send()

        message_history.append(f"User: {message.content}")
        message_history.append(f"Assistant: {ai_message}")
        user_session.set("MESSAGE_HISTORY", message_history)
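
test_main1.py covers this flow with unit and mock tests. The exact tests aren't reproduced here, but a minimal sketch of mocking the Replicate client (names and structure assumed) looks like:

    from unittest.mock import MagicMock

    def test_run_streams_tokens():
        # Stand-in for the Replicate client stored in the user session
        fake_client = MagicMock()
        fake_client.run.return_value = iter(["Hel", "lo"])

        output = fake_client.run("model:version", input={"prompt": "hi"})
        assert "".join(output) == "Hello"
        fake_client.run.assert_called_once()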

Future Scope

  1. Implement the image generation feature (the SDXL model is already configured in .env)
  2. Accept input in multiple languages and respond in the user's preferred language
  3. Add text-to-speech output
  4. Integrate Gemini Flash for image and text generation

Contributing

Contributions are welcome! Please follow these steps to contribute:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature-branch
  3. Make your changes.
  4. Commit your changes:
    git commit -m "Add feature"
  5. Push to the branch:
    git push origin feature-branch
  6. Create a Pull Request.