poster

A multimodal expert-assistant GPT platform built on RAG and agents. It integrates tools for text, image, and audio modalities, and supports local deployment and private database construction.

Code Maintenance PR's Welcome license

project_display.mp4

💡 1 Roadmap

1 Basic Functions

  • Single/multi turn chat
  • Multimodal information display and interaction
  • Agent
  • Tools
    • Web searching
    • Image generation
    • Image captioning
    • Audio-to-text
    • Text-to-audio
    • Video captioning
  • RAG
    • Private database
    • Offline deployment

2 Supported Information Modalities

  • text
  • image
  • audio
  • video

3 Model Interface API

  • ChatGPT
  • DALL·E
  • Google-Search
  • BLIP

👨‍💻 2 Development

Project technology stack: Python + torch + langchain + gradio

⚡ 2.1 Installation

  1. Create a virtual environment in Anaconda:
conda create -n agent python=3.10
  2. Activate the virtual environment and install the dependencies:
conda activate agent
pip install -r ./requirements.txt
  3. Install the BLIP model locally: open the BLIP website and download all files to Models/BLIP.

  4. Following the prompts in the .env file, configure the key for each API you plan to use.
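
Before launching, it can help to confirm that the .env file actually contains the keys you need. The sketch below is not part of the project: it is a minimal standard-library check, and the key names `OPENAI_API_KEY` and `GOOGLE_API_KEY` are assumptions chosen for illustration. Substitute whichever names the project's .env prompts list.

```python
# Hypothetical key names; replace with the names listed in the project's .env.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

To use it, read the file with `pathlib.Path(".env").read_text()`, pass the text to `parse_env`, and print the result of `missing_keys`.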

💻 2.2 Demo

Multi Agent GPT provides a web UI. Run web.py to launch the agents and start a conversation:

python ./web.py

The program prints a local URL: http://XXX. Open it in a local browser to see the UI:

demo

📻 2.3 News

1 Chat_with_Image

With the BLIP model integrated, agents can interpret the content of an uploaded image and use it in the conversation.
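
One common way to wire an image-captioning model into a chat agent is to caption the image first and then pass the caption to the language model as text. The sketch below illustrates that pattern only and is not taken from the project's code: `build_image_prompt`, `fake_caption`, and the path `Imgs/Show/example.jpg` are hypothetical, and the captioning function is a stand-in so the example runs without the BLIP weights.

```python
from typing import Callable


def build_image_prompt(question: str, image_path: str,
                       caption_fn: Callable[[str], str]) -> str:
    """Caption the image with caption_fn, then embed the caption in the prompt."""
    caption = caption_fn(image_path)
    return (
        f"The user uploaded an image ({image_path}).\n"
        f"Image description: {caption}\n"
        f"User question: {question}"
    )


def fake_caption(image_path: str) -> str:
    # Stand-in for a real BLIP call; returns a fixed caption for illustration.
    return "a dog running on a beach"


prompt = build_image_prompt("What animal is this?",
                            "Imgs/Show/example.jpg", fake_caption)
print(prompt)
```

In a real setup, `caption_fn` would call the BLIP-based tool, and the resulting prompt would be sent to the chat model.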

🗄️ 3 Structure

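- .env
- Agents/
  - openai_agents.py     # defines the GPT-3.5-based agent
- Database/
- Docs/
- Imgs/
  - Show/                # stores some example images
- Models/
  - BLIP/                # large image-understanding model
- Tools/
  - ImageCaption.py      # BLIP-based image understanding tool
  - ImageGeneration.py   # defines a text-to-image tool based on OpenAI DALL·E
  - search.py            # web search tool based on Google-Search
- Utils/
  - data_io.py
  - stdio.py             # captures the running program's log output, mainly to obtain the agent's verbose information
  - utils_image.py       # helper functions for image processing
  - utils_json.py        # extracts useful fields from existing log output (supports stdio)
- python_new_funciton.py # test file used during development
- readme.md
- requirements.txt
- web.py                 # main entry point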
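
The tree above describes Utils/stdio.py as capturing the program's log output to obtain the agent's verbose information, and Utils/utils_json.py as extracting fields from that log. The project's actual implementation is not shown here. The following is a minimal sketch of one way to do this with the standard library, where `run_and_capture`, `extract_field`, and `noisy_agent` are hypothetical names and the `Thought:` / `Action:` line format is an assumption.

```python
import contextlib
import io


def run_and_capture(func, *args, **kwargs):
    """Call func and return (result, everything it printed to stdout)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def extract_field(log, name):
    """Return the values of all log lines that start with 'name:'."""
    prefix = f"{name}:"
    return [line[len(prefix):].strip()
            for line in log.splitlines() if line.startswith(prefix)]


def noisy_agent(question):
    # Hypothetical stand-in for an agent running with verbose output.
    print("Thought: I should use the search tool")
    print("Action: search")
    return f"answer to {question!r}"


result, log = run_and_capture(noisy_agent, "hello")
actions = extract_field(log, "Action")
```

Redirecting stdout this way only captures text written through Python's `sys.stdout`; output written by subprocesses or directly to the file descriptor would need a different approach.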