
Visual-Auditory Fusion Perception AI Platform

Project Page   License   Demo Website  

📕 中文版 README | 📗 English README

📻 Installation Guide

Before using our model, make sure all necessary dependencies are installed in your environment. They cover the libraries and tools the model needs to run, so that inference proceeds smoothly.

Please follow these steps for the installation:

  1. Open a terminal or command prompt: use the command-line interface for your operating system.
  2. Install dependencies using pip: enter the following command to install the required Python packages and libraries.

```bash
pip install -r requirements.txt
```
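After installing, you can sanity-check from Python that the packages imported cleanly. The sketch below is illustrative only: the package names passed in are placeholders, since the actual list lives in `requirements.txt`.

```python
import importlib.util

def missing_dependencies(packages):
    """Return the packages from the given list that cannot be imported."""
    return [pkg for pkg in packages if importlib.util.find_spec(pkg) is None]

# Placeholder names; substitute the packages listed in requirements.txt.
missing = missing_dependencies(["json", "argparse"])
if missing:
    print("Missing packages:", ", ".join(missing))
```

If the script prints nothing, the listed packages are importable and you can proceed to inference.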

🚀 Inference Guide

After installing all the necessary dependencies, you can start using our model for inference. We provide two ways to run inference: directly from the terminal, and through an interactive session.

Here, we will use the example image asserts/demo.jpg for illustration:

1. Inference using the Terminal

If you want to directly run the inference script in the terminal, you can use the following command:

```bash
python chatme.py --image asserts/demo.jpg --question "How many apples are there on the shelf?"
```

This command will load the pre-trained model and perform inference using the provided image (demo.jpg) and question ("How many apples are there on the shelf?").

The model will analyze the image and attempt to answer the question. The inference result will be output to the terminal in text form, for example:

```
Xiaochuan: There are three apples on the shelf.
```
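For orientation, a command-line interface like the one above can be built with `argparse`. This is a hypothetical sketch whose flag names simply mirror the command shown; it is not the actual `chatme.py` source.

```python
import argparse

def build_parser():
    # Hypothetical sketch mirroring the flags used above; not the real chatme.py.
    parser = argparse.ArgumentParser(description="Single-shot visual question answering")
    parser.add_argument("--image", required=True, help="path to the input image")
    parser.add_argument("--question", required=True, help="question about the image")
    return parser

args = build_parser().parse_args(
    ["--image", "asserts/demo.jpg", "--question", "How many apples are there on the shelf?"]
)
```

With `required=True`, omitting either flag makes the parser exit with a usage message instead of running inference on incomplete input.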

2. Interactive Inference

In addition to running inference from the terminal, you can interact with the model in real time through an interactive session. To start the interactive terminal, run the following command:

```bash
python main.py
```

This command will launch an interactive terminal that waits for you to enter the image path. You can type the image path (e.g., asserts/demo.jpg) in the terminal and press Enter.

The model will perform inference based on the provided image and wait for you to enter a question.

Once you enter a question (e.g., "How many apples are there on the shelf?"), the model will analyze the image and attempt to answer it. The inference result will be output to the terminal in text form, for example:

```
Image Path >>>>> asserts/demo.jpg
User: How many apples are there on the shelf?
Xiaochuan: There are three apples on the shelf.
```

Using this approach, you can easily interact with the model and ask it various questions.
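The session above follows a simple prompt-and-answer loop: read an image path once, then answer questions until the user submits an empty line. The sketch below is a hedged illustration of that flow, where `answer_fn` is a stand-in for the model's inference call; it is not the actual `main.py` implementation.

```python
def interactive_loop(answer_fn, read=input, write=print):
    """Prompt once for an image path, then answer questions until an empty line.

    answer_fn is a placeholder for the model's inference call; read and write
    default to the console but can be swapped out for testing.
    """
    image_path = read("Image Path >>>>> ")
    while True:
        question = read("User: ")
        if not question:
            break
        write(f"Xiaochuan: {answer_fn(image_path, question)}")
```

Injecting `read` and `write` keeps the loop testable without a console, while the defaults reproduce the terminal behavior shown above.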

🧾 References

📈 Benchmark

📷 Visual Perception

🎧 Audio

💬 NLP

🔮 Multi-Modal

🤖 Robotic