Before using our model, make sure all required dependencies are installed in your environment. They provide the libraries and tools the model needs, so that inference runs smoothly.
Please follow these steps for the installation:
- Open the Terminal or Command Prompt: Depending on your operating system, open the corresponding command-line interface.
- Install dependencies using pip: enter the following command to install the required Python packages and libraries (a quick way to verify the installation is sketched below the command).
pip install -r requirements.txt
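If you want to confirm that the installation succeeded before running inference, you can check the environment with a short stand-alone script. The sketch below is an assumption rather than part of this repo (the file name check_env.py is hypothetical): it reads requirements.txt and reports whether each listed package is installed.

```python
# check_env.py -- hypothetical helper, not shipped with this repo.
# Reads requirements.txt and reports whether each listed package is
# installed in the current environment.
from importlib.metadata import PackageNotFoundError, version

with open("requirements.txt") as f:
    for line in f:
        req = line.split("#")[0].strip()  # drop comments and whitespace
        if not req:
            continue
        # Keep only the distribution name: drop extras and version pins.
        name = req.split(";")[0].split("[")[0]
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<"):
            name = name.split(sep)[0]
        name = name.strip()
        try:
            print(f"{name} {version(name)} -- OK")
        except PackageNotFoundError:
            print(f"{name} -- MISSING, run: pip install -r requirements.txt")
```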
After installing all the necessary dependencies, you can start using our model for inference. We provide two ways of performing inference: running a command in the terminal, and using interactive inference.
Here, we will use the example image asserts/demo.jpg for illustration.
If you want to directly run the inference script in the terminal, you can use the following command:
python chatme.py --image asserts/demo.jpg --question "How many apples are there on the shelf?"
This command will load the pre-trained model and perform inference using the provided image (demo.jpg) and question ("How many apples are there on the shelf?").
The model will analyze the image and attempt to answer the question. The inference result will be output to the terminal in text form, for example:
Xiaochuan: There are three apples on the shelf.
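Because chatme.py is an ordinary command-line script, you can also drive it from your own Python code. The following is a minimal sketch under two assumptions: the wrapper file name batch_infer.py is hypothetical, and chatme.py prints only the answer line to stdout.

```python
# batch_infer.py -- hypothetical wrapper, not shipped with this repo.
# Calls the terminal inference script as a subprocess and returns the
# text it prints, so several questions can be scripted in one run.
import subprocess

def ask(image_path: str, question: str) -> str:
    """Run chatme.py on one image/question pair and return its output."""
    result = subprocess.run(
        ["python", "chatme.py", "--image", image_path, "--question", question],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if inference fails
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(ask("asserts/demo.jpg", "How many apples are there on the shelf?"))
```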
In addition to using the terminal for inference, you can also use the interactive inference feature to interact with the large model in real time. To start the interactive terminal, run the following command:
python main.py
This command will launch an interactive terminal that waits for you to enter the image path. You can type the image path (e.g., asserts/demo.jpg) in the terminal and press Enter. The model will load the image and then wait for you to enter a question.
Once you enter a question (e.g., "How many apples are there on the shelf?"), the model will analyze the image and attempt to answer it. The inference result will be output to the terminal in text form, for example:
Image Path >>>>> asserts/demo.jpg
User: How many apples are there on the shelf?
Xiaochuan: There are three apples on the shelf.
Using this approach, you can easily interact with the model and ask it various questions.
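If you want to reproduce this interactive flow in your own code, a simple read-answer loop is enough. The sketch below is not the repo's main.py; it reuses the documented chatme.py command for each question, and the empty-line exit convention is an assumption.

```python
# interactive_demo.py -- hypothetical sketch, not the repo's main.py.
# Reproduces the documented interactive flow: prompt once for an image
# path, then answer questions in a loop by shelling out to chatme.py.
import subprocess

def ask(image_path: str, question: str) -> str:
    """Run chatme.py on one image/question pair and return its output."""
    result = subprocess.run(
        ["python", "chatme.py", "--image", image_path, "--question", question],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    image_path = input("Image Path >>>>> ").strip()
    while True:
        question = input("User: ").strip()
        if not question:
            break  # assumption: an empty line ends the session
        print(ask(image_path, question))
```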
- Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization
- Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering
- Distillation Using Oracle Queries for Transformer-based Human-Object Interaction Detection
- Intra- and Inter-Slice Contrastive Learning for Point Supervised OCT Fluid Segmentation
- Partitioning Stateful Data Stream Applications in Dynamic Edge Cloud Environments
- Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution
- Graph Convolutional Networks for Temporal Action Localization
- NAT: Neural Architecture Transformer for Accurate and Compact Architectures
- Breaking the Curse of Space Explosion: Towards Efficient NAS with Curriculum Search
- Contrastive Neural Architecture Search with Neural Architecture Comparators
- RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
- Source-free Domain Adaptation via Avatar Prototype Generation and Adaptation
- Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification
- Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score
- Masked Motion Encoding for Self-Supervised Video Representation Learning
- Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation
- Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection
- Polysemy Deciphering Network for Human-Object Interaction Detection
- Bidirectional Posture-Appearance Interaction Network for Driver Behavior Recognition
- Improving Generative Adversarial Networks with Local Coordinate Coding

- SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation
- Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks
- VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
- Deep Multi-View Learning Using Neuron-Wise Correlation-Maximizing Regularizers
- Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation
- Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation
- Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap

- Test-Time Model Adaptation for Visual Question Answering with Debiased Self-Supervisions
- Debiased Visual Question Answering from Feature and Sample Perspectives
- Intelligent Home 3D: Automatic 3D-House Design from Linguistic Descriptions Only
- Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization
- Cascade Reasoning Network for Text-based Visual Question Answering