Grounded SAM 2 for streaming video tracking using natural language queries.
This system comprises three components (a minimal sketch of the flow follows the list):
- LLM: parses the input query or infers the intended object.
- Grounding DINO: grounds the referred object in the frame (open-vocabulary detection).
- SAM 2: tracks the detected object across the video stream.
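At a glance, the per-frame flow looks like the sketch below. Every helper here is a hypothetical stub for illustration, not the repository's actual API:

```python
# Hypothetical skeleton of the LLM -> Grounding DINO -> SAM 2 flow.
# All three helpers are illustrative stubs, not this repo's real functions.
import cv2

def parse_query_with_llm(query: str) -> str:
    # Stub: the LLM reduces a free-form request to a short object phrase.
    return "red cup"

def ground_with_grounding_dino(frame, phrase: str):
    # Stub: Grounding DINO returns a box [x0, y0, x1, y1] for the phrase.
    h, w = frame.shape[:2]
    return [0, 0, w, h]

def track_with_sam2(frame, box):
    # Stub: SAM 2 propagates a mask for the boxed object on each new frame.
    return None

cap = cv2.VideoCapture(0)
phrase = parse_query_with_llm("please follow the red cup")
ok, frame = cap.read()
if ok:
    box = ground_with_grounding_dino(frame, phrase)  # ground once
    while ok:
        mask = track_with_sam2(frame, box)           # then track every frame
        ok, frame = cap.read()
cap.release()
```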
1. Prepare environment
conda create -n sam2 python=3.10 -y
conda activate sam2
pip install -e .
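A quick sanity check after the install (this assumes the editable install registers the package as `sam2` and pulls in PyTorch, as the SAM 2 setup does):

```python
# Verify the environment before downloading checkpoints.
import torch
import sam2  # the editable install above should make this importable

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```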
2. Download SAM 2 checkpoints
cd checkpoints
./download_ckpts.sh
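Once downloaded, a video predictor can be built roughly as below. The config/checkpoint pair is an assumption; use whichever pair `download_ckpts.sh` actually fetched:

```python
# Sketch: build a SAM 2 video predictor from a downloaded checkpoint.
# Checkpoint/config names are assumptions -- match your downloaded files.
import torch
from sam2.build_sam import build_sam2_video_predictor

device = "cuda" if torch.cuda.is_available() else "cpu"
predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml",                # model config
    "checkpoints/sam2_hiera_large.pt",  # downloaded weights
    device=device,
)
```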
3. Download Grounding DINO checkpoints
cd gdino_checkpoints
./download_ckpts.sh
Or use the Hugging Face version (recommended):
cd gdino_checkpoints
huggingface-cli download IDEA-Research/grounding-dino-tiny --local-dir grounding-dino-tiny
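The Hugging Face build loads through transformers' zero-shot object detection classes. A hedged sketch follows; the prompt and thresholds are illustrative, and the post-processing argument names follow transformers' documented example, which can vary across versions:

```python
# Sketch: run Grounding DINO from the local Hugging Face download.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_dir = "gdino_checkpoints/grounding-dino-tiny"  # local dir from above
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(model_dir)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_dir).to(device)

image = Image.open("frame.jpg")  # any test frame
inputs = processor(images=image, text="a red cup.", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids,
    box_threshold=0.35, text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)
print(results[0]["boxes"], results[0]["labels"])
```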
4. Download LLM
4.1 GPT-4o (recommended)
cd llm
touch .env
Paste your API_KEY (and API_BASE, for Azure only):
API_KEY="xxx"
API_BASE="xxx"
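How the demo consumes these values is up to `llm/`; one common pattern, assuming `python-dotenv` and the `openai` client (Azure deployments would typically use the `AzureOpenAI` class with `API_BASE` instead), looks like this:

```python
# Sketch: read llm/.env and query GPT-4o; the repo's loader may differ.
import os
from dotenv import load_dotenv  # pip install python-dotenv
from openai import OpenAI       # pip install openai

load_dotenv()                   # pulls API_KEY / API_BASE into the environment
client = OpenAI(api_key=os.getenv("API_KEY"))  # Azure: use AzureOpenAI + API_BASE
resp = client.chat.completions.create(
    model="gpt-4o-2024-05-13",
    messages=[{"role": "user", "content": "Extract the object to track: 'follow the red cup'"}],
)
print(resp.choices[0].message.content)
```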
4.2 Qwen2
cd llm_checkpoints
huggingface-cli download Qwen/Qwen2-7B-Instruct-AWQ --local-dir Qwen2-7B-Instruct-AWQ
Then install the packages Qwen2 inference needs (see the sketch below).
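For reference, a hedged loading sketch: AWQ checkpoints load through transformers when the `autoawq` package is installed, so `pip install transformers accelerate autoawq` is a reasonable guess at "the corresponding packages":

```python
# Sketch: load the local AWQ build of Qwen2 and ask it to parse a query.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "llm_checkpoints/Qwen2-7B-Instruct-AWQ"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Extract the object to track: 'follow the red cup'"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```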
Step-1: Check the available cameras
python cam_detect.py
If a camera is detected, set its index in demo.py (see the sketch below).
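Concretely, OpenCV opens cameras by integer index, so "set its index" means pointing the capture at the index `cam_detect.py` reported; the variable name here is illustrative:

```python
# Sketch: open the camera demo.py should use, by index.
import cv2

CAMERA_INDEX = 0  # replace with the index cam_detect.py reported
cap = cv2.VideoCapture(CAMERA_INDEX)
assert cap.isOpened(), f"camera {CAMERA_INDEX} is not available"
cap.release()
```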
Step-2: Run the demo
Currently available models: Qwen2-7B-Instruct-AWQ, gpt-4o-2024-05-13
python demo.py --model gpt-4o-2024-05-13
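If demo.py follows the usual argparse pattern, the --model flag dispatches between the API backend and the local model; a hedged sketch (demo.py's real argument handling may differ):

```python
# Sketch of a plausible --model dispatch; demo.py's real parsing may differ.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--model",
    default="gpt-4o-2024-05-13",
    choices=["gpt-4o-2024-05-13", "Qwen2-7B-Instruct-AWQ"],
)
args = parser.parse_args()
use_api = args.model.startswith("gpt-")  # API backend vs. local Qwen2
print("backend:", "OpenAI/Azure API" if use_api else "local Qwen2")
```

To run with the local model instead: python demo.py --model Qwen2-7B-Instruct-AWQ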