Simple Video Summarization using Text-to-Segment Anything (Florence2 + SAM2)

This project provides a video processing tool that uses two AI models, Florence2 and SAM2, to detect and segment objects or activities in a video from a text description. The system first identifies significant motion in the video frames, then runs model inference to locate the objects or actions described in the user's text input.

Installation

Before running the script, ensure that all dependencies are installed. You can install the necessary packages using the following command:

pip install -r requirements.txt

To download the model checkpoints:

cd checkpoints
./download_ckpts.sh
cd ..

Requirements

Python 3.7+
OpenCV
PIL (Pillow)
PyTorch
tqdm

The remaining packages can be installed with:

pip install -q einops spaces timm transformers samv2 gradio supervision opencv-python
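
After installing, it can help to confirm that the packages are actually importable in the active environment. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the list uses the usual import names for the packages above (for example `cv2` for opencv-python). `samv2` and `spaces` are left out because their import names are not stated here.

```python
from importlib.util import find_spec

# Usual import names for the packages listed above (assumption: the
# standard pip-name to import-name mapping, e.g. opencv-python -> cv2).
REQUIRED_MODULES = [
    "cv2", "PIL", "torch", "tqdm", "einops",
    "timm", "transformers", "supervision", "gradio",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

The check only looks the modules up and does not import them, so it runs quickly even when large packages such as torch are installed.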

Usage

Run the script from the command line, passing the paths of the input video, output video, and mask video along with the text input to process.

python main.py --input_video_path <path_to_input_video> --output_video_path <path_to_output_video> --mask_video_path <path_to_mask_video> --text_input "your text here"

Parameters

--input_video_path: Path to the source video file.
--output_video_path: Path to save the processed video file.
--mask_video_path: Path to save the mask video file that highlights detected objects.
--text_input: Textual description of the object or activity to detect and segment in the video.
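
For readers who want to see how these four flags might be wired up, here is a minimal sketch using `argparse`. It is an illustration only: whether main.py uses `argparse`, and whether every flag is required, are assumptions, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser for the four documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Text-prompted video segmentation (illustrative parser)."
    )
    # Marking every flag as required is an assumption, not confirmed by main.py.
    parser.add_argument("--input_video_path", required=True,
                        help="Path to the source video file.")
    parser.add_argument("--output_video_path", required=True,
                        help="Path to save the processed video file.")
    parser.add_argument("--mask_video_path", required=True,
                        help="Path to save the mask video file.")
    parser.add_argument("--text_input", required=True,
                        help="Text description of what to detect and segment.")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a real CLI call.
    args = build_parser().parse_args([
        "--input_video_path", "in.mp4",
        "--output_video_path", "out.mp4",
        "--mask_video_path", "mask.mp4",
        "--text_input", "a person walking",
    ])
    print(args.input_video_path, args.text_input)
```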

Features

Motion Detection: Detects significant motion in the video so that processing focuses on the relevant segments.
Object and Action Detection: Uses Florence2 and SAM2 to detect and segment the objects or actions specified by the user.
Video and Mask Output: Generates an annotated video and a corresponding mask video showing the detected segments.
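
The motion detection step can be pictured with simple frame differencing. The sketch below is an assumption about how such a filter might work, not the repository's implementation: the function names, the flat grayscale frame layout, and the threshold value are all hypothetical. It scores each frame by its mean absolute pixel difference from the previous frame and groups consecutive high-scoring frames into segments.

```python
def motion_score(prev_frame, frame):
    """Mean absolute pixel difference between two equal-sized grayscale frames.

    Frames are flat sequences of 0-255 intensities (for example bytes objects).
    """
    if len(prev_frame) != len(frame) or not frame:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)


def motion_segments(frames, threshold=10.0):
    """Return inclusive (start, end) index ranges where motion exceeds threshold."""
    segments = []
    start = None
    for i in range(1, len(frames)):
        moving = motion_score(frames[i - 1], frames[i]) > threshold
        if moving and start is None:
            start = i  # a new motion segment begins at this frame
        elif not moving and start is not None:
            segments.append((start, i - 1))  # the segment ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(frames) - 1))
    return segments


if __name__ == "__main__":
    still = bytes([0] * 16)    # a dark 4x4 frame
    moved = bytes([200] * 16)  # the same frame after a large change
    print(motion_segments([still, still, moved, still, still]))
```

Only the frames inside the returned ranges would then need to be passed to the heavier Florence2 and SAM2 inference. On real video, a vectorized difference such as OpenCV's `cv2.absdiff` on grayscale frames would be far faster than this pure-Python loop.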

To Do

WebUI
Robust Video Synopsis
More Features
