Synthetic and real video dataset with temporal logic annotation
Explore the docs »
NSVS-TL Project Webpage
·
NSVS-TL Source Code
The Temporal Logic Video (TLV) Dataset addresses the scarcity of video datasets annotated for long-horizon, temporally extended activity and object detection. It comprises two main components:
- Synthetic datasets: Generated by concatenating static images from established computer vision datasets (COCO and ImageNet), allowing for the introduction of a wide range of Temporal Logic (TL) specifications.
- Real-world datasets: Based on open-source autonomous vehicle (AV) driving datasets, specifically NuScenes and Waymo.
- Dataset Composition
- Dataset (Release)
- Installation
- Usage
- Data Generation
- Contribution Guidelines
- License
- Acknowledgments
Synthetic dataset:
- Source: COCO and ImageNet
- Purpose: Introduce artificial Temporal Logic specifications
- Generation method: Image stitching from static datasets
Real-world dataset:
- Sources: NuScenes and Waymo
- Purpose: Provide real-world autonomous vehicle scenarios
- Annotation: Temporal Logic specifications added to existing data
Although we provide source code to generate datasets from different types of data sources, we release dataset v1 as a proof of concept. The data is offered as serialized objects, each containing a set of frames with annotations. You can download the dataset from our dataset repository on Hugging Face.
`<tlv_data_type>:source:<datasource>-number_of_frames:<number_of_frames>-<uuid>.pkl`
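This naming convention can be unpacked programmatically. The sketch below is illustrative (the parser and the example filename are ours, not part of the released tooling):

```python
import re

def parse_tlv_filename(name: str) -> dict:
    """Parse the TLV naming convention:
    <tlv_data_type>:source:<datasource>-number_of_frames:<n>-<uuid>.pkl
    """
    pattern = (
        r"(?P<tlv_data_type>[^:]+):source:(?P<datasource>[^-]+)"
        r"-number_of_frames:(?P<number_of_frames>\d+)"
        r"-(?P<uuid>.+)\.pkl"
    )
    m = re.match(pattern, name)
    if m is None:
        raise ValueError(f"not a TLV filename: {name}")
    info = m.groupdict()
    info["number_of_frames"] = int(info["number_of_frames"])
    return info

# Hypothetical example filename following the convention above:
info = parse_tlv_filename("Fprop1:source:coco-number_of_frames:25-0b3f2d1c.pkl")
```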
Each serialized object contains the following attributes:
- `ground_truth`: Boolean indicating whether the dataset contains ground-truth labels
- `ltl_formula`: Temporal logic formula applied to the dataset
- `proposition`: Set of propositions for `ltl_formula`
- `number_of_frame`: Total number of frames in the dataset
- `frames_of_interest`: Frames of interest that satisfy `ltl_formula`
- `labels_of_frames`: Labels for each frame
- `images_of_frames`: Image data for each frame
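As a sanity check of the format, these fields can be round-tripped with Python's `pickle`. This sketch uses a dictionary with illustrative values; the released objects may expose the same names as object attributes rather than dictionary keys:

```python
import os
import pickle
import tempfile

# Illustrative record mirroring the annotation fields listed above.
record = {
    "ground_truth": True,
    "ltl_formula": "F prop1",
    "proposition": ["prop1"],
    "number_of_frame": 25,
    "frames_of_interest": [3, 4, 5],
    "labels_of_frames": ["prop1"] * 25,
    "images_of_frames": [],
}

path = os.path.join(tempfile.mkdtemp(), "demo.pkl")
with open(path, "wb") as f:
    pickle.dump(record, f)

# Loading mirrors how a downloaded .pkl file would be read.
with open(path, "rb") as f:
    data = pickle.load(f)
```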
You can download the dataset from our Hugging Face repository. The structure of the dataset is as follows:
tlv-dataset-v1/
├── tlv_real_dataset/
├──── prop1Uprop2/
├──── (prop1&prop2)Uprop3/
├── tlv_synthetic_dataset/
├──── Fprop1/
├──── Gprop1/
├──── prop1&prop2/
├──── prop1Uprop2/
└──── (prop1&prop2)Uprop3/
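Given this layout, the released files can be indexed by their TL-specification folder. A small sketch using only the standard library (the helper name is ours):

```python
from pathlib import Path

def list_tlv_files(root: str) -> dict[str, list[str]]:
    """Map each TL-spec folder (e.g. 'Fprop1') to its .pkl file names."""
    files: dict[str, list[str]] = {}
    for pkl in sorted(Path(root).rglob("*.pkl")):
        files.setdefault(pkl.parent.name, []).append(pkl.name)
    return files
```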
- Total number of frames

| Ground-truth TL specification | COCO (synthetic) | ImageNet (synthetic) | Waymo (real) | NuScenes (real) |
|---|---|---|---|---|
| Eventually Event A | - | 15,750 | - | - |
| Always Event A | - | 15,750 | - | - |
| Event A And Event B | 31,500 | - | - | - |
| Event A Until Event B | 15,750 | 15,750 | 8,736 | 19,808 |
| (Event A And Event B) Until Event C | 5,789 | - | 7,459 | 7,459 |
- Total number of datasets

| Ground-truth TL specification | COCO (synthetic) | ImageNet (synthetic) | Waymo (real) | NuScenes (real) |
|---|---|---|---|---|
| Eventually Event A | - | 60 | - | - |
| Always Event A | - | 60 | - | - |
| Event A And Event B | 120 | - | - | - |
| Event A Until Event B | 60 | 60 | 45 | 494 |
| (Event A And Event B) Until Event C | 97 | - | 30 | 186 |
```shell
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip build
python -m pip install --editable ".[dev,test]"
```
- ImageNet (ILSVRC 2017):

  ILSVRC/
  ├── Annotations/
  ├── Data/
  ├── ImageSets/
  └── LOC_synset_mapping.txt

- COCO (2017):

  COCO/
  └── 2017/
      ├── annotations/
      ├── train2017/
      └── val2017/
Detailed usage instructions for data loading and processing.
- `data_root_dir`: Root directory of the dataset
- `mapping_to`: Label mapping scheme (default: "coco")
- `save_dir`: Output directory for processed data
- `initial_number_of_frame`: Starting frame count per video
- `max_number_frame`: Maximum frame count per video
- `number_video_per_set_of_frame`: Number of videos to generate per frame-count setting
- `increase_rate`: Frame-count increment rate
- `ltl_logic`: Temporal Logic specification (e.g., "F prop1", "G prop1")
- `save_images`: Boolean flag for saving individual frames
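Taken together, `initial_number_of_frame`, `max_number_frame`, and `increase_rate` define a sweep over video lengths, with `number_video_per_set_of_frame` videos generated at each length. The helper below is an illustrative reconstruction of that schedule, not code from the repository:

```python
def frame_count_schedule(initial_number_of_frame: int,
                         max_number_frame: int,
                         increase_rate: int) -> list[int]:
    """Frame counts swept during generation (illustrative helper)."""
    counts = []
    n = initial_number_of_frame
    while n <= max_number_frame:
        counts.append(n)
        n += increase_rate
    return counts

# e.g. start at 25 frames and grow by 25 up to 100:
schedule = frame_count_schedule(25, 100, 25)  # → [25, 50, 75, 100]
```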
```shell
python3 run_scripts/run_synthetic_tlv_coco.py --data_root_dir "../COCO/2017" --save_dir "<output_dir>"
python3 run_synthetic_tlv_imagenet.py --data_root_dir "../ILSVRC" --save_dir "<output_dir>"
```
Note: the ImageNet generator does not support the '&' (conjunction) operator in LTL formulae.
This project is licensed under the MIT License. See the LICENSE file for details.
If you find this repo useful, please cite our paper:
@inproceedings{Choi_2024_ECCV,
author={Choi, Minkyu and Goel, Harsh and Omama, Mohammad and Yang, Yunhao and Shah, Sahil and Chinchali, Sandeep},
title={Towards Neuro-Symbolic Video Understanding},
booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
month={September},
year={2024}
}