RDIR: Recurrent Detect-Infer-Repeat

Official repository for "RDIR: Capturing Temporally-Invariant Representations of Multiple Objects in Videos", accepted at the Winter Conference on Applications of Computer Vision (WACV) 2024 (Workshops).

Code will be released soon.

Paper	Supplementary	Bibtex

Abstract

Learning temporally coherent representations of multiple objects in videos is crucial for understanding their complex dynamics and interactions over time. In this paper, we present a deep generative neural network that can learn such representations by leveraging pretraining. Our model builds upon a scale-invariant structured autoencoder, extending it with a convolutional recurrent module to refine the learned representations through time and enable information sharing among multiple cells in multi-scale grids. This novel approach provides a framework for learning per-object representations from a pretrained object detection model, offering the ability to infer predefined types of objects without the need for supervision. Through a series of experiments on benchmark datasets and real-life video footage, we demonstrate the spatial and temporal coherence of the learned representations, showcasing their applicability in downstream tasks such as object tracking. We analyze the method’s robustness by conducting an ablation study, and we compare it to other methods, highlighting the importance of the quality of objects’ representations.

Datasets

In this research we use the following datasets:

Moving multi-scale MNIST - an extension to the multiscalemnist dataset with randomized digit movement.
MoVi datasets - a fork of Google Research's Kubric repository with parsing to YOLO format.
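
The MoVi entry above mentions parsing to YOLO format. This README does not describe that parsing, so the following is only a minimal sketch of the commonly used YOLO text-label convention (one line per object: class index, then box center and size normalized by image width and height). The function name `to_yolo_label` and its parameters are hypothetical and are not taken from this repository.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box into one YOLO label line.

    Hypothetical helper for illustration only; not part of this repository.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    # Box center, normalized to [0, 1] by the image size.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    # Box width and height, normalized the same way.
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # A 128x64 px box centered in a 256x128 px frame.
    print(to_yolo_label(3, 64, 32, 192, 96, 256, 128))
```

Normalizing by image size keeps the labels valid when frames are resized, which is one reason this convention is convenient for multi-scale data.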

Citation

@InProceedings{Zielinski_2024_WACV,
    author    = {Zieli\'nski, Piotr and Kajdanowicz, Tomasz},
    title     = {RDIR: Capturing Temporally-Invariant Representations of Multiple Objects in Videos},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2024},
    pages     = {597-606}
}
