This repository contains the official implementation of the paper:
Perspective Plane Program Induction from a Single Image
Yikai Li*,
Jiayuan Mao*,
Xiuming Zhang,
William T. Freeman,
Joshua B. Tenenbaum, and
Jiajun Wu
In Computer Vision and Pattern Recognition (CVPR) 2020
[Paper]
[Project Page]
[BibTex]
@inproceedings{Li2020Perspective,
title={{Perspective Plane Program Induction from a Single Image}},
author={Li, Yikai and Mao, Jiayuan and Zhang, Xiuming and Freeman, William T. and Tenenbaum, Joshua B. and Wu, Jiajun},
booktitle={Conference on Computer Vision and Pattern Recognition},
year={2020}
}
P3I is a conceptually simple yet effective algorithm for inducing neuro-symbolic, program-like representations from a single image.
For reproducibility, we suggest installing Miniconda (or Anaconda, if you prefer) before running the following commands.
git clone https://github.com/42x00/p3i
cd p3i
conda create -y -n p3i
source activate p3i
# Replace cudatoolkit=10.1 with your CUDA version: https://pytorch.org/
conda install -y pytorch cudatoolkit=10.1 -c pytorch
conda install -y pillow opencv
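After installation, it can help to confirm that the packages installed above are importable and whether PyTorch can see a GPU. The following is a minimal sketch, not part of the repository: the function name check_environment is hypothetical, and it assumes the conda packages pytorch, pillow, and opencv expose the import names torch, PIL, and cv2.

```python
import importlib.util


def check_environment(modules=("torch", "PIL", "cv2")):
    """Report which modules are importable and whether CUDA is visible.

    Hypothetical helper for illustration; it is not shipped with P3I.
    """
    # find_spec returns None when a top-level module is not installed.
    report = {name: importlib.util.find_spec(name) is not None for name in modules}

    cuda_available = False
    if report.get("torch"):
        try:
            import torch

            cuda_available = bool(torch.cuda.is_available())
        except (ImportError, OSError):
            # torch is present but failed to load (e.g. a broken install).
            report["torch"] = False
    report["cuda_available"] = cuda_available
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If cuda_available is reported as False, the demo can still be run on the CPU.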
To quickly test P3I on the demo image, run:
python demo.py --input assets/demo.png
The induced program and rectified image will be saved to the results folder.
You can download our reference pre-trained model from Google Drive. The model was trained on ImageNet by Krizhevsky et al. (2012). We use it to extract visual features from the input image, which makes the inference procedure more robust.
To run P3I on your own images, execute:
python demo.py --device 0 --model_ckpt <path-to-pretrained-model> --input <path-to-image> --output_dir <path-to-output>
Here, --device 0 specifies the ID of the GPU used for induction; omit it to run on the CPU.
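To process a folder of images, the command above can be wrapped in a small driver script. This is a sketch under stated assumptions rather than a tool provided by the repository: build_command and run_batch are hypothetical names, the script is assumed to be run from the repository root so that demo.py resolves, and writing each image's results to its own subdirectory is a choice made here in case outputs would otherwise overwrite one another. Only the flags documented above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(image, model_ckpt, output_dir, device=None):
    """Assemble the demo.py invocation for one image (hypothetical helper)."""
    cmd = [
        sys.executable, "demo.py",
        "--model_ckpt", str(model_ckpt),
        "--input", str(image),
        "--output_dir", str(output_dir),
    ]
    # Omitting --device runs on the CPU, as described above.
    if device is not None:
        cmd += ["--device", str(device)]
    return cmd


def run_batch(image_dir, model_ckpt, output_root, device=None, pattern="*.png"):
    """Run demo.py once per matching image in image_dir."""
    for image in sorted(Path(image_dir).glob(pattern)):
        # Assumption: one output subdirectory per image, named after the file.
        out_dir = Path(output_root) / image.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=True stops the batch if demo.py exits with an error.
        subprocess.run(build_command(image, model_ckpt, out_dir, device), check=True)


# Example (paths are placeholders):
# run_batch("my_images", "<path-to-pretrained-model>", "results", device=0)
```

Passing device=None leaves out the --device flag, so the whole batch runs on the CPU.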