The official TensorFlow 2 implementation of our high-quality frame interpolation neural network. We present a unified single-network approach that does not use additional pre-trained networks, such as optical flow or depth, and yet achieves state-of-the-art results. We use a multi-scale feature extractor that shares the same convolution weights across the scales. Our model is trainable from frame triplets alone.
FILM: Frame Interpolation for Large Motion
Fitsum Reda¹, Janne Kontkanen¹, Eric Tabellion¹, Deqing Sun¹, Caroline Pantofaru¹, Brian Curless¹,²
¹Google Research, ²University of Washington
In ECCV 2022.
FILM transforms near-duplicate photos into slow-motion footage that looks as if it were shot with a video camera.
Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the Web Demo:
Try the interpolation model with the replicate web demo at
Try FILM to interpolate between two or more images with the PyTTI-Tools at
An alternative Colab for running FILM on an arbitrary number of input images, not just two,
- Nov 28, 2022: Upgrade eval.interpolator_cli for high resolution frame interpolation. --block_height and --block_width determine the total number of patches (block_height*block_width) to subdivide the input images. By default, both arguments are set to 1, and so no subdivision will be done.
- Mar 12, 2022: Support for Windows, see WINDOWS_INSTALLATION.md.
- Mar 09, 2022: Support for high resolution frame interpolation. Set --block_height and --block_width in eval.interpolator_test to extract patches from the inputs, and reconstruct the interpolated frame from the iteratively interpolated patches.
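The patch subdivision described above can be pictured with a small pure-Python sketch. It is only an illustration of splitting an image into block_height*block_width patches and pasting them back; the helper names (patch_bounds, split_into_patches, merge_patches) are hypothetical and are not taken from the repository, whose implementation may divide and pad the images differently.

```python
def patch_bounds(height, width, block_height=1, block_width=1):
  """Return (top, bottom, left, right) bounds for each patch, row-major."""
  bounds = []
  for i in range(block_height):
    top = i * height // block_height
    bottom = (i + 1) * height // block_height
    for j in range(block_width):
      left = j * width // block_width
      right = (j + 1) * width // block_width
      bounds.append((top, bottom, left, right))
  return bounds


def split_into_patches(image, block_height=1, block_width=1):
  """Cut a 2-D list-of-rows image into block_height * block_width patches."""
  height, width = len(image), len(image[0])
  bounds = patch_bounds(height, width, block_height, block_width)
  patches = [
      [row[left:right] for row in image[top:bottom]]
      for top, bottom, left, right in bounds
  ]
  return patches, bounds


def merge_patches(patches, bounds, height, width):
  """Paste patches back into a full-size image at their original bounds."""
  image = [[None] * width for _ in range(height)]
  for patch, (top, _, left, right) in zip(patches, bounds):
    for y, row in enumerate(patch):
      image[top + y][left:right] = row
  return image


# Toy 4x6 "image" whose pixel values encode their position.
image = [[y * 6 + x for x in range(6)] for y in range(4)]
patches, bounds = split_into_patches(image, block_height=2, block_width=3)
restored = merge_patches(patches, bounds, height=4, width=6)
```

With both arguments left at 1, the sketch yields a single patch covering the whole image, which mirrors the "no subdivision" default.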
- Get the Frame Interpolation source code
git clone https://github.com/google-research/frame-interpolation
cd frame-interpolation
- Optionally, pull the recommended Docker base image
docker pull gcr.io/deeplearning-platform-release/tf2-gpu.2-6:latest
- If you do not use Docker, set up your NVIDIA GPU environment with:
- Install frame interpolation dependencies
pip3 install -r requirements.txt
sudo apt-get install -y ffmpeg
See WINDOWS_INSTALLATION.md for Windows support.
- Create a directory where you can keep large files. Ideally, not in this directory.
mkdir -p <pretrained_models>
- Download the pre-trained TF2 SavedModels from Google Drive and put them into <pretrained_models>.
The downloaded folder should have the following structure:
<pretrained_models>/
├── film_net/
│   ├── L1/
│   ├── Style/
│   ├── VGG/
├── vgg/
│   ├── imagenet-vgg-verydeep-19.mat
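Before running the interpolator, it can help to confirm that the download matches the layout above. The following is a minimal sketch using only the standard library; the function name missing_entries and the EXPECTED tuple are illustrative and not part of the repository.

```python
from pathlib import Path

# Relative paths taken from the folder structure listed above.
EXPECTED = (
    "film_net/L1",
    "film_net/Style",
    "film_net/VGG",
    "vgg/imagenet-vgg-verydeep-19.mat",
)


def missing_entries(root):
  """Return the expected relative paths that do not exist under root."""
  root = Path(root)
  return [rel for rel in EXPECTED if not (root / rel).exists()]
```

For example, calling missing_entries on your <pretrained_models> directory returns an empty list when every listed folder and file is present.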
The following instructions run the interpolator on the photos provided in 'frame-interpolation/photos'.
To generate an intermediate photo from the input near-duplicate photos, simply run:
python3 -m eval.interpolator_test \
--frame1 photos/one.png \
--frame2 photos/two.png \
--model_path <pretrained_models>/film_net/Style/saved_model \
--output_frame photos/output_middle.png
This will produce the sub-frame at t=0.5 and save it as 'photos/output_middle.png'.
To generate many in-between frames, eval.interpolator_cli takes a set of directories identified by a glob (--pattern). Each directory is expected to contain at least two input frames, with each contiguous frame pair treated as an input to generate in-between frames. Frames should be named such that when sorted (naturally) with natsort, their desired order is unchanged.
python3 -m eval.interpolator_cli \
--pattern "photos" \
--model_path <pretrained_models>/film_net/Style/saved_model \
--times_to_interpolate 6 \
--output_video
You will find the interpolated frames (including the input frames) in 'photos/interpolated_frames/', and the interpolated video at 'photos/interpolated.mp4'.
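To see why natural sorting matters for frame names, here is a small standard-library sketch. It approximates natural ordering with a hand-written sort key; it is not the natsort library itself, and natural_key is a hypothetical helper name.

```python
import re


def natural_key(name):
  """Split a name into text and integer chunks so that 2 sorts before 10."""
  parts = re.split(r"(\d+)", name)
  return [int(p) if p.isdecimal() else p.lower() for p in parts]


frames = ["frame10.png", "frame2.png", "frame1.png"]
ordered = sorted(frames, key=natural_key)
print(ordered)  # ['frame1.png', 'frame2.png', 'frame10.png']
```

A plain lexicographic sort would place frame10.png before frame2.png, so zero-padding the frame numbers is another way to keep the intended order.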
The number of frames is determined by --times_to_interpolate, which controls the number of times the frame interpolator is invoked. When the number of frames in a directory is num_frames, the number of output frames will be (2^times_to_interpolate+1)*(num_frames-1).
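For a single input pair (num_frames = 2), the expression above reduces to 2^times_to_interpolate + 1 frames. The sketch below illustrates where that count comes from, under the assumption that each invocation produces the midpoint between two frames, as in the t=0.5 example earlier. It tracks timestamps only and does not call the model; recursive_midpoints and pair_timeline are hypothetical names.

```python
def recursive_midpoints(t0, t1, times_to_interpolate):
  """Timestamps strictly between t0 and t1 produced by recursive halving."""
  if times_to_interpolate == 0:
    return []
  mid = (t0 + t1) / 2
  left = recursive_midpoints(t0, mid, times_to_interpolate - 1)
  right = recursive_midpoints(mid, t1, times_to_interpolate - 1)
  return left + [mid] + right


def pair_timeline(times_to_interpolate):
  """Timestamps of all frames for one pair, including both inputs."""
  return [0.0] + recursive_midpoints(0.0, 1.0, times_to_interpolate) + [1.0]


print(pair_timeline(2))       # [0.0, 0.25, 0.5, 0.75, 1.0]
print(len(pair_timeline(6)))  # 65
```

With --times_to_interpolate 6 and the two photos used in the command above, this gives 65 frames, which agrees with (2^6+1)*(2-1).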
We use Vimeo-90K as our main training dataset. For quantitative evaluations, we rely on commonly used benchmark datasets, specifically:
The training and benchmark evaluation scripts expect the frame triplets in the
TFRecord storage format.
We have included scripts that encode the relevant frame triplets into the tf.train.Example data format and export them to a TFRecord file.
Run python3 -m datasets.create_<dataset_name>_tfrecord --help for more information.
For example, run the command below to create a TFRecord for the Middlebury-other dataset. Download the images and point --input_dir to the unzipped folder path.
python3 -m datasets.create_middlebury_tfrecord \
--input_dir=<root folder of middlebury-other> \
--output_tfrecord_filepath=<output tfrecord filepath> \
--num_shards=3
The above command will output a TFRecord file with 3 shards as <output tfrecord filepath>@3.
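The @3 suffix is shorthand for a set of shard files. As a rough sketch, the helper below expands such a spec into individual filenames, assuming the common -SSSSS-of-NNNNN shard suffix (the default template in Apache Beam's file sinks). This naming is an assumption, not something the text above states, so check the files the script actually writes. The function name expand_sharded_path and the example path are hypothetical.

```python
def expand_sharded_path(spec):
  """Expand 'prefix@N' into N shard filenames (assumed naming convention)."""
  prefix, sep, count = spec.rpartition("@")
  if not sep or not count.isdecimal() or int(count) < 1:
    raise ValueError(f"Expected '<prefix>@<num_shards>', got: {spec!r}")
  num_shards = int(count)
  # Assumption: shards are named '<prefix>-00000-of-00003' and so on.
  return [f"{prefix}-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]


shards = expand_sharded_path("/tmp/middlebury.tfrecord@3")
```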
Below are our training gin configuration files for the different loss functions:
training/
├── config/
│   ├── film_net-L1.gin
│   ├── film_net-VGG.gin
│   ├── film_net-Style.gin
To launch a training run, simply pass the configuration filepath of the desired experiment.
By default, it uses all visible GPUs for training. To debug or train on a CPU, append --mode cpu.
python3 -m training.train \
--gin_config training/config/<config filename>.gin \
--base_folder <base folder for all training runs> \
--label <descriptive label for the run>
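When launching several runs, it can be convenient to assemble the command above programmatically. The sketch below only builds the argument list from the flags and config files shown in this section; the helper name training_command, the LOSS_CONFIGS mapping, and the example folder and label are hypothetical.

```python
import shlex

# Config files listed under training/config/ above, keyed by loss name.
LOSS_CONFIGS = {
    "L1": "training/config/film_net-L1.gin",
    "VGG": "training/config/film_net-VGG.gin",
    "Style": "training/config/film_net-Style.gin",
}


def training_command(loss, base_folder, label, cpu=False):
  """Build the argv list for one training run."""
  argv = [
      "python3", "-m", "training.train",
      "--gin_config", LOSS_CONFIGS[loss],
      "--base_folder", base_folder,
      "--label", label,
  ]
  if cpu:
    argv += ["--mode", "cpu"]  # Debug or train on a CPU.
  return argv


command = training_command("L1", "/tmp/film_runs", "debug_run", cpu=True)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run from the repository root.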
- When training finishes, the folder structure will look like this:
<base_folder>/
├── <label>/
│   ├── config.gin
│   ├── eval/
│   ├── train/
│   ├── saved_model/
Optionally, to build a SavedModel format from a trained checkpoints folder, you can use this command:
python3 -m training.build_saved_model_cli \
--base_folder <base folder of training sessions> \
--label <the name of the run>
- By default, a SavedModel is created when the training loop ends, and it will be saved at <base_folder>/<label>/saved_model.
Below, we provide the evaluation gin configuration files for the benchmarks we have considered:
eval/
├── config/
│   ├── middlebury.gin
│   ├── ucf101.gin
│   ├── vimeo_90K.gin
│   ├── xiph_2K.gin
│   ├── xiph_4K.gin
To run an evaluation, simply pass the configuration file of the desired evaluation dataset.
If a GPU is visible, it runs on it.
python3 -m eval.eval_cli \
--gin_config eval/config/<eval_dataset>.gin \
--model_path <pretrained_models>/film_net/L1/saved_model
The above command will produce the PSNR and SSIM scores presented in the paper.
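For readers unfamiliar with the metric, PSNR follows the standard definition 10*log10(MAX^2 / MSE). The sketch below computes it on flat lists of pixel values in pure Python; it illustrates the formula only and is not the repository's evaluation code, which may compute the score differently. SSIM is not covered here.

```python
import math


def psnr(reference, prediction, max_value=1.0):
  """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
  if not reference or len(reference) != len(prediction):
    raise ValueError("Inputs must be non-empty and of equal length.")
  mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
  if mse == 0:
    return math.inf  # Identical images.
  return 10.0 * math.log10(max_value ** 2 / mse)


score = psnr([0.0, 0.0], [0.1, 0.1])  # MSE = 0.01, so roughly 20 dB.
```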
If you find this implementation useful in your work, please acknowledge it appropriately by citing:
@inproceedings{reda2022film,
title = {FILM: Frame Interpolation for Large Motion},
author = {Fitsum Reda and Janne Kontkanen and Eric Tabellion and Deqing Sun and Caroline Pantofaru and Brian Curless},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2022}
}
@misc{film-tf,
title = {Tensorflow 2 Implementation of "FILM: Frame Interpolation for Large Motion"},
author = {Fitsum Reda and Janne Kontkanen and Eric Tabellion and Deqing Sun and Caroline Pantofaru and Brian Curless},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/google-research/frame-interpolation}}
}
We would like to thank Richard Tucker, Jason Lai and David Minnen. We would also like to thank Jamie Aspinall for the imagery included in this repository.
- 2 spaces for indentation
- 80 character line length
- PEP8 formatting
This is not an officially supported Google product.