Code release for the paper Simulating Fluids in Real-World Still Images
Note: We are sorry that the plan for releasing the rest of the code (CLAWv2 testset, inference without hints, UI for hint editing, and SFS) is suspended. For people who want to animate their own single image: currently the sparse hints and masks in our dataset are sampled from the ground-truth motion, and the ground-truth motion is inferred from the ground-truth video using FlowNet2. If you want to animate your own single image, you can refer to #3 (comment): first use LabelMe to generate a mask, then edit one to five pixels of motion speed and direction in code, as sketched below. If you want to animate without any hint, you have to train your own motion model (we tried this, but the motion results are worse than those obtained with hints).
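A minimal sketch of that manual-hint workflow (file names, the hint format, and the (dx, dy) units here are assumptions; adapt them to whatever the data loaders in `data/` expect):

```python
# Minimal sketch (not the released pipeline): build a fluid mask from a LabelMe
# annotation and a sparse motion hint map from a handful of hand-picked pixels.
import json
import numpy as np
import cv2

H, W = 768, 768  # assumed working resolution

# 1) Rasterize the LabelMe polygon(s) into a binary fluid mask.
with open("my_image.json") as f:          # LabelMe export (hypothetical name)
    ann = json.load(f)
mask = np.zeros((H, W), dtype=np.uint8)
for shape in ann["shapes"]:
    pts = np.array(shape["points"], dtype=np.int32)
    cv2.fillPoly(mask, [pts], 1)

# 2) Place one to five motion hints: (x, y) pixel plus (dx, dy) direction/speed.
hints = [(300, 420, 2.0, -0.5),           # hand-edited values (example only)
         (380, 460, 1.5, -0.3)]
hint_flow = np.zeros((H, W, 2), dtype=np.float32)
hint_mask = np.zeros((H, W, 1), dtype=np.float32)
for x, y, dx, dy in hints:
    hint_flow[y, x] = (dx, dy)
    hint_mask[y, x] = 1.0

np.savez("my_image_hint.npz", mask=mask, hint_flow=hint_flow, hint_mask=hint_mask)
```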
Authors: Siming Fan, Jingtan Piao, Chen Qian, Kwan-Yee Lin, Hongsheng Li.
[Paper] [Project Page] [Demo Video]
In this work, we tackle the problem of real-world fluid animation from a still image. We propose a new and learnable representation, the surface-based layered representation (SLR), which decomposes the fluid and the static objects in the scene to better synthesize animated videos from a single fluid image, and we design a surface-only fluid simulation (SFS) to model the evolution of the image fluids with better visual effects.
For more details of SLR-SFS, please refer to our paper and project page.
- [14/07/2023] Our paper has been accepted by ICCV 2023!
- [10/06/2022] Code and pretrained model of the motion regressor (from a single image and sparse hints) are updated.
- [04/05/2022] Colab updated. Huggingface will be updated soon.
- [26/04/2022] Technical report, code, CLAW testset released.
- [01/04/2022] Project page is created.
We provide a Colab demo that lets you synthesize videos under gt motion, as well as apply editing effects. The motion regressor from a single image is not supported in this version for the time being and will be updated soon.
Download our CLAW test set (314.8MB) (Description) and put it under SLR-SFS/data/CLAW/test
Download eulerian_data(Everything, 42.9GB) (Description)
Generate the mean video of the ground truth for background training (needs 4.8GB):
cd data
python average_gt_video.py
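For reference, a rough sketch of what this step produces: one per-pixel mean image per `train/*.gt.mp4`, saved under `avr_image/`. The output naming here is an assumption, so prefer the provided script.

```python
# Rough sketch of the mean-image computation (prefer data/average_gt_video.py).
import os
import glob
import cv2
import numpy as np

src_dir = "eulerian_data/train"       # assumed to be run from the data/ directory
dst_dir = "eulerian_data/avr_image"
os.makedirs(dst_dir, exist_ok=True)

for path in glob.glob(os.path.join(src_dir, "*.gt.mp4")):
    cap = cv2.VideoCapture(path)
    total, count = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = frame.astype(np.float64)
        total = frame if total is None else total + frame
        count += 1
    cap.release()
    if count == 0:
        continue
    mean = (total / count).astype(np.uint8)   # per-pixel average over all frames
    name = os.path.basename(path).replace(".gt.mp4", ".png")  # assumed naming
    cv2.imwrite(os.path.join(dst_dir, name), mean)
```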
Download our labels (1MB) (Description) and put them under SLR-SFS/data/eulerian_data/fluid_region_rock_labels/all
Make sure opt.rock_label_data_path = "data/eulerian_data/fluid_region_rock_labels/all" in options/options.py points to the directory containing the label files; a scene without a label file is treated as having no rock in its moving region. You can verify this in TensorBoard.
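A quick, hypothetical sanity check (the label-file naming convention is an assumption) to list training scenes that have no rock label and will therefore be treated as rock-free:

```python
# List training scenes with no rock label file (assumed: one label per video stem).
import glob
import os

label_dir = "data/eulerian_data/fluid_region_rock_labels/all"
labeled = {os.path.splitext(os.path.basename(p))[0]
           for p in glob.glob(os.path.join(label_dir, "*"))}
for video in sorted(glob.glob("data/eulerian_data/train/*.gt.mp4")):
    stem = os.path.basename(video).replace(".gt.mp4", "")
    if stem not in labeled:
        print("no rock label (treated as rock-free):", stem)
```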
SLR-SFS
├── data
│ ├── eulerian_data
│ │ ├── train
│ │ ├── validation
│ │ ├── imageset_shallow.npy # list of videos containing transparent fluid in train/*.gt.mp4
│ │ ├── align*.json # speed align information for gt motion in validation/*_motion.pth to avoid large invalid pixels after warping
│ │ ├── avr_image # containing mean image of each video in train/*.gt.mp4
│ ├── eulerian_data.py #(for baseline training)
│ ├── eulerian_data_balanced1_mask.py #(for SLR training)
│ ├── eulerian_data_bg.py #(for BG training)
│ ├── CLAW # mentioned in paper
│ │ ├── test
│ │ ├── align*.json
│ ├── CLAWv2 # newly collected test set with higher resolution and more diverse scenes (fountains, oceans, beaches, mists, etc., which are not included in CLAW)
│ │ ├── test
│ │ ├── align*.json
conda create -n SLR python=3.9 -y
conda activate SLR
conda install -c pytorch pytorch==1.10.0 torchvision #pytorch/linux-64::pytorch-1.10.0-py3.9_cuda11.3_cudnn8.2.0_0
pip install tqdm opencv-python py-lz4framed matplotlib
conda install cupy -c conda-forge
pip install lpips # for evaluation
pip install av tensorboardX tensorboard # for training
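Optionally, a small sanity check that the environment is usable (the package names match the install commands above):

```python
# Verify the key packages import and CUDA is visible.
import torch
import torchvision
import cupy
import lpips
import cv2

print("torch", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("torchvision", torchvision.__version__)
print("cupy", cupy.__version__, "| cuda runtime:", cupy.cuda.runtime.runtimeGetVersion())
print("opencv", cv2.__version__, "| lpips import OK")
```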
Download our pretrained models mentioned in Tables 1 and 2 of the paper:
Model | LPIPS on CLAW (All; Fluid) | Description |
---|---|---|
baseline2 | 0.2078; 0.2041 | Modified Holynski (Baseline): 100 epochs + 50 epochs at a lower learning rate |
Ours_stage_1 | 0.2143; 0.2100 | Ours (Stage 1): 100 epochs |
Ours_stage_2 | 0.2411; 0.2294 | Background only, used in Ours (Stage 2): 100 epochs; also used as background initialization for the stage-3 training of Ours_v1 |
Ours_v1 | 0.2040; 0.1975 | Ours: 100 epochs baseline2 (stage 1) + 100 epochs BG (stage 2) + 50 epochs Ours (stage 3, lower learning rate) |
Ours_v1_ProjectPage | 0.2060; 0.1992 | Selected with the best total loss (mainly perceptual loss and mask loss) on the eulerian_data validation set, while the previous models are selected with the best perceptual loss. This pretrained model can be used to reproduce the results on our Project Page. Its decomposition results are slightly better than "Ours_v1" |
1. For evaluation under gt motion (a masked-LPIPS sketch follows these numbered steps):
# For our v1 model, 60 frames, gt motion (Tables 1 and 2 in the paper)
bash test_animating/CLAW/test_v1.sh
bash evaluation/eval_animating_CLAW.sh
# For the baseline2 model, 60 frames, gt motion (Tables 1 and 2 in the paper)
bash test_animating/CLAW/test_baseline2.sh
bash evaluation/eval_animating_CLAW.sh
## You can also use sbatch script test_animating/test_sbatch_2.sh
## For eulerian_data validation set, use the script in test_animating/eulerian_data
2. You can also use aligned gt motion to avoid large holes for better animation:
bash test_animating/CLAW/test_v1_align.sh
Results will be the same as:
3. Run at a smaller resolution by replacing 768 with 256 in test_animating/CLAW/test_v1_align.sh, etc.
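The evaluation scripts above report LPIPS over all pixels and over the fluid region. Below is a minimal sketch of such a masked LPIPS metric using the `lpips` package; it is an illustration only, not the released evaluation code, and the masking strategy is an assumption.

```python
# Sketch: LPIPS over the full frame and over the fluid region via a spatial map.
import lpips
import torch

loss_fn = lpips.LPIPS(net="alex", spatial=True)  # returns a per-pixel distance map

def lpips_all_and_fluid(pred, gt, fluid_mask):
    """pred, gt: (1,3,H,W) tensors in [-1,1]; fluid_mask: (1,1,H,W) in {0,1}."""
    with torch.no_grad():
        dmap = loss_fn(pred, gt)                 # (1,1,H,W) spatial LPIPS map
    lpips_all = dmap.mean().item()
    lpips_fluid = (dmap * fluid_mask).sum().item() / fluid_mask.sum().clamp(min=1).item()
    return lpips_all, lpips_fluid
```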
Model | LPIPS on CLAW (All; Fluid) | Description |
---|---|---|
motion2 | - | Controllable-Motion: Ep200 |
baseline2+motion2 | Ongoing | Modified Holynski (Baseline): 100 epochs + Controllable-Motion: Ep200 |
baseline2+motion2+fixedMotionFinetune | Ongoing | Modified Holynski (Baseline): 100 epochs + Controllable-Motion: Ep200 + fixed motion, finetune fluid: Ep50 |
For evaluation under 5 sparse hints and a mask sampled from gt motion:
bash test_animating/CLAW/test_baseline_motion.sh
bash evaluation/eval_animating_CLAW.sh
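For illustration, sparse hints of this kind can be sampled from a ground-truth motion field inside the fluid mask roughly as follows. Tensor names, shapes, and the sampling strategy are assumptions; the actual data loader may differ.

```python
# Sketch: sample 5 sparse motion hints from a gt flow field within the fluid mask.
import torch

def sample_sparse_hints(gt_flow, fluid_mask, num_hints=5, seed=0):
    """gt_flow: (2,H,W) motion field; fluid_mask: (H,W) in {0,1}."""
    g = torch.Generator().manual_seed(seed)
    ys, xs = torch.nonzero(fluid_mask > 0, as_tuple=True)
    idx = torch.randperm(len(ys), generator=g)[:num_hints]
    hint_flow = torch.zeros_like(gt_flow)
    hint_mask = torch.zeros_like(fluid_mask)
    hint_flow[:, ys[idx], xs[idx]] = gt_flow[:, ys[idx], xs[idx]]
    hint_mask[ys[idx], xs[idx]] = 1
    return hint_flow, hint_mask
```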
1. To train the baseline model under gt motion, run the following scripts:
# For baseline training
bash train_animating_scripts/train_baseline1.sh
# For baseline2 training (w/ pconv)
bash train_animating_scripts/train_baseline2_pconv.sh
Note: Please refer to "Animating Pictures with Eulerian Motion Fields" for more information.
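The baseline follows the Eulerian idea of that paper: a single static motion field is integrated over time by repeatedly sampling it at the displaced position. A minimal sketch of that integration (an illustration of the idea only, not this repo's implementation):

```python
# Euler integration of a static (Eulerian) motion field: D_{t+1}(x) = D_t(x) + M(x + D_t(x)).
import torch
import torch.nn.functional as F

def integrate_motion(flow, t):
    """flow: (1,2,H,W) static motion field in pixels; returns displacement after t steps."""
    _, _, H, W = flow.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xs, ys], dim=0).float().unsqueeze(0)   # (1,2,H,W) pixel coords
    disp = torch.zeros_like(flow)
    for _ in range(t):
        pos = base + disp                                      # current positions
        # normalize to [-1,1] for grid_sample and look up the flow there
        grid = torch.stack([pos[:, 0] / (W - 1) * 2 - 1,
                            pos[:, 1] / (H - 1) * 2 - 1], dim=-1)
        disp = disp + F.grid_sample(flow, grid, align_corners=True)
    return disp
```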
2. To train our SLR model under gt motion, run the following scripts:
# Firstly, train Surface Fluid Layer for 100 epochs
bash train_animating_scripts/train_baseline2_pconv.sh
# Secondly, generate "mean video" and train Background Layer for 100 epochs
bash train_animating_scripts/train_bg.sh
# Lastly, unzip the label files to the proper directory, train alpha, and finetune Fluid and BG.
bash train_alpha_finetuneBG_finetuneFluid_v1.sh
(Note: check TensorBoard to see whether your ground-truth alpha is correct.)
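At a high level, the three stages above build up the surface-based layered representation: a warped surface fluid layer, a static background layer, and a learned alpha that composites them. A hedged sketch of that composite follows; the exact formulation in the code may differ.

```python
# Sketch of the SLR composite: alpha-blend the warped fluid layer over the background.
import torch

def slr_composite(fluid_layer_warped, background_layer, alpha):
    """
    fluid_layer_warped: (N,3,H,W) fluid surface layer after motion warping
    background_layer:   (N,3,H,W) static scene content recovered behind the fluid
    alpha:              (N,1,H,W) in [0,1], learned fluid opacity
    """
    return alpha * fluid_layer_warped + (1.0 - alpha) * background_layer
```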
3. To train the motion regressor, run the following scripts:
# For controllable motion training with motion GAN
bash train_animating_scripts/train_motion_scripts/train_motion_EPE_MotionGAN.sh
Note: Please refer to "Controllable Animation of Fluid Elements in Still Images" for more information.
4. To finetune the baseline model, run the following scripts:
# First, run the stage 1 to train baseline fluid
bash train_animating_scripts/train_baseline2_pconv.sh
# Second, run the stage 2 to train motion
bash train_animating_scripts/train_motion_scripts/train_motion_EPE_MotionGAN.sh
# Finally, fix the motion estimation and finetune the fluid
bash train_animating_scripts/train_animating_fixedMotion_finetuneFluid_IGANonly.sh
You can use TensorBoard to monitor training progress in the logging directory.
- pretrained model and code of our reproduction of Holynski's method (w/o motion estimation)
- pretrained model and code of SLR
- pretrained model and code of motion estimation from single image
- CLAWv2 testset
- Simple UI for Motion Editing
- code of SFS
If you find this work useful in your research, please consider citing:
@article{fan2022SLR,
author = {Siming Fan and Jingtan Piao and Chen Qian and Kwan-Yee Lin and Hongsheng Li},
title = {Simulating Fluids in Real-World Still Images},
journal = {arXiv preprint},
volume = {arXiv:2204.11335},
year = {2022},
}