Official code for "StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields"
4D style transfer aims to transfer an arbitrary visual style to the synthesized novel views of a dynamic 4D scene with varying viewpoints and times. Existing efforts on 3D style transfer can effectively combine the visual features of style images and neural radiance fields (NeRF), but they fail to handle 4D dynamic scenes because of the static scene assumption. Consequently, we aim to handle the novel and challenging problem of 4D style transfer for the first time, which further requires the consistency of stylized results on dynamic objects. In this paper, we introduce StyleDyRF, a method that represents the 4D feature space by deforming a canonical feature volume and learns a linear style transformation matrix on the feature volume in a data-driven fashion. To obtain the canonical feature volume, the rays at each time step are deformed with the geometric prior of a pre-trained dynamic NeRF to render the feature map under the supervision of pre-trained visual encoders. With the content and style cues in the canonical feature volume and the style image, we can learn the style transformation matrix from their covariance matrices with lightweight neural networks. The learned style transformation matrix reflects a direct matching of feature covariance from the content volume to the given style pattern, in analogy with the optimization of the Gram matrix in traditional 2D neural style transfer. Experimental results show that our method not only renders 4D photorealistic style transfer results in a zero-shot manner but also outperforms existing methods in terms of visual quality and consistency.
Please prepare the environment following robust-dynrf.
Tested with PyTorch 2.0/2.1 and CUDA 11.8. You can change the PyTorch version depending on your local machine.
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install tqdm scikit-image opencv-python configargparse lpips imageio-ffmpeg kornia tensorboard imageio easydict matplotlib scipy plyfile timm
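Optionally, you can run a quick sanity check to confirm that the installed PyTorch build sees your GPU before continuing:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"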
Create dataset directory:
mkdir dataset
cd dataset
Download the data pre-processed by DynamicNeRF.
mkdir nvidia
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/data.zip
unzip data.zip
rm data.zip
Download the Davis dataset and put the images into dataset/davis/${SCENE_NAME}/images.
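For example, for a scene named bear, the expected layout would look like the following (the scene name and source path are placeholders for wherever you extracted the Davis images):
# placeholder paths; adjust to your Davis download location and scene name
mkdir -p dataset/davis/bear/images
cp /path/to/davis/bear/*.jpg dataset/davis/bear/images/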
Download DPT and RAFT pretrained weights.
mkdir weights
cd weights
wget --no-check-certificate https://github.com/intel-isl/DPT/releases/download/1_0/dpt_large-midas-2f21e586.pt
wget --no-check-certificate https://www.dropbox.com/s/4j4z58wuv8o0mfz/models.zip
unzip models.zip
rm models.zip
cd ..
Predict the monocular depth.
python preprocess_scripts/generate_DPT.py --dataset_path ${SCENE_DIR} --model weights/dpt_large-midas-2f21e586.pt
Predict the optical flows.
python preprocess_scripts/generate_flow.py --dataset_path ${SCENE_DIR} --model weights/models/raft-things.pth
Predict the motion mask.
python preprocess_scripts/generate_mask.py --dataset_path ${SCENE_DIR}
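For convenience, the three preprocessing steps above can be chained for a single scene, assuming ${SCENE_DIR} points at one scene directory (e.g. dataset/davis/bear):
# example: run depth, flow, and mask prediction for one scene
SCENE_DIR=dataset/davis/bear
python preprocess_scripts/generate_DPT.py --dataset_path ${SCENE_DIR} --model weights/dpt_large-midas-2f21e586.pt
python preprocess_scripts/generate_flow.py --dataset_path ${SCENE_DIR} --model weights/models/raft-things.pth
python preprocess_scripts/generate_mask.py --dataset_path ${SCENE_DIR}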
Download the style dataset and decompress it into dataset/WikiArt.
Check the example config files, e.g. configs/nvidia_with_pose/balloon1.txt, configs/nvidia_with_pose/balloon2.txt, configs/davis/bear.txt, etc. You can set the expname in your config file (e.g. configs/nvidia_with_pose/${CONFIG_FILE}). Adjust N_voxel_t in the config file to match the number of images in the specified datadir.
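As an illustration only, a config file would contain entries such as the following; the key names come from the options above, but the values shown here are placeholders, so check the shipped config files for the full set of options:
# hypothetical excerpt of a scene config; values are placeholders
expname = Balloon1_with_pose
datadir = ./dataset/nvidia/Balloon1
# set N_voxel_t to the number of images in datadir
N_voxel_t = 12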
# Training on the "balloon1" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train.py --config configs/nvidia_with_pose/balloon1.txt
# Training on the "balloon2" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train.py --config configs/nvidia_with_pose/balloon2.txt
# Training on the "playground" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train.py --config configs/nvidia_with_pose/playground.txt
# Training on the "bear" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train.py --config configs/davis/bear.txt
# Training on the "horsejump-high" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train.py --config configs/davis/horsejump-high.txt
When training is finished, the checkpoint file can be found in log/${expname}/${expname}.th.
Before this training stage, you need to prepare the VGG checkpoints in the pretrained directory. You can download the VGG checkpoints from here and unzip them into the pretrained directory.
cd pretrained/
wget https://mogface.oss-cn-zhangjiakou.aliyuncs.com/xhb/share/styledyrf_tvcg/vgg_pretrained.zip
unzip vgg_pretrained.zip
rm vgg_pretrained.zip
Then go back to the root directory of the project and start the training.
# Training on the "balloon1" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
--config configs/nvidia_with_pose/balloon1.txt \
--patch_size 256 \
--basedir log_feature \
--n_iters 25000 \
--batch_size 8192 \
--ckpt log/Balloon1_with_pose/Balloon1_with_pose.th
# Training on the "balloon2" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
--config configs/nvidia_with_pose/balloon2.txt \
--patch_size 256 \
--basedir log_feature \
--n_iters 25000 \
--batch_size 8192 \
--ckpt log/Balloon2_with_pose/Balloon2_with_pose.th
# Training on the "playground" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
--config configs/nvidia_with_pose/playground.txt \
--patch_size 256 \
--basedir log_feature \
--n_iters 25000 \
--batch_size 8192 \
--ckpt log/Playground_with_pose/Playground_with_pose.th
# Training on the "bear" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
--config configs/davis/bear.txt \
--patch_size 256 \
--basedir log_feature \
--n_iters 25000 \
--batch_size 8192 \
--ckpt log/bear/bear.th
# Training on the "horsejump-high" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
--config configs/davis/horsejump-high.txt \
--patch_size 256 \
--basedir log_feature \
--n_iters 25000 \
--batch_size 8192 \
--ckpt log/horsejump-high/horsejump-high.th
--patch_size is the rendered patch size during feature distillation. The larger the patch size, the better the performance; however, a large patch size might cause GPU memory overflow. You can adjust this hyperparameter based on your local machine and GPUs.
--n_iters is the number of iterations during feature distillation. You can use our default setting of 25000 or your own custom setting.
--batch_size is the batch size of rays during feature distillation. A large batch size might require more GPU memory. You can adjust this hyperparameter based on your local machine and GPUs.
--ckpt is the path of the pre-trained dynamic NeRF from stage 1.
--basedir is the directory that will store the trained model of the feature distillation stage.
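Putting these flags together, a feature-distillation run for a custom scene would follow the pattern below; ${CONFIG_FILE} and ${EXPNAME} are placeholders and must match the config and expname you used in stage 1:
CUDA_VISIBLE_DEVICES=0 python train_feature.py \
--config configs/davis/${CONFIG_FILE} \
--patch_size 256 \
--basedir log_feature \
--n_iters 25000 \
--batch_size 8192 \
--ckpt log/${EXPNAME}/${EXPNAME}.th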
When training is finished, the checkpoint file can be found in log_feature/${expname}/${expname}.th.
# Training on the "balloon1" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_style.py \
--config configs/nvidia_with_pose/balloon1.txt \
--patch_size 256 \
--basedir log_style \
--n_iters 25000 \
--batch_size 4096 \
--ckpt_feature log_feature/Balloon1_with_pose/Balloon1_with_pose.th \
--wikiartdir datasets/WikiArt
# Training on the "balloon2" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_style.py \
--config configs/nvidia_with_pose/balloon2.txt \
--patch_size 256 \
--basedir log_style \
--n_iters 25000 \
--batch_size 4096 \
--ckpt_feature log_feature/Balloon2_with_pose/Balloon2_with_pose.th \
--wikiartdir datasets/WikiArt
# Training on the "playground" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python train_style.py \
--config configs/nvidia_with_pose/playground.txt \
--patch_size 256 \
--basedir log_style \
--n_iters 25000 \
--batch_size 4096 \
--ckpt_feature log_feature/Playground_with_pose/Playground_with_pose.th \
--wikiartdir datasets/WikiArt
# Training on the "bear" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train_style.py \
--config configs/davis/bear.txt \
--patch_size 256 \
--basedir log_style \
--n_iters 25000 \
--batch_size 4096 \
--ckpt_feature log_feature/bear/bear.th \
--wikiartdir datasets/WikiArt
# Training on the "horsejump-high" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python train_style.py \
--config configs/davis/horsejump-high.txt \
--patch_size 256 \
--basedir log_style \
--n_iters 25000 \
--batch_size 4096 \
--ckpt_feature log_feature/horsejump-high/horsejump-high.th \
--wikiartdir datasets/WikiArt
--patch_size is the rendered patch size during training stage 3. The larger the patch size, the better the performance; however, a large patch size might cause GPU memory overflow. You can adjust this hyperparameter based on your local machine and GPUs.
--n_iters is the number of iterations during training stage 3. You can use our default setting of 25000 or your own custom setting.
--batch_size is the batch size of rays during training stage 3. A large batch size might require more GPU memory. You can adjust this hyperparameter based on your local machine and GPUs.
--ckpt_feature is the path of the dynamic NeRF trained in stage 2 (Canonical Feature Distillation).
--basedir is the directory that will store the trained model of the canonical style transformation stage.
--wikiartdir is the path to the style dataset.
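As with the previous stage, a generic stage-3 run for a custom scene follows the same pattern; ${CONFIG_FILE} and ${EXPNAME} are placeholders, ${EXPNAME} must match the expname used in stage 2, and the WikiArt path must point to wherever you decompressed the style dataset:
CUDA_VISIBLE_DEVICES=0 python train_style.py \
--config configs/davis/${CONFIG_FILE} \
--patch_size 256 \
--basedir log_style \
--n_iters 25000 \
--batch_size 4096 \
--ckpt_feature log_feature/${EXPNAME}/${EXPNAME}.th \
--wikiartdir datasets/WikiArt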
When training is finished, the checkpoint file can be found in log_style/${expname}/${expname}.th.
Put the style images for testing into datasets/style_imgs_test.
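For example (the source path is a placeholder for wherever your style images live):
mkdir -p datasets/style_imgs_test
cp /path/to/your/style_images/*.jpg datasets/style_imgs_test/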
# Test on the "Balloon1" scene of Nvidia dataset
CUDA_VISIBLE_DEVICES=0 python test_style.py \
--config configs/nvidia_with_pose/balloon1.txt \
--ckpt_style log_style/Balloon1_with_pose/Balloon1_with_pose.th \
--ckpt_matrix log_style/Balloon1_with_pose/Balloon1_with_pose_matrix.th \
--ckpt_spn log_style/Balloon1_with_pose/Balloon1_with_pose_spn.th \
--style_img_dir datasets/style_imgs_test \
--patch_size 256 \
--render_train 1 \
--cpu_percentage 0.5 \
--basedir log_style
# Test on the "bear" scene of davis dataset
CUDA_VISIBLE_DEVICES=0 python test_style.py \
--config configs/davis/bear.txt \
--ckpt_style log_style/bear/bear.th \
--ckpt_matrix log_style/bear/bear_matrix.th \
--ckpt_spn log_style/bear/bear_spn.th \
--style_img_dir datasets/style_imgs_test \
--patch_size 256 \
--render_train 1 \
--cpu_percentage 0.5 \
--basedir log_style
--style_img_dir is the directory of style images for inference.
--basedir is the directory that stores the trained model in stage 3 (Canonical Style Transformation).
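For the other trained scenes, the checkpoint names follow the same pattern as above; for example, a test run on the "Balloon2" scene would presumably look like the following, assuming the stage-3 expname was Balloon2_with_pose so that the main, matrix, and SPN checkpoints all sit under log_style/Balloon2_with_pose/:
# Test on the "Balloon2" scene of Nvidia dataset (hypothetical example; paths depend on your expname)
CUDA_VISIBLE_DEVICES=0 python test_style.py \
--config configs/nvidia_with_pose/balloon2.txt \
--ckpt_style log_style/Balloon2_with_pose/Balloon2_with_pose.th \
--ckpt_matrix log_style/Balloon2_with_pose/Balloon2_with_pose_matrix.th \
--ckpt_spn log_style/Balloon2_with_pose/Balloon2_with_pose_spn.th \
--style_img_dir datasets/style_imgs_test \
--patch_size 256 \
--render_train 1 \
--cpu_percentage 0.5 \
--basedir log_style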
The stylized results can be found in log_style/${expname}/${expname}/style_transfer_results.
For any questions related to our paper and implementation, please email hongbinxu1013@gmail.com.
Upload the basic code of StyleDyRF.
Upload the training code of StyleDyRF on the Nvidia dataset.
Upload the training code of StyleDyRF on the Davis dataset and other custom sequences.
Upload the test code of StyleDyRF.
@article{xu2024styledyrf,
title={StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields},
author={Xu, Hongbin and Chen, Weitao and Xiao, Feng and Sun, Baigui and Kang, Wenxiong},
journal={arXiv preprint arXiv:2403.08310},
year={2024}
}
The code is available under the MIT license and draws from robust-dynrf, TensoRF, DynamicNeRF, and BARF, which are also licensed under the MIT license.
Licenses for these projects can be found in the licenses/ folder.