This is the official implementation of the NeurIPS 2023 paper: Language-driven Scene Synthesis using Multi-conditional Diffusion Model.
We highly recommend creating a Conda environment to manage the required Python packages:
conda create -n lsdm python=3.8
conda activate lsdm
After creating the environment, install PyTorch with CUDA support by running:
conda install pytorch pytorch-cuda=11.6 -c pytorch -c nvidia
The remaining dependencies are listed in requirements.txt. We recommend installing them with the following commands:
pip install git+https://github.com/openai/CLIP.git
pip install transformers
conda install -c pytorch3d -c conda-forge pytorch3d=0.7.3
pip install -r requirements.txt
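Optionally, you can run a quick sanity check (a minimal sketch based on the packages installed above) to confirm that everything imports and CUDA is visible:

# Optional sanity check; verifies the packages installed above import correctly.
import torch
import clip            # from the OpenAI CLIP repository
import pytorch3d
import transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)
print("transformers:", transformers.__version__)
print("CLIP models:", clip.available_models())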
PRO-teXt is an extension of PROXD. Please visit their website to obtain the PROXD dataset first. We provide the PRO-teXt extension at the link. You also need to obtain HUMANISE via its project page. The dataset hierarchy should follow this template:
|- data/
   |- protext
      |- mesh_ds
      |- objs
      |- proxd_test
      |- proxd_test_edit
      |- proxd_train
      |- proxd_valid
      |- scenes
   |- supp
      |- proxd_train
      |- proxd_valid
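To double-check that the data is placed correctly, a small optional helper (a sketch that simply assumes the folder names from the template above, with supp resolved as data/supp to match the commands later in this README) can list any missing directories:

# Optional layout check; folder names are assumed from the dataset template above.
from pathlib import Path

expected = [
    "data/protext/mesh_ds", "data/protext/objs",
    "data/protext/proxd_test", "data/protext/proxd_test_edit",
    "data/protext/proxd_train", "data/protext/proxd_valid",
    "data/protext/scenes",
    "data/supp/proxd_train", "data/supp/proxd_valid",
]

for p in map(Path, expected):
    print(f"{p}: {'ok' if p.is_dir() else 'MISSING'}")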
All model checkpoints used for benchmarking in the paper are available at this link.
For visualization, we use a subset of the 3D-Future dataset. The subset can be downloaded at the link; credit goes to the original authors.
To train a baseline, use the following command:
python -m run.train_<baseline> \
--train_data_dir data/protext/proxd_train \
--valid_data_dir data/protext/proxd_valid \
--fix_ori \
--epochs 1000 \
--out_dir training \
--experiment <baseline>
For example, if you want to train LSDM, use the following command:
python -m run.train_sdm \
--train_data_dir data/protext/proxd_train \
--valid_data_dir data/protext/proxd_valid \
--fix_ori \
--epochs 1000 \
--out_dir training \
--experiment sdm
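Once training finishes, the checkpoint consumed by the test command below is expected at training/sdm/model_ckpt/best_model_cfd.pt; a quick listing confirms it was written:

ls training/sdm/model_ckpt/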
To test a baseline, use the following command:
python -m run.test_<baseline> data/protext/proxd_test/ \
--load_model training/<baseline>/model_ckpt/best_model_cfd.pt \
--model_name <baseline> \
--fix_ori --test_on_valid_set \
--output_dir training/<baseline>/output
For example, you can use:
python -m run.test_sdm data/protext/proxd_test/ \
--load_model training/sdm/model_ckpt/best_model_cfd.pt \
--model_name sdm \
--fix_ori \
--test_on_valid_set \
--output_dir training/sdm/output
to test an LSDM checkpoint. Note that you can also train on the HUMANISE dataset: simply replace data/protext/proxd_train with data/humanise/train.
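For instance, training LSDM on HUMANISE could look like the sketch below; the validation path data/humanise/valid and the experiment name are our assumptions, so adjust them to match your HUMANISE layout:

# Sketch only: data/humanise/valid and the experiment name are assumed, not prescribed.
python -m run.train_sdm \
--train_data_dir data/humanise/train \
--valid_data_dir data/humanise/valid \
--fix_ori \
--epochs 1000 \
--out_dir training \
--experiment sdm_humanise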
To generate a video sequence as in our paper, proceed with the following steps:
- Step 1: Generate objects from contact points
python fit_custom_obj.py \
--sequence_name <sequence_name> \
--vertices_path data/supp/proxd_valid/vertices/<sequence_name>_verts.npy \
--contact_labels_path data/supp/proxd_valid/semantics/<sequence_name>_cfs.npy \
--output_dir fitting_results/<baseline> \
--label <num_label> \
--file_name training/sdm/output/predictions/<interaction_name>.npy
where <sequence_name> is the name of the human motion and <interaction_name> is the name of the human pose. Note that we name each human pose closely after its corresponding human motion. In addition, <num_label> can be found in the file mpcat40.tsv.
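As an illustration, a filled-in Step 1 command could look like the following; the sequence name is borrowed from the visualization example below, while the label index and the prediction file name are placeholders you should replace with values from mpcat40.tsv and your own test output:

# Illustrative only: replace <num_label> and <interaction_name> with your own values.
python fit_custom_obj.py \
--sequence_name N0Sofa_00034_02 \
--vertices_path data/supp/proxd_valid/vertices/N0Sofa_00034_02_verts.npy \
--contact_labels_path data/supp/proxd_valid/semantics/N0Sofa_00034_02_cfs.npy \
--output_dir fitting_results/sdm \
--label <num_label> \
--file_name training/sdm/output/predictions/<interaction_name>.npy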
- Step 2: Visualization
python vis_fitting_results.py \
--fitting_results_path fitting_results/<baseline>/<sequence_name>/ \
--vertices_path data/supp/proxd_valid/vertices/<sequence_name>_verts.npy
For example,
python vis_fitting_results.py \
--fitting_results_path fitting_results/sdm/N0Sofa_00034_02/ \
--vertices_path data/supp/proxd_valid/vertices/N0Sofa_00034_02_verts.npy
The script above saves rendered frames in fitting_results/N0Sofa_00034_02/rendering.
Note that you need a display to run this command. If you are testing the project on a server without a display service, you can still load the saved objects and human meshes and visualize them with other tools. To obtain the human meshes, run the above command and wait until the program exits automatically; the script saves the human meshes of the specified motion sequence in fitting_results/<sequence name>/human/mesh.
The best-fitting objects are stored in fitting_results/<sequence name>/fit_best_obj/<object category>/<object index>/<best_obj_id>/opt_best.obj. As mentioned before, you can find <best_obj_id> in fitting_results/<sequence name>/fit_best_obj/<object category>/<object index>/best_obj_id.json.
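If you are working headlessly, the sketch below shows one way to load the saved outputs offline; it assumes the paths described above and uses trimesh, which is not part of the listed requirements (pip install trimesh):

# Offline loader sketch; paths follow the description above, trimesh is an extra dependency.
from pathlib import Path
import trimesh

seq_dir = Path("fitting_results/sdm/N0Sofa_00034_02")  # adjust to your own output location

# Best-fitting objects: .../fit_best_obj/<category>/<index>/<best_obj_id>/opt_best.obj
objects = [trimesh.load(p) for p in seq_dir.glob("fit_best_obj/*/*/*/opt_best.obj")]

# Human meshes saved by vis_fitting_results.py
humans = [trimesh.load(p) for p in sorted((seq_dir / "human" / "mesh").glob("*"))]

print(f"Loaded {len(objects)} fitted object(s) and {len(humans)} human mesh frame(s)")

# Export a combined preview (first human frame only) for viewing in MeshLab/Blender.
trimesh.Scene(objects + humans[:1]).export("scene_preview.glb")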
Part of our codebase is based on Ye et al. If you find this work helpful, please consider citing:
@article{vuong2023language,
  title={Language-driven Scene Synthesis using Multi-conditional Diffusion Model},
  author={Vuong, An Dinh and Vu, Minh Nhat and Nguyen, Toan and Huang, Baoru and Nguyen, Dzung and Vo, Thieu and Nguyen, Anh},
  journal={arXiv preprint arXiv:2310.15948},
  year={2023}
}