This code was tested on an NVIDIA A100 and requires:
- Anaconda3 or Miniconda3
- Python 3.8+
- PyTorch 1.10+
a. Create a conda virtual environment and activate it.

```shell
conda create -n stablemofusion python=3.8 -y
conda activate stablemofusion
```

b. Install PyTorch 1.10.0 following the official instructions.

```shell
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
```

Important: Make sure that your compilation CUDA version and runtime CUDA version match.
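You can verify this from the activated environment with standard PyTorch attributes (a minimal sketch; the expected values correspond to the install command above):

```python
import torch

print(torch.__version__)          # expected: 1.10.0
print(torch.version.cuda)         # CUDA version PyTorch was built with, expected: 11.3
print(torch.cuda.is_available())  # True if the runtime CUDA setup is usable
```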
c. Install the other requirements.

```shell
pip install -r requirements.txt
```

d. Install ffmpeg for visualization.

```shell
conda install ffmpeg x264=20131218 -c conda-forge
```
e. Modify the `LayerNorm` module in clip for fp16 inference.

```python
# miniconda3/envs/stablemofusion/lib/python3.8/site-packages/clip/model.py
class LayerNorm(nn.LayerNorm):
    """Subclass torch's LayerNorm to handle fp16."""

    def forward(self, x: torch.Tensor):
        if self.weight.dtype == torch.float32:
            # Compute in fp32, then cast the result back to the input dtype.
            orig_type = x.dtype
            ret = super().forward(x.type(torch.float32))
            return ret.type(orig_type)
        else:
            return super().forward(x)
```
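A quick way to check that the patch took effect (a minimal sketch; it only assumes `clip` is installed in the environment edited above):

```python
import torch
from clip.model import LayerNorm  # the patched class

ln = LayerNorm(512)             # weights are created in fp32
x = torch.randn(2, 512).half()  # fp16 input, as used for fp16 inference
y = ln(x)
print(y.dtype)                  # expected: torch.float16
```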
- Download the pre-trained models from Google Cloud, put them into `./checkpoints/`, and arrange them in the following file structure:
```
StableMoFusion
├── checkpoints
│   ├── kit
│   │   └── kit_condunet1d_batch64
│   │       ├── meta
│   │       │   ├── mean.npy
│   │       │   └── std.npy
│   │       ├── model
│   │       │   └── latest.tar
│   │       └── opt.txt
│   ├── t2m
│   │   └── t2m_condunet1d_batch64
│   │       ├── meta
│   │       │   ├── mean.npy
│   │       │   └── std.npy
│   │       ├── model
│   │       │   └── latest.tar
│   │       └── opt.txt
│   └── footskate
│       ├── underpressure_pretrained.tar
│       └── t2m_pretrained.tar
```
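The `meta/mean.npy` and `meta/std.npy` files are, by the usual text-to-motion convention, the per-dimension statistics used to normalize motion features. A minimal sketch under that assumption (the paths come from the tree above; the `(num_frames, feature_dim)` layout of `motion` is also an assumption):

```python
import numpy as np

mean = np.load("checkpoints/t2m/t2m_condunet1d_batch64/meta/mean.npy")
std = np.load("checkpoints/t2m/t2m_condunet1d_batch64/meta/std.npy")

# Placeholder motion features with the assumed (num_frames, feature_dim) layout
motion = np.random.randn(120, mean.shape[-1]).astype(np.float32)

normalized = (motion - mean) / std      # model-space features
recovered = normalized * std + mean     # back to raw features
```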
- Download the UnderPressure code and put it into `./UnderPressure/` like:

```
StableMoFusion
├── UnderPressure
│   ├── dataset
│   │   ├── S1_HoppingLeftFootRightFoot.pth
│   │   └── ...
│   ├── anim.py
│   ├── data.py
│   ├── demo.py
│   └── ...
```
- Update the import paths within `./UnderPressure/*.py`. To ensure that the modules in `./UnderPressure/` can be imported and used seamlessly via `python -m`, the import statements in the Python files under `./UnderPressure/` need to be rewritten, for example (a helper script is sketched after this list):
  - In `UnderPressure/anim.py`, replace `import util` with `from UnderPressure import util`.
  - In `UnderPressure/demo.py`, replace `import anim, metrics, models, util` with `from UnderPressure import anim, metrics, models, util`.
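If many files need the same treatment, a small helper along these lines can apply the rewrite in one pass (a sketch, not part of the repository; it assumes the module names listed below and only touches whole-line `import ...` statements):

```python
import re
from pathlib import Path

# Modules assumed to live inside the UnderPressure package
MODULES = {"anim", "data", "demo", "metrics", "models", "util"}

def absolutize(match):
    names = [n.strip() for n in match.group(1).split(",")]
    if all(n in MODULES for n in names):
        return f"from UnderPressure import {', '.join(names)}"
    return match.group(0)  # leave unrelated imports untouched

for path in Path("UnderPressure").glob("*.py"):
    text = path.read_text()
    path.write_text(re.sub(r"^import ([\w ,]+)$", absolutize, text, flags=re.MULTILINE))
```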
- Run `demo.py` or `scripts/generate.py`:

```shell
# Generate from a single prompt
# e.g. generate a 4-second wave motion. The unit of `--motion_length` is seconds.
python -m scripts.generate --text_prompt "a person waves with his right hand." --motion_length 4 --footskate_cleanup

# Generate from your own text file
# e.g. generate 5 motions from different prompts in a .txt file, and set each motion's frame length separately via a .txt file. The unit of `--input_lens` is frames.
python -m scripts.generate --footskate_cleanup --input_text ./assets/prompts.txt --input_lens ./assets/motion_lens.txt

# e.g. generate 5 motions from different prompts in a .txt file with the same motion length.
python -m scripts.generate --footskate_cleanup --input_text ./assets/prompts.txt --motion_length 4

# Generate from test-set prompts
# e.g. randomly select 10 prompts from the test set to generate motions
python -m scripts.generate --num_samples 10
```
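The expected contents of `prompts.txt` and `motion_lens.txt` are not spelled out above; a common convention, assumed in this sketch, is one prompt per line and one frame count per line, matched by line number:

```python
# Hypothetical helper that writes example input files in the assumed format.
prompts = [
    "a person walks forward and then turns around.",
    "a person jumps over an obstacle.",
]
frame_lens = [120, 96]  # one frame count per prompt

with open("assets/prompts.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")
with open("assets/motion_lens.txt", "w") as f:
    f.write("\n".join(str(n) for n in frame_lens) + "\n")
```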
You may also define:
- `--device`: GPU id.
- `--diffuser_name`: sampler type in the diffuser (e.g. 'ddpm', 'ddim', 'dpmsolver'); see ./config/diffuser_params.yaml for the related settings.
- `--num_inference_steps`: number of iterative denoising steps during inference.
- `--seed`: to sample different prompts.
- `--motion_length`: motion length in seconds.
- `--opt_path`: for loading the model.
- `--footskate_cleanup`: to use the footskate module in the diffusion framework.
You will get:
- `output_dir/joints_npy/xx.npy`: the xyz pose sequence of the generated motion.
- `output_dir/xx.mp4`: the visual animation of the generated motion.

The `output_dir` is located in the checkpoint dir, like `checkpoints/t2m/t2m_condunet1d_batch64/samples_t2m_condunet1d_batch64_50173_seed0_a_person_waves_with_his_right_hand/`.
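A generated sequence can be inspected directly with NumPy (a minimal sketch; `xx.npy` is a placeholder name, and the `(num_frames, num_joints, 3)` layout is an assumption based on the xyz description above):

```python
import numpy as np

joints = np.load(
    "checkpoints/t2m/t2m_condunet1d_batch64/"
    "samples_t2m_condunet1d_batch64_50173_seed0_a_person_waves_with_his_right_hand/"
    "joints_npy/xx.npy"
)
print(joints.shape)  # assumed (num_frames, num_joints, 3): xyz position per joint per frame
```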
The visual animation will look something like this:
HumanML3D - Follow the instructions in HumanML3D, then copy the resulting dataset to our repository:

```shell
cp -r ../HumanML3D/HumanML3D ./data/HumanML3D
```

KIT - Download from HumanML3D (no processing needed this time) and place the result in ./data/KIT-ML
We use the same evaluation protocol as this repo. You should download the pretrained weights of the contrastive models in t2m and kit for calculating FID and precision. To dynamically estimate the length of the target motion, `length_est_bigru` and GloVe data are required.
Unzip all the files and arrange them in the following file structure:
```
StableMoFusion
└── data
    ├── glove
    │   ├── our_vab_data.npy
    │   ├── our_vab_idx.pkl
    │   └── our_vab_words.pkl
    ├── pretrained_models
    │   ├── kit
    │   │   └── text_mot_match
    │   │       └── model
    │   │           └── finest.tar
    │   └── t2m
    │       ├── text_mot_match
    │       │   └── model
    │       │       └── finest.tar
    │       └── length_est_bigru
    │           └── model
    │               └── finest.tar
    ├── HumanML3D
    │   ├── new_joint_vecs
    │   │   └── ...
    │   ├── new_joints
    │   │   └── ...
    │   ├── texts
    │   │   └── ...
    │   ├── Mean.npy
    │   ├── Std.npy
    │   ├── test.txt
    │   ├── train_val.txt
    │   ├── train.txt
    │   └── val.txt
    ├── KIT-ML
    │   ├── new_joint_vecs
    │   │   └── ...
    │   ├── new_joints
    │   │   └── ...
    │   ├── texts
    │   │   └── ...
    │   ├── Mean.npy
    │   ├── Std.npy
    │   ├── test.txt
    │   ├── train_val.txt
    │   ├── train.txt
    │   └── val.txt
    ├── kit_mean.npy
    ├── kit_std.npy
    ├── t2m_mean.npy
    └── t2m_std.npy
```
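Before launching training or evaluation, a quick existence check against the layout above can save a failed run (a sketch; it only tests a few representative paths from the tree):

```python
import os

required = [
    "data/glove/our_vab_data.npy",
    "data/pretrained_models/t2m/text_mot_match/model/finest.tar",
    "data/pretrained_models/t2m/length_est_bigru/model/finest.tar",
    "data/pretrained_models/kit/text_mot_match/model/finest.tar",
    "data/HumanML3D/Mean.npy",
    "data/KIT-ML/Mean.npy",
    "data/t2m_mean.npy",
    "data/kit_mean.npy",
]

missing = [p for p in required if not os.path.exists(p)]
print("data layout looks complete" if not missing else f"missing: {missing}")
```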
HumanML3D:

```shell
accelerate launch --config_file 1gpu.yaml --gpu_ids 0 -m scripts.train --name t2m_condunet1d --model-ema --dataset_name t2m
```

KIT-ML:

```shell
accelerate launch --config_file 1gpu.yaml --gpu_ids 0 -m scripts.train --name kit_condunet1d --model-ema --dataset_name kit
```

You may also define the `--config_file` for training on multiple GPUs.
HumanML3D:

```shell
python -m scripts.evaluation --opt_path ./checkpoints/t2m/t2m_condunet1d_batch64/opt.txt
```

The evaluation results will be saved in `./checkpoints/t2m/t2m_condunet1d_batch64/eval`.

KIT-ML:

```shell
python -m scripts.evaluation --opt_path ./checkpoints/kit/kit_condunet1d_batch64/opt.txt
```

The evaluation results will be saved in `./checkpoints/kit/kit_condunet1d_batch64/eval`.
Download smplh to the folder ./data/smplh and run train_UnderPressure_model.py:

```shell
python -m scripts.train_UnderPressure_model --dataset_name t2m
```
If you want to see the generated motions as meshes, as shown in the video below, we recommend following MLD to render meshes from our .npy files.
This code is standing on the shoulders of giants. We want to thank the following contributors whose code ours is based on:
text-to-motion, MDM, MotionDiffuse, GMD, MLD and UnderPressure.
This code is distributed under an MIT LICENSE.
Note that our code depends on other libraries, including CLIP, Diffusers, SMPL-X, PyTorch3D, ... and uses datasets that each have their own respective licenses that must also be followed.