
EchoMimic

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

The original project never mentions support for AMD ROCm GPUs. However, it depends only on PyTorch, and ROCm is an official PyTorch backend, so you can run it on AMD GPUs (MI series and Radeon series).

Here are the steps for running it with ROCm.

Installation

Download the Codes

  git clone https://github.com/BadToBest/EchoMimic
  cd EchoMimic

Python Environment Setup

  • Tested System Environment: Ubuntu 22.04, ROCm >= 6.0
  • Tested GPUs: Radeon Pro W7900 / MI300X
  • Tested Python Version: 3.10

  conda create -n echomimic python=3.10
  conda activate echomimic

Comment out the top three lines of requirements.txt and save (do not install the CUDA version of torch):

#torch>=2.0.1,<=2.2.2
#torchvision>=0.15.2,<=0.17.2
#torchaudio>=2.0.2,<=2.2.2
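
Equivalently, a one-liner (a sketch assuming the three torch lines are the first three lines of the file, as shown above):

  sed -i '1,3 s/^/#/' requirements.txt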

Install the ROCm version of PyTorch

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1
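
A quick sanity check that the ROCm build is active and sees the GPU (torch.version.hip is set only on ROCm builds, and torch.cuda.is_available() returns True on ROCm as well):

  python3 -c "import torch; print(torch.__version__, torch.version.hip, torch.cuda.is_available())"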

Install packages with pip

  pip install -r requirements.txt

**From here on, follow the same steps as for CUDA, as described in the original repo's README.md.**

Download ffmpeg-static

Download and decompress ffmpeg-static, then set FFMPEG_PATH:

export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
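
For example, assuming the static build archive ffmpeg-4.4-amd64-static.tar.xz was downloaded to the current directory:

  tar -xvf ffmpeg-4.4-amd64-static.tar.xz
  export FFMPEG_PATH="$PWD/ffmpeg-4.4-amd64-static"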

Download pretrained weights

git lfs install
git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights

The pretrained_weights directory is organized as follows:

./pretrained_weights/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt

Here denoising_unet.pth, reference_unet.pth, motion_module.pth, and face_locator.pth are the main EchoMimic checkpoints; the other models can also be downloaded from their original hubs, thanks to the authors' brilliant work.
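
If git lfs was not installed before cloning, the .pth files may be small text pointers rather than the actual weights; a quick check of the file sizes:

  ls -lh pretrained_weights/*.pth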

Audio-Driven Algo Inference

Run the Python inference scripts:

  python -u infer_audio2vid.py
  python -u infer_audio2vid_pose.py

Audio-Driven Algo Inference On Your Own Cases

Edit the inference config file ./configs/prompts/animation.yaml, and add your own case:

test_cases:
  "path/to/your/image":
    - "path/to/your/audio"

Then run the Python inference script:

  python -u infer_audio2vid.py

Motion Alignment between Reference Image and Driven Video

(First download the checkpoints with the _pose.pth suffix from Hugging Face.)

Edit driver_video and ref_image in demo_motion_sync.py to point to your own driving video and reference image.
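
You can locate the assignments to edit with grep (variable names as given above):

  grep -nE "driver_video|ref_image" demo_motion_sync.py

Then run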

  python -u demo_motion_sync.py

Audio&Pose-Drived Algo Inference

Edit ./configs/prompts/animation_pose.yaml, then run

  python -u infer_audio2vid_pose.py

Pose-Driven Algo Inference

Set draw_mouse=True on line 135 of infer_audio2vid_pose.py and edit ./configs/prompts/animation_pose.yaml.
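
A hedged one-liner for the first edit (it assumes line 135 currently reads draw_mouse=False with no surrounding spaces; check with grep first and adjust to match):

  grep -n draw_mouse infer_audio2vid_pose.py
  sed -i '135s/draw_mouse=False/draw_mouse=True/' infer_audio2vid_pose.py

Then run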

  python -u infer_audio2vid_pose.py

Run the Gradio UI

Thanks to the contribution from @Robin021:

python -u webgui.py --server_port=3000
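
The UI should then be reachable in a browser at http://localhost:3000.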