Fine-tuning experiments

Comparing diffusion-based RL algorithms (Sec. 5.1)

Gym configs are under cfg/gym/finetune/<env_name>/ and are named ft_<alg_name>_diffusion_mlp, e.g., ft_awr_diffusion_mlp. alg_name is one of rwr, awr, dipo, idql, dql, qsm, ppo (DPPO), or ppo_exact (exact likelihood). All algorithms share the same pre-trained checkpoint in each env.

Robomimic configs are under cfg/robomimic/finetune/<env_name>/ and follow the same naming scheme.
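The naming scheme above can be sketched as a small path builder. This is a minimal illustration, not part of the repository: `diffusion_rl_config` is a hypothetical helper, and it returns the config location without a file extension because the text above does not state one.

```python
from pathlib import PurePosixPath

# Algorithm names listed above; "ppo" is DPPO and "ppo_exact" uses exact likelihood.
ALG_NAMES = ("rwr", "awr", "dipo", "idql", "dql", "qsm", "ppo", "ppo_exact")
BENCHMARKS = ("gym", "robomimic")


def diffusion_rl_config(benchmark: str, env_name: str, alg_name: str) -> str:
    """Build cfg/<benchmark>/finetune/<env_name>/ft_<alg_name>_diffusion_mlp.

    Hypothetical helper for illustration only; it does not check that the
    config actually exists on disk.
    """
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark {benchmark!r}; expected one of {BENCHMARKS}")
    if alg_name not in ALG_NAMES:
        raise ValueError(f"unknown alg_name {alg_name!r}; expected one of {ALG_NAMES}")
    config_name = f"ft_{alg_name}_diffusion_mlp"
    return str(PurePosixPath("cfg", benchmark, "finetune", env_name, config_name))


if __name__ == "__main__":
    print(diffusion_rl_config("gym", "hopper-v2", "awr"))
    # prints: cfg/gym/finetune/hopper-v2/ft_awr_diffusion_mlp
```

Rejecting unknown algorithm names early makes a typo such as `ft_arw_diffusion_mlp` fail with a clear message rather than a missing-config error later.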

Comparing policy parameterizations (Sec. 5.2, 5.3)

Robomimic configs are under cfg/robomimic/finetune/<env_name>/, and the naming follows ft_ppo_<diffusion/gaussian/gmm>_<mlp/unet/transformer>_<img?>. For pixel experiments, we choose pre-trained checkpoints such that the pre-training performance is similar between DPPO and the Gaussian baseline.

Note: For Can and Lift in Robomimic with DPPO, you need to manually download the final checkpoints (epoch 8000). The default ones in the configs are from epoch 5000 (more room for fine-tuning improvement) and are used for comparing diffusion-based RL algorithms.
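The Robomimic pattern ft_ppo_<diffusion/gaussian/gmm>_<mlp/unet/transformer>_<img?> can be composed as in the sketch below. `ppo_config_name` is a hypothetical helper, and it assumes that the optional `<img?>` part is the literal suffix `_img` for pixel experiments. Not every combination is guaranteed to have a config in the repository.

```python
POLICIES = ("diffusion", "gaussian", "gmm")
BACKBONES = ("mlp", "unet", "transformer")


def ppo_config_name(policy: str, backbone: str, pixels: bool = False) -> str:
    """Compose a config name of the form ft_ppo_<policy>_<backbone>[_img].

    Hypothetical helper for illustration; the "_img" suffix for pixel
    experiments is an assumption based on the <img?> placeholder.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy {policy!r}; expected one of {POLICIES}")
    if backbone not in BACKBONES:
        raise ValueError(f"unknown backbone {backbone!r}; expected one of {BACKBONES}")
    name = f"ft_ppo_{policy}_{backbone}"
    return f"{name}_img" if pixels else name


if __name__ == "__main__":
    print(ppo_config_name("diffusion", "unet"))
    # prints: ft_ppo_diffusion_unet
    print(ppo_config_name("gaussian", "mlp", pixels=True))
    # prints: ft_ppo_gaussian_mlp_img
```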

Furniture-Bench configs are under cfg/furniture/finetune/<env_name>/, and the naming follows ft_<diffusion/gaussian>_<mlp/unet>. The paper does not show results for ft_diffusion_mlp. Running IsaacGym for the first time may take a while to set up the meshes. If you encounter the error about libpython, see the instructions here.

D3IL (Sec. 6)

D3IL configs are under cfg/d3il/finetune/avoid_<mode>/, and the naming follows ft_ppo_<diffusion/gaussian/gmm>_mlp. The number of fine-tuned denoising steps can be specified with ft_denoising_steps.

Training from scratch (App. B.2)

ppo_diffusion_mlp and ppo_gaussian_mlp under cfg/gym/finetune/<env_name>/ are for training DPPO or a Gaussian policy from scratch.

Comparing to exact likelihood policy gradient (App. B.5)

ft_ppo_exact_diffusion_mlp under cfg/gym/finetune/hopper-v2, cfg/gym/finetune/halfcheetah-v2, and cfg/robomimic/finetune/can are for training diffusion policy gradient with exact likelihood. The torchdiffeq package must be installed first with pip install -e .[exact].
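Before launching an exact-likelihood run, it can help to confirm that torchdiffeq is importable in the active environment. The check below is a minimal sketch using only the standard library; `has_torchdiffeq` is a hypothetical helper and not part of the repository.

```python
import importlib.util


def has_torchdiffeq() -> bool:
    """Return True if the torchdiffeq package can be found by the import system."""
    return importlib.util.find_spec("torchdiffeq") is not None


if __name__ == "__main__":
    if has_torchdiffeq():
        print("torchdiffeq is available")
    else:
        # Install command taken from the text above.
        print("torchdiffeq is missing; install it with: pip install -e .[exact]")
```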