Implementation of APV (Action-Free Pre-Training from Videos) in TensorFlow 2. Our code is based on the implementation of DreamerV2. We also provide the raw data used for reporting our main experimental results in the `data` directory.
If you find this code useful, please cite our paper:
```
@article{seo2022reinforce,
  title={Reinforcement Learning with Action-Free Pre-Training from Videos},
  author={Seo, Younggyo and Lee, Kimin and James, Stephen and Abbeel, Pieter},
  journal={arXiv preprint arXiv:2203.xxxxx},
  year={2022}
}
```
Please also cite the DreamerV2 paper, on top of which APV is built:
```
@article{hafner2020dreamerv2,
  title={Mastering Atari with Discrete World Models},
  author={Hafner, Danijar and Lillicrap, Timothy and Norouzi, Mohammad and Ba, Jimmy},
  journal={arXiv preprint arXiv:2010.02193},
  year={2020}
}
```
APV is a pre-training and fine-tuning framework for model-based RL algorithms: we first pre-train an action-free latent video prediction model, and then utilize the pre-trained representations for efficiently learning action-conditional world models in unseen environments.
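At a high level, the two phases look as follows. This is a minimal pseudocode sketch in Python; every class and method name in it is an illustrative placeholder, not an actual API of this repository.

```python
# Minimal sketch of the APV pipeline. All names below are illustrative
# placeholders, not the actual APIs of this repository.

# Phase 1: action-free pre-training on videos.
video_model = ActionFreeVideoModel()           # latent video prediction model
for batch in video_dataset:                    # sequences of frames, no actions
    video_model.train_step(batch["image"])     # predict future frames in latent space

# Phase 2: fine-tuning an action-conditional world model on a downstream task.
world_model = ActionConditionalWorldModel()
world_model.load_representations(video_model)  # reuse the pre-trained representations
for batch in replay_buffer:                    # now includes actions and rewards
    world_model.train_step(batch["image"], batch["action"], batch["reward"])
```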
Get the resources you need from the links below:
- Pre-trained models
- Datasets used for pre-training (download this if you want to run pre-training by yourself)
Get dependencies:
```
pip install tensorflow==2.6.0 tensorflow_text==2.6.0 tensorflow_estimator==2.6.0 tensorflow_probability==0.14.1 ruamel.yaml 'gym[atari]' dm_control git+https://github.com/rlworkgroup/metaworld.git@a0009ed9a208ff9864a5c1368c04c273bb20dd06#egg=metaworld
```
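After installing, you can sanity-check that the pinned versions were picked up and that TensorFlow sees your GPU (a minimal snippet using only standard TensorFlow calls):

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Verify the pinned versions are installed.
print(tf.__version__)   # expected: 2.6.0
print(tfp.__version__)  # expected: 0.14.1

# List visible GPUs; an empty list means TensorFlow will run CPU-only.
print(tf.config.list_physical_devices("GPU"))
```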
Run video pre-training (if you want to run pre-training yourself):
```
python apv_pretraining/train_video.py --logdir {save path} --load_logdir {dataset path consisting of npzs} --configs metaworld_pretrain
```
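The `--load_logdir` dataset directory is expected to contain episodes saved as `.npz` files. To inspect one before training, something like the following works (a sketch; which arrays each file contains depends on how the dataset was generated, so the keys you see may vary):

```python
import pathlib

import numpy as np

dataset_dir = pathlib.Path("path/to/dataset")  # your --load_logdir

# Print the arrays stored in the first episode file, with shapes and dtypes.
episode_file = sorted(dataset_dir.glob("*.npz"))[0]
with np.load(episode_file) as episode:
    for key in episode.files:
        array = episode[key]
        print(key, array.shape, array.dtype)
```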
Run Meta-world experiments:
```
python apv_finetuning/train.py --logdir {save path} --load_logdir {path to the pre-trained models} --configs metaworld --task metaworld_lever_pull
```
Run DeepMind Control Suite experiments:
```
python apv_finetuning/train.py --logdir {save path} --load_logdir {path to the pre-trained models} --configs dmc_vision --task dmc_quadruped_walk
```
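Both fine-tuning commands write logs under `--logdir`. DreamerV2-style code logs scalar metrics as one JSON object per line; assuming this repository follows that convention (the `metrics.jsonl` file name is an assumption based on DreamerV2, not something verified here), you can list the logged metric keys like this:

```python
import json
import pathlib

logdir = pathlib.Path("path/to/logdir")  # your --logdir

# The file name below is an assumption based on the DreamerV2 logging
# convention (one JSON object per line); adjust it if your logdir differs.
lines = (logdir / "metrics.jsonl").read_text().splitlines()
records = [json.loads(line) for line in lines]

# Print every metric key that appears in the log.
print(sorted({key for record in records for key in record}))
```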
Tips:

- Use `TF_XLA_FLAGS=--tf_xla_auto_jit=2` to accelerate training (see the example command after this list). This requires that the CUDA and cuDNN paths are set properly on your machine; you can check this by running `which ptxas` and verifying that it returns a path inside your CUDA `bin` directory.
- Also see the tips available in the DreamerV2 repository.
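For example, the Meta-world fine-tuning command from above with XLA enabled (same command, just prefixed with the flag):

```
TF_XLA_FLAGS=--tf_xla_auto_jit=2 python apv_finetuning/train.py \
  --logdir {save path} \
  --load_logdir {path to the pre-trained models} \
  --configs metaworld --task metaworld_lever_pull
```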