This repository contains an implementation of the paper Hybrid Reinforcement Learning with Expert State Sequences. The implementation is built directly on top of a PyTorch implementation of Advantage Actor-Critic (A2C).
To get started with the framework, install the following dependencies:
- Python 3.6
- PyTorch 0.4
- OpenAI baselines
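As a quick sanity check that the dependencies are in place, something along these lines can be run (illustrative only; it simply confirms the versions listed above are importable):

```python
# Quick environment check (illustrative).
import sys

import torch
import baselines  # OpenAI baselines should be importable after installation

assert sys.version_info[:2] >= (3, 6), "Python 3.6+ is expected"
print("PyTorch version:", torch.__version__)  # expected to be 0.4.x
```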
A2C baseline (A2C agent in the paper):
python main.py --policy-coef 1 --entropy-coef 0.5 --value-loss-coef 0.2 --dual-act-coef 0 --dual-state-coef 0 --dual-sup-coef 0 --dual-emb-coef 0 --log-dir baseline_a2c
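The --policy-coef, --entropy-coef, and --value-loss-coef flags weight the standard A2C loss terms. As a rough, hypothetical sketch of how such a weighted objective is typically assembled (illustrative only, not taken from the repository's training code):

```python
import torch

def a2c_loss(log_probs, values, returns, entropy,
             policy_coef=1.0, entropy_coef=0.5, value_loss_coef=0.2):
    """Weighted A2C objective (illustrative sketch)."""
    advantages = returns - values                              # advantage estimates
    policy_loss = -(advantages.detach() * log_probs).mean()    # policy-gradient term
    value_loss = advantages.pow(2).mean()                      # value regression term
    entropy_loss = -entropy.mean()                             # bonus for exploratory policies
    return (policy_coef * policy_loss
            + value_loss_coef * value_loss
            + entropy_coef * entropy_loss)

# Example with dummy tensors:
loss = a2c_loss(torch.zeros(8), torch.zeros(8), torch.ones(8), torch.ones(8))
```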
Hybrid agent combining A2C and behavior cloning from observation with the proposed action inference:
python main.py --policy-coef 1 --entropy-coef 0.5 --value-loss-coef 0.2 --dual-act-coef 2 --dual-sup-coef 2 --dual-emb-coef 0.1 --dual-rank 2 --dual-emb-dim 128 --dual-type dual --log-dir hybrid_dual
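Judging from the flag names, --dual-emb-dim and --dual-rank appear to set the state-embedding size and the rank of the proposed ("dual") action inference model. A minimal sketch under that assumption, using a low-rank bilinear scorer over consecutive state embeddings (the class name, shapes, and number of actions are hypothetical, not taken from the repository):

```python
import torch
import torch.nn as nn

class DualActionInference(nn.Module):
    """Low-rank bilinear action inference over a pair of state embeddings (illustrative)."""

    def __init__(self, emb_dim=128, rank=2, num_actions=6):
        super().__init__()
        # Project each state embedding into `rank` factors per action.
        self.left = nn.Linear(emb_dim, num_actions * rank)
        self.right = nn.Linear(emb_dim, num_actions * rank)
        self.num_actions, self.rank = num_actions, rank

    def forward(self, emb_t, emb_next):
        l = self.left(emb_t).view(-1, self.num_actions, self.rank)
        r = self.right(emb_next).view(-1, self.num_actions, self.rank)
        # Score each action by a rank-limited bilinear interaction of the two embeddings.
        return (l * r).sum(dim=-1)  # logits over actions, shape (batch, num_actions)
```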
Hybrid agent combining A2C and behavior cloning from observation with the MLP-based action inference (Hybrid-MLP agent in the paper):
python main.py --policy-coef 1 --entropy-coef 0.5 --value-loss-coef 0.2 --dual-act-coef 2 --dual-sup-coef 2 --dual-emb-coef 0.1 --dual-rank 2 --dual-emb-dim 128 --dual-type mlp --log-dir hybrid_mlp
(For the MLP-type action inference model, --dual-rank is interpreted as the number of MLP layers.)
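Under the same assumptions as the sketch above, the MLP variant would replace the bilinear scorer with a small feed-forward network over the concatenated state embeddings, with --dual-rank layers (again a hypothetical sketch; names and sizes are not taken from the repository):

```python
import torch
import torch.nn as nn

class MLPActionInference(nn.Module):
    """MLP that predicts the action linking two consecutive state embeddings (illustrative)."""

    def __init__(self, emb_dim=128, num_layers=2, num_actions=6):
        super().__init__()
        layers, in_dim = [], 2 * emb_dim      # concatenated (s_t, s_{t+1}) embeddings
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, emb_dim), nn.ReLU()]
            in_dim = emb_dim
        layers.append(nn.Linear(in_dim, num_actions))
        self.net = nn.Sequential(*layers)

    def forward(self, emb_t, emb_next):
        return self.net(torch.cat([emb_t, emb_next], dim=-1))  # logits over actions
```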
Behavior cloning from observation with the proposed action inference (BC-Dual agent in the paper):
python main.py --policy-coef 0 --entropy-coef 0.5 --value-loss-coef 0 --dual-act-coef 2 --dual-sup-coef 2 --dual-emb-coef 0.1 --dual-rank 2 --dual-emb-dim 128 --dual-type dual --log-dir dual_only
Behavior cloning from observation with the MLP-based action inference model (BC-MLP agent in the paper):
python main.py --policy-coef 0 --entropy-coef 0.5 --value-loss-coef 0 --dual-act-coef 2 --dual-sup-coef 2 --dual-emb-coef 0.1 --dual-rank 2 --dual-emb-dim 128 --dual-type mlp --log-dir mlp_only
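For context, behavior cloning from observation typically fits the action-inference model on the agent's own transitions (plausibly the term weighted by --dual-act-coef) and supervises the policy with actions inferred from consecutive expert states (plausibly the term weighted by --dual-sup-coef). A hypothetical sketch of those two terms, where all function names and shapes are assumptions rather than the repository's actual code:

```python
import torch
import torch.nn.functional as F

def observation_cloning_losses(inference_model, policy_logits_on_expert,
                               agent_emb_t, agent_emb_next, agent_actions,
                               expert_emb_t, expert_emb_next):
    # Fit the action-inference model on the agent's own (s_t, a_t, s_{t+1}) transitions.
    act_loss = F.cross_entropy(inference_model(agent_emb_t, agent_emb_next),
                               agent_actions)
    # Infer the missing expert actions from consecutive expert states and
    # use them as supervision targets for the policy.
    with torch.no_grad():
        inferred_actions = inference_model(expert_emb_t, expert_emb_next).argmax(dim=-1)
    sup_loss = F.cross_entropy(policy_logits_on_expert, inferred_actions)
    return act_loss, sup_loss
```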
The noise in the expert demonstrations can be controlled with the arguments --demo-eps (the non-optimal action ratio) and --demo-eta (the missing state ratio).
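As a hypothetical illustration of what these two ratios could mean when the expert state sequences are generated (the actual corruption procedure is defined in the repository's code):

```python
import random

def generate_noisy_demo(env, expert_policy, demo_eps=0.0, demo_eta=0.0, max_steps=1000):
    """Illustrative only: collect an expert state sequence with injected noise."""
    state, states = env.reset(), []
    for _ in range(max_steps):
        # Drop each visited state from the recorded sequence with probability demo_eta.
        if random.random() >= demo_eta:
            states.append(state)
        # With probability demo_eps the expert executes a random, non-optimal action.
        if random.random() < demo_eps:
            action = env.action_space.sample()
        else:
            action = expert_policy(state)
        state, _, done, _ = env.step(action)
        if done:
            break
    return states
```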
MIT License