This is the official code for *ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning*, which was accepted at AAAI 2024.
Dependencies:

```
offlinerllib==0.1.4
UtilsRL==0.6.5
gym==0.23.1
mujoco-py==2.1.2.14
torch==2.0.1
D4RL==1.1
```
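A minimal installation sketch, assuming the packages above are available from PyPI; installing D4RL from its GitHub repository is an assumption here, and mujoco-py additionally requires the MuJoCo binaries to be set up beforehand.

```bash
# Assumed installation commands; adjust versions / CUDA builds to your setup.
pip install offlinerllib==0.1.4 UtilsRL==0.6.5 gym==0.23.1 mujoco-py==2.1.2.14 torch==2.0.1
# D4RL is commonly installed from source (the source below is an assumption).
pip install "git+https://github.com/Farama-Foundation/D4RL.git"
```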
Below are the commands for reproducing the results. Feel free to contact me if anything goes wrong in your local dev environment.
- For D4RL tasks

  ```bash
  python3 reproduce/sequence_rvs/run_sequence_rvs_onestep.py \
      --config reproduce/sequence_rvs/config/onestep/mujoco/${env_name}-v2.py \
      --iql_tau ${iql_tau}
  ```

  Here the value for `--iql_tau` can be found in the Appendix. In the actual benchmarking we incorporated model selection for the critics. If you want to do that, you can use `reproduce/sequence_rvs/run_iql_pretrain.py` to first pre-train the critics, select the best-fitted one, and add `--load_path` to load the selected critics (a hedged sketch of this two-step workflow is given after this list).

- For the 2048 game
  ```bash
  python3 reproduce/sequence_rvs/run_sequence_rvs_stoc.py \
      --config reproduce/sequence_rvs/config/onestep/stoc_toy/2048-v0.py
  ```
- For the delayed-reward tasks
  ```bash
  python3 reproduce/sequence_rvs/run_sequence_rvs_onestep.py \
      --config reproduce/sequence_rvs/config/onestep/delayed/base.py \
      --task walker2d-medium-expert-v2
  ```
- For the stochastic mujoco tasks
  ```bash
  python3 reproduce/sequence_rvs/run_sequence_rvs_onestep.py \
      --config reproduce/sequence_rvs/config/onestep/stochastic_mujoco/
  ```
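For the D4RL model-selection workflow mentioned above, a hedged sketch of the two steps is shown below. The flags passed to `run_iql_pretrain.py` and the checkpoint path given to `--load_path` are assumptions for illustration; please check the scripts for the exact arguments and output locations.

```bash
# Step 1 (assumed usage): pre-train the IQL critics for the chosen task.
python3 reproduce/sequence_rvs/run_iql_pretrain.py \
    --config reproduce/sequence_rvs/config/onestep/mujoco/${env_name}-v2.py \
    --iql_tau ${iql_tau}

# Step 2: train ACT while loading the best-fitted critic selected in step 1.
# The value of --load_path is a placeholder; point it at your selected checkpoint.
python3 reproduce/sequence_rvs/run_sequence_rvs_onestep.py \
    --config reproduce/sequence_rvs/config/onestep/mujoco/${env_name}-v2.py \
    --iql_tau ${iql_tau} \
    --load_path ${path_to_selected_critic}
```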
If you find this work useful, please cite:

```bibtex
@inproceedings{act,
  author    = {Chen-Xiao Gao and Chenyang Wu and Mingjun Cao and Rui Kong and Zongzhang Zhang and Yang Yu},
  title     = {{ACT}: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning},
  booktitle = {Proceedings of the Thirty-Eighth {AAAI} Conference on Artificial Intelligence},
  year      = {2024},
}
```