This repository provides FamO2O's implementation over conservative Q-learning (CQL). The results of FamO2O+CQL on D4RL Locomotion can be reproduced by these codes.
- Install and use the included Anaconda environment
$ conda env create -f environment.yml
$ source activate JaxCQL
You'll need to get your own MuJoCo key if you want to use MuJoCo.
- Add this repo directory to your
PYTHONPATH
environment variable.
export PYTHONPATH="$PYTHONPATH:$(pwd)"
- If you do not want sync with wandb, please use the following command:
python -m cql_finetune.family_conservative_sac_finetune_main --env ${ENV} --seed ${SEED} -c family --wandb-offline --wandb-output-dir "./experiment_output"
Here, ${SEED}
refers to the random seed. In our paper, to avoid cherry-picking random seeds, we use 0, 1, 2, 3, 4, 5
for 6 repeated experiments.
${ENV}
can be one of the following environments:
halfcheetah-medium-v2
halfcheetah-medium-replay-v2
halfcheetah-medium-expert-v2
walker2d-medium-v2
walker2d-medium-replay-v2
walker2d-medium-expert-v2
hopper-medium-v2
hopper-medium-replay-v2
hopper-medium-expert-v2
- If you need to sync with wandb, please use the following command:
python -m cql_finetune.family_conservative_sac_finetune_main --env ${ENV} --seed ${SEED} -c family --wandb-output-dir "./experiment_output" --wandb-project ${WANDB_PROJECT} --wandb-entity ${WANDB_ENTITY}
Here, the ${WANDB_PROJECT}
and ${WANDB_ENTITY}
should be substituted by your own wandb project and entity, respectively.
The implementation is based on CQL.