This code accompanies the submission for Same State, Different Task: Continual Reinforcement Learning without Interference. We show the dangers of using CLEAR/experience replay alone as a means of preventing forgetting in continual reinforcement learning (CRL): interference can arise when different tasks share the same, or very similar, states in an environment. We explore this relatively underexplored problem of interference in CRL, which is distinct from forgetting. We advocate the use of multiple heads together with regularization of the shared feature extractor across tasks. Although using multiple heads to prevent forgetting is nothing new, we crucially employ them to prevent interference and to model the multi-modality of the tasks. EWC and distillation are used to prevent forgetting in the shared feature extractor.
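For illustration, a minimal PyTorch sketch of the multi-head architecture described above; layer sizes and names are assumptions for this example, not the repository's exact implementation:

import torch
import torch.nn as nn

class MultiHeadQNetwork(nn.Module):
    """Shared feature extractor with one Q-value head per task (sketch)."""

    def __init__(self, obs_dim, num_actions, num_tasks, hidden=256):
        super().__init__()
        # Shared feature extractor: regularized across tasks (with EWC or
        # distillation) to prevent forgetting.
        self.features = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One head per task: prevents interference when different tasks
        # visit the same (or very similar) states.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, num_actions) for _ in range(num_tasks)]
        )

    def forward(self, obs, task_id):
        return self.heads[task_id](self.features(obs))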
To run OWL on Minigrid with EWC:
python train_minigrid.py --env=SC --num_tasks=5 --num_task_repeats=3 --max_task_frames=750000 --tag=owl_t5_l500_s101 --env_seeds 111 129 112 105 155 --exp_replay_capacity=1000000 --huber --owl --q_ewc_reg=500 --seed=101 --bandits --bandit_loss=mse --bandit_debug --bandit_lr=0.88 --bandit_decay=0.9 --bandit_epsilon=0 --bandit_step=1 --log_interval=8000 --buffer_warm_start --buffer_warm_start_size=50000 --logdir=logs
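The --q_ewc_reg flag sets the EWC regularization strength. A minimal sketch of such a penalty, applied to the shared feature extractor only; fisher and old_params are assumed to be snapshots taken when the previous task finished, and this is an illustration rather than the repository's exact code:

import torch

def ewc_penalty(model, fisher, old_params, lam):
    # Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_i*)^2,
    # summed over the shared feature extractor's parameters only, so the
    # task-specific heads remain free to fit their own task.
    penalty = torch.zeros(())
    for name, p in model.features.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]).pow(2)).sum()
    return 0.5 * lam * penalty

# Example: total_loss = td_loss + ewc_penalty(q_net, fisher, old_params, lam=500.0)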
To run OWL on Minigrid with Distillation loss:
python train_minigrid.py --env=SC --num_tasks=5 --num_task_repeats=3 --max_task_frames=750000 --tag=owl_t5_f100_s101 --env_seeds 111 129 112 105 155 --exp_replay_capacity=1000000 --huber --owl --q_func_reg=100 --seed=101 --bandits --bandit_loss=mse --bandit_debug --bandit_lr=0.88 --bandit_decay=0.9 --bandit_epsilon=0 --bandit_step=1 --log_interval=8000 --buffer_warm_start --buffer_warm_start_size=50000 --logdir=logs
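The --q_func_reg flag sets the distillation strength. A minimal sketch of a distillation penalty on the Q-function's outputs; frozen_q_net is assumed to be a copy of the network frozen when the previous task ended, and obs a batch of states from the replay buffer (illustrative, not the exact implementation):

import torch
import torch.nn.functional as F

def distillation_penalty(q_net, frozen_q_net, obs, task_id, reg):
    # Keep the current Q-values close to those of the frozen snapshot on
    # replayed states, which protects the shared features from drifting.
    with torch.no_grad():
        target_q = frozen_q_net(obs, task_id)
    return reg * F.mse_loss(q_net(obs, task_id), target_q)

# Example: total_loss = td_loss + distillation_penalty(q_net, frozen_q_net, obs, task_id, reg=100.0)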
To run the CLEAR agent with DQN as the base RL algorithm:
python train_minigrid.py --env=SC --num_tasks=5 --num_task_repeats=3 --max_task_frames=750000 --log_interval=8000 --env_seeds 111 129 112 105 128 --tag=dqn_x5_s101 --exp_replay_capacity=4000000 --seed=105 --huber --logdir=logs
To run the full rehearsal baseline:
python train_minigrid.py --env=SC --num_tasks=5 --num_task_repeats=3 --max_task_frames=750000 --log_interval=8000 --env_seeds 111 129 112 105 155 --tag=fr_x5_s101 --exp_replay_capacity=750000 --seed=101 --huber --logdir=logs --multitask
Requirements:

gym==0.18.0
gym-minigrid==1.0.2
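These can be installed with pip:

pip install gym==0.18.0 gym-minigrid==1.0.2

If you use this code, please cite: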
@article{kessler2021same,
title={Same state, different task: Continual reinforcement learning without interference},
author={Kessler, Samuel and Parker-Holder, Jack and Ball, Philip and Zohren, Stefan and Roberts, Stephen J},
journal={arXiv preprint arXiv:2106.02940},
year={2021},
}