This repository contains the training code for the paper "Planning behavior in a recurrent neural network that plays Sokoban", presented at the ICML 2024 Mechanistic Interpretability Workshop (OpenReview, arXiv). It is based on CleanRL.
The learned-planner repository lets you download and use the trained neural networks. If you just want to do interpretability, you should go there.
First, clone the repo with:
git clone --recurse-submodules https://github.com/AlignmentResearch/train-learned-planners
# If you have already cloned the repo:
git submodule init
git submodule update --remote
We use Docker (on Mac, OrbStack) to easily distribute dependencies. You can get a local development environment by running make docker. If you have a Kubernetes cluster, you can adapt k8s/devbox.yaml and run make devbox (or make cuda-devbox).
The training code expects the Boxoban levels in /opt/sokoban_cache/boxoban-levels-master, but it is possible to change that path. You can download them using:
BOXOBAN_CACHE="/opt/sokoban_cache/" # change if desired
mkdir -p "$BOXOBAN_CACHE"
git clone https://github.com/google-deepmind/boxoban-levels \
"$BOXOBAN_CACHE/boxoban-levels-master"
The launcher scripts for the final runs are numbered 061_pfinal2 and above.
For DRC(3, 3):
python -m cleanba.cleanba_impala --from-py-fn=cleanba.config:sokoban_drc33_59 \
"train_env.cache_path=$BOXOBAN_CACHE" \
"eval_envs.valid_medium.cache_path=$BOXOBAN_CACHE"
For DRC(D, N), where D is the number of stacked ConvLSTM layers and N is the number of internal ticks per environment step (e.g. DRC(1, 1)):
D=1
N=1
python -m cleanba.cleanba_impala --from-py-fn=cleanba.config:sokoban_drc33_59 \
"train_env.cache_path=$BOXOBAN_CACHE" \
"eval_envs.valid_medium.cache_path=$BOXOBAN_CACHE" \
net.n_recurrent=$D net.repeats_per_step=$N
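The same overrides can also be applied from Python instead of the command line. This is a minimal sketch, assuming that sokoban_drc33_59 is an importable function in cleanba.config that returns a config object whose net.n_recurrent and net.repeats_per_step attributes correspond to the CLI overrides above; check cleanba/config.py for the real names and types.

# Hypothetical sketch; attribute names are inferred from the CLI overrides above.
from cleanba import config

cfg = config.sokoban_drc33_59()    # base DRC(3, 3) training config
cfg.net.n_recurrent = 1            # D: number of stacked ConvLSTM layers
cfg.net.repeats_per_step = 1       # N: internal ticks per environment step
cfg.train_env.cache_path = "/opt/sokoban_cache/"  # may need to be a pathlib.Path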
For ResNet:
python -m cleanba.cleanba_impala --from-py-fn=cleanba.config:sokoban_resnet_59 \
"train_env.cache_path=$BOXOBAN_CACHE" \
"eval_envs.valid_medium.cache_path=$BOXOBAN_CACHE"
To install the dependencies locally without Docker, run the following from inside your Python 3.10 environment:
make local-install
If you are not on Linux with Python 3.10 on x86_64, you will get the following error:
ERROR: envpool-0.8.4-cp310-cp310-linux_x86_64.whl is not a supported wheel on this platform.
You can still use the non-envpool environments via BoxobanConfig and SokobanConfig (in cleanba/environments.py).
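As a rough illustration, a non-envpool environment might be constructed like this. It is a sketch only: the field names (cache_path, num_envs, max_episode_steps, seed) and the make() factory method are assumptions, so check the dataclass definitions in cleanba/environments.py for the actual signature.

# Hypothetical usage sketch; field and method names are assumptions.
from pathlib import Path
from cleanba.environments import BoxobanConfig

cfg = BoxobanConfig(
    cache_path=Path("/opt/sokoban_cache"),  # directory containing boxoban-levels-master
    num_envs=2,
    max_episode_steps=120,
    seed=0,
)
envs = cfg.make()  # assumed to return a vectorized Gym-style environment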
- Experiment lists: All the experiments we ran to debug and tune hyperparameters are under experiments/. Each experiment launches jobs in a Kubernetes cluster.
- Tests: Run make mactest to run all the tests expected to succeed on a local machine.
- Linting: Run make lint format typecheck to lint, format, and typecheck the code.
If you use this code, please cite our work:
@inproceedings{garriga-alonso2024planning,
title={Planning behavior in a recurrent neural network that plays Sokoban},
author={Adri{\`a} Garriga-Alonso and Mohammad Taufeeque and Adam Gleave},
booktitle={ICML 2024 Workshop on Mechanistic Interpretability},
year={2024},
url={https://openreview.net/forum?id=T9sB3S2hok}
}