Official codebase for "Privileged Sensing Scaffolds Reinforcement Learning", contains the Scaffolder algorithm and Sensory Scaffolding Suite.

penn-pal-lab/scaffolder

🏗️ Privileged Sensing Scaffolds Reinforcement Learning

ICLR 2024 Spotlight
Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman

Welcome to the codebase for "Privileged Sensing Scaffolds Reinforcement Learning". Here, you will find:

  • Scaffolder, a model-based RL method that uses privileged observations to better train policies.
  • Sensory Scaffolding Suite (S3), a privileged POMDP benchmark of 10 diverse tasks containing locomotion, dexterous manipulation, piano playing, and more.

If you find our paper or code useful, please reference us:

@inproceedings{
  hu2024privileged,
  title={Privileged Sensing Scaffolds Reinforcement Learning},
  author={Edward S. Hu and James Springer and Oleh Rybkin and Dinesh Jayaraman},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=EpVe8jAjdx}
}

To learn more about Scaffolder:

Quickstart

To use Scaffolder, we need to install the Scaffolder algorithm and any environments we would like to run. Scaffolder builds on top of the DreamerV3 codebase and expects environments to follow the Gymnasium API. Each environment must return observation dictionaries, and Scaffolder's configuration specifies which observation keys are privileged and which are not.
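
As a rough illustration of the dictionary-observation convention, here is a minimal standard-library sketch. The class `ToyBlindPickEnv`, the observation key names, and the `PRIVILEGED_KEYS` / `TARGET_KEYS` tuples are all hypothetical; how Scaffolder's configuration actually names privileged keys is defined in the repository's config files and may differ. Only the `reset` and `step` return signatures are taken from the Gymnasium API.

```python
import random

# Hypothetical key names, chosen for illustration only.
PRIVILEGED_KEYS = ("object_position",)   # available only during training
TARGET_KEYS = ("gripper_position",)      # available to the deployed policy


class ToyBlindPickEnv:
    """Toy environment mimicking the Gymnasium reset/step signatures."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self._t = 0
        self._rng = random.Random(0)

    def _observe(self):
        # Observations are dictionaries that mix privileged and target keys.
        return {
            "gripper_position": [self._rng.uniform(-1, 1) for _ in range(3)],
            "object_position": [self._rng.uniform(-1, 1) for _ in range(3)],
        }

    def reset(self, seed=None, options=None):
        # Gymnasium convention: reset returns (observation, info).
        if seed is not None:
            self._rng = random.Random(seed)
        self._t = 0
        return self._observe(), {}

    def step(self, action):
        # Gymnasium convention: step returns
        # (observation, reward, terminated, truncated, info).
        self._t += 1
        truncated = self._t >= self.horizon
        return self._observe(), 0.0, False, truncated, {}


def split_observation(obs):
    """Split one observation dict into (target, privileged) parts."""
    target = {k: obs[k] for k in TARGET_KEYS}
    privileged = {k: obs[k] for k in PRIVILEGED_KEYS}
    return target, privileged


env = ToyBlindPickEnv()
obs, info = env.reset(seed=0)
target, privileged = split_observation(obs)
print(sorted(target), sorted(privileged))
```

The point of the split is that the same environment serves both roles: training code can read every key, while the policy that is eventually deployed only ever sees the target keys.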

After installing Scaffolder and the environments, the folder structure should look like this:

projects/                   # Your project folder
  |- scaffolder/            # code for Scaffolder algorithm
  |- gymnasium_robotics/    # code for 7/10 S3 tasks
  |- gymnasium/             # code for Blind Locomotion task
  |- robopianist/           # code for Blind Deaf Piano task
  |- brachiation/           # code for Noisy Monkey task

1. Create Conda environment.

Scaffolder runs on both Ubuntu and macOS. I recommend installing it on macOS for fast local development, and on Ubuntu for actual GPU training.

conda create -n scaffolder python=3.8
conda activate scaffolder

I use Python 3.8 on my laptop, but Python 3.9 or later should also work.

2. Install Scaffolder algorithm

  1. First, install JAX by following its official installation instructions.

  2. Then, clone Scaffolder's codebase, which extends DreamerV3.

git clone git@github.com:penn-pal-lab/scaffolder.git
cd scaffolder
# install dependencies
pip install -r requirements.txt
# install scaffolder as a local Python package
pip install -e .

You will likely run into versioning errors while installing the packages in requirements.txt, since they are not pinned to any versions. To work around these, comment out the troublesome packages in the file and install them manually with pip install <your package>==<desired version>.
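
To see which entries are candidates for manual pinning, a small helper like the following can list the requirement lines that carry no version specifier. This is a sketch that is not part of the repository; the function name `unpinned_requirements` is hypothetical, and it only handles simple one-package-per-line entries (URL or VCS requirements are reported as unpinned).

```python
import re
from pathlib import Path

# Matches any pip version specifier operator.
_SPECIFIER = re.compile(r"(==|>=|<=|~=|!=|<|>)")


def unpinned_requirements(text):
    """Return requirement entries that have no version specifier."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]            # drop trailing comments
        line = line.split(";", 1)[0].strip()   # drop environment markers
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        if not _SPECIFIER.search(line):
            names.append(line)
    return names


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        for name in unpinned_requirements(path.read_text()):
            print(f"unpinned: {name}")
```

Run it from the scaffolder folder; each reported package is one you may need to install with an explicit version if pip's default choice causes a conflict.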

3. Install Sensory Scaffolding Suite

The Sensory Scaffolding Suite (S3) is a collection of 10 tasks with predefined privileged and target observation spaces. The tasks are implemented in different repositories, so for a given task, you must install the corresponding repository. For users who just want to get started, I recommend installing only Gymnasium Robotics, since it covers 7 of the 10 tasks in S3.

Repository           Tasks
Gymnasium Robotics   Blind Pick, Wrist Pick-Place, Occluded Pick-Place, Blind Numb Cube, Blind Numb Pen, RGB Cube, RGB Pen
Gymnasium            Blind Locomotion
Robopianist          Blind Deaf Piano
Brachiation          Noisy Monkey

Gymnasium Robotics Installation

Clone our custom fork of Gymnasium Robotics, change to the correct branch, and install dependencies.

git clone git@github.com:edwhu/Gymnasium-Robotics.git gymnasium_robotics
cd gymnasium_robotics
git checkout v0.1
pip install -e . # install this library as a local Python package

Gymnasium Installation (TODO)
Robopianist Installation (TODO)
Brachiation Installation (TODO)

4. Test installation

As a sanity check, let's see if the core Scaffolder training loop runs with very small networks and low-dimensional state inputs.

python -u dreamerv3/train.py --logdir ~/logdir/test_blindpick_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindpick,sanity_check

It should take a couple of minutes to run. After it's done, we can check the TensorBoard outputs.

cd ~/logdir/
tensorboard --logdir . 

You should see some scalars and GIFs in TensorBoard. This means the training loop is working and the installation was successful.
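
If TensorBoard shows nothing, it can help to first confirm that the run wrote any log files at all. The sketch below assumes the logger writes standard TensorBoard event files, whose names contain `tfevents`; the helper name `find_event_files` is hypothetical and not part of the repository.

```python
from pathlib import Path


def find_event_files(logdir):
    """Return all TensorBoard event files found under logdir, sorted."""
    root = Path(logdir).expanduser()
    if not root.is_dir():
        return []
    # Standard TensorBoard event files have "tfevents" in their file name.
    return sorted(p for p in root.rglob("*tfevents*") if p.is_file())


if __name__ == "__main__":
    files = find_event_files("~/logdir")
    if files:
        for path in files:
            print(f"{path} ({path.stat().st_size} bytes)")
    else:
        print("No TensorBoard event files found under ~/logdir.")
```

An empty result suggests the training run exited before logging, so the terminal output of the training command is the next place to look.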

Running Experiments

In general, you need to specify the environment configuration and the model size configuration. You can also override individual configuration values on the command line. Below, we provide examples for running Scaffolder on each S3 task.

Blind Pick
python -u dreamerv3/train.py --logdir ~/logdir/blindpick_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindpick,small
Blind Locomotion
python -u dreamerv3/train.py --logdir ~/logdir/blindlocomotion_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindlocomotion,small
Blind Deaf Piano
python -u dreamerv3/train.py --logdir ~/logdir/blinddeafpiano_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blinddeafpiano,large
Blind Numb Cube
python -u dreamerv3/train.py --logdir ~/logdir/blindnumbcube_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindnumbcube,small
Blind Numb Pen
python -u dreamerv3/train.py --logdir ~/logdir/blindnumbpen_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_blindnumbpen,small
Noisy Monkey
python -u dreamerv3/train.py --logdir ~/logdir/noisymonkey_$(date "+%Y%m%d-%H%M%S") --configs gym_noisymonkey,small
Wrist Pick Place
python -u dreamerv3/train.py --logdir ~/logdir/wristpickplace_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_wristpickplace,small
Occluded Pick Place
python -u dreamerv3/train.py --logdir ~/logdir/occludedpickplace_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_occludedpickplace,small
RGB Cube
python -u dreamerv3/train.py --logdir ~/logdir/rgbcube_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_rgbcube,large
RGB Pen
python -u dreamerv3/train.py --logdir ~/logdir/rgbpen_$(date "+%Y%m%d-%H%M%S") --configs gymnasium_rgbpen,large
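
The commands above all follow one pattern: a timestamped log directory plus an environment config and a model size config. As a convenience, that pattern can be sketched as a small launcher. The `S3_CONFIGS` table is copied from the commands above; the helper `build_command` and its `extra_args` parameter are hypothetical and not part of the repository, and any override flags passed through `extra_args` must be ones the repository's configuration actually defines.

```python
import os
import shlex
import time

# Task -> (environment config, model size config), copied from the commands above.
S3_CONFIGS = {
    "blindpick": ("gymnasium_blindpick", "small"),
    "blindlocomotion": ("gymnasium_blindlocomotion", "small"),
    "blinddeafpiano": ("gymnasium_blinddeafpiano", "large"),
    "blindnumbcube": ("gymnasium_blindnumbcube", "small"),
    "blindnumbpen": ("gymnasium_blindnumbpen", "small"),
    "noisymonkey": ("gym_noisymonkey", "small"),
    "wristpickplace": ("gymnasium_wristpickplace", "small"),
    "occludedpickplace": ("gymnasium_occludedpickplace", "small"),
    "rgbcube": ("gymnasium_rgbcube", "large"),
    "rgbpen": ("gymnasium_rgbpen", "large"),
}


def build_command(task, logdir_root="~/logdir", extra_args=()):
    """Build the argv list for training Scaffolder on one S3 task."""
    env_config, size_config = S3_CONFIGS[task]
    stamp = time.strftime("%Y%m%d-%H%M%S")
    logdir = os.path.join(os.path.expanduser(logdir_root), f"{task}_{stamp}")
    argv = [
        "python", "-u", "dreamerv3/train.py",
        "--logdir", logdir,
        "--configs", f"{env_config},{size_config}",
    ]
    argv.extend(extra_args)  # optional command-line overrides
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable command for every task.
    for task in S3_CONFIGS:
        print(shlex.join(build_command(task)))
```

Passing the resulting list to `subprocess.run` from the scaffolder folder launches the same run as the corresponding command above.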

Check out the slurm folder to see example scripts for running experiments on a cluster.

Scaffolder Code Walkthrough (TODO)

Overview

Adding a new environment

Acknowledgements

This project would not be possible without the work of others. We thank them for their contributions, and we hope others will build on Scaffolder as we have built on these works.

All the various environment code we used for defining S3.

All the baseline code for the evaluations.
