Skip to content

Using Natural Language for Reward Shaping in Reinforcement Learning

License

Notifications You must be signed in to change notification settings

prasoongoyal/rl-learn

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

rl-learn

This is the code for our IJCAI 2019 paper Using Natural Language for Reward Shaping in Reinforcement Learning.

Running the code:

  1. Clone this repository and install dependencies using the included requirements.txt file. The code requires Python 3.
  2. Download preprocessed data:
wget http://www.cs.utexas.edu/~ml/pgoyal/ijcai19/train_lang_data.pkl -O ./data/train_lang_data.pkl
wget http://www.cs.utexas.edu/~ml/pgoyal/ijcai19/test_lang_data.pkl -O ./data/test_lang_data.pkl
  1. Run the LEARN module training and RL training using the following commands:
mkdir learn_model
python learn/train.py --lang_enc=onehot --save_path=./learn_model
python rl/main.py --expt_id=<expt_id> --descr_id=<descr_id> --lang_coeff=1.0 --lang_enc=onehot --model_dir=./learn_model

Data

Raw data can be downloaded from http://www.cs.utexas.edu/~ml/pgoyal/ijcai19/atari-lang.zip. The directories contain frames from Montezuma's revenge (downloaded from Atari Grand Challenge dataset). The file annotations.txt contains pairs of clip ids and natural language descriptions. The clip id is formatted as <directory_name>/<start_frame>-<end_frame>.mp4

Preprocessed data can be generated from the raw data as follows:

  1. Download the InferSent model using the following command:
wget http://www.cs.utexas.edu/~ml/pgoyal/ijcai19/infersent1.pkl -O ./lang_enc_pretrained/InferSent/encoder/infersent1.pkl
  1. Download pretrained GloVe vectors (glove.6B.zip) from https://nlp.stanford.edu/projects/glove/. Put the unzipped files in lang_enc_pretrained/glove.

  2. Run the preprocessing code as follows:

python scripts/preprocess_data.py

This will create files train_lang_data.pkl and test_lang_data.pkl in the ./data directory.

Acknowledgements:

The RL code is adapted from the following implementation -- https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail.

About

Using Natural Language for Reward Shaping in Reinforcement Learning

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published