Semantic Kernel Representation for Transfer Learning between Reward Machines

This is the repository for "Transfer Learning between non-Markovian RL Tasks through Semantic Representations of Temporal States". Several parts of the codebase are slightly modified versions of the code from LTL2Action.

Installing Dependencies

We recommend using Python 3.6 to run this code. Required libraries can be installed with pip install -r requirements.txt. Additionally, the MONA tool must be installed for ltl2dfa to work properly.

Usage

Make sure to execute all the commands from the src directory.

Generating Datasets

Datasets can be generated from YAML config files. They can contain either a set of reference automata, or a set of formulas and the corresponding kernel representations. The dataset directory contains the YAML files used to generate the datasets used in the paper.

To generate a dataset, use the command

python dataset.py <component> DS_DIR

where DS_DIR is the directory in which the YAML config resides. The generated objects will be saved in that directory. <component> can be either:

  • references to generate reference automata

  • formulas to generate formulas and the corresponding automata

  • kernel to generate the kernel representations

To ensure correct interaction with the dataset loading functions, dataset directory names must not contain underscores.
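The naming rule can be checked before running the generation command. The sketch below is a hypothetical helper, not part of this repository; it assumes that only the last path component (the dataset directory name itself) is subject to the rule.

```python
from pathlib import Path


def check_dataset_dir_name(ds_dir: str) -> str:
    """Return the dataset directory name, or raise if it contains '_'.

    Hypothetical helper (not part of this repository). Only the last
    path component is checked, which is an assumption.
    """
    name = Path(ds_dir).name
    if "_" in name:
        raise ValueError(
            f"dataset directory name {name!r} contains an underscore; "
            "rename it before generating the dataset"
        )
    return name
```

Calling `check_dataset_dir_name("datasets/letters")` returns `"letters"`, while a directory such as `datasets/my_set` raises `ValueError`. The names `letters` and `my_set` are placeholders.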

Analyzing intra-task variation

To compute and plot a histogram of the intra-task variation distribution, use the script src/analysis_intra.py as

python analysis_intra.py <SPECS...> -x quality --bins <BINS> --alpha <ALPHA>

where:

  • <SPECS...> is any number of dataset-kernel specifications, each in the form <dataset>-<kernel>

  • <BINS> is the number of bins for the histogram

  • <ALPHA> is the alpha value for the histogram

By default, for each dataset this script also loads the dataset with the same name and the prefix test. This can be disabled with the --exclude-test flag.
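When several dataset-kernel pairs are compared, it can help to assemble the argument list programmatically. The following is a minimal sketch that only builds the command line described above; the function name `intra_command` and the dataset and kernel names are hypothetical.

```python
import shlex
from typing import List, Sequence, Tuple


def intra_command(
    specs: Sequence[Tuple[str, str]],
    bins: int,
    alpha: float,
    exclude_test: bool = False,
) -> List[str]:
    """Build the argument list for analysis_intra.py.

    Each (dataset, kernel) pair becomes one <dataset>-<kernel> spec.
    Hypothetical helper; it uses only the flags documented above.
    """
    args = ["python", "analysis_intra.py"]
    args += [f"{dataset}-{kernel}" for dataset, kernel in specs]
    args += ["-x", "quality", "--bins", str(bins), "--alpha", str(alpha)]
    if exclude_test:
        args.append("--exclude-test")
    return args


if __name__ == "__main__":
    # "letters", "numbers" and "k1" are placeholder names.
    cmd = intra_command([("letters", "k1"), ("numbers", "k1")], bins=20, alpha=0.5)
    print(" ".join(shlex.quote(a) for a in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="src")` so that the script runs from the src directory.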

Computing and storing inter-task variation

To compute and store on disk the inter-task variation between two datasets, use the script src/analysis_inter.py as

python analysis_inter.py <DS1> <DS2> <KERNEL> <OUT_PATH>

where:

  • <DS1> and <DS2> are the names of dataset directories (assumed to be contained in src/datasets/)

  • <KERNEL> is the name of the kernel representation to use for both datasets

  • <OUT_PATH> is the path to the output file, which will be stored in the pickle binary format.
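Since the output is a pickle file, it can be read back with Python's standard `pickle` module. This is a minimal sketch: the function name is hypothetical, and the structure of the stored object is not described here, so inspect it after loading.

```python
import pickle
from pathlib import Path
from typing import Any


def load_inter_task_variation(out_path: str) -> Any:
    """Load the object written to <OUT_PATH> by analysis_inter.py.

    Hypothetical helper. The type of the returned object is not
    documented in this README, so it is returned as-is.
    Only unpickle files that you generated yourself or trust.
    """
    with Path(out_path).open("rb") as f:
        return pickle.load(f)
```

A quick `print(type(result))` after loading is a reasonable first step for finding out how the values are organized.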

Training Agents

To use the default RL algorithm parameters and only change the state representation used, use the src/train.sh script as

./train.sh --progression-mode <mode> --ltl-sampler "Dataset_<dataset>_<kernel>_curriculum" --seed <SEED>

where:

  • <mode> is either kernel or dfa_naive (vanilla reward machines)

  • <dataset> is the name of a dataset (i.e. a directory inside src/datasets)

  • <kernel> is the name of a kernel representation for the dataset (ignored for vanilla reward machines)
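The value passed to --ltl-sampler joins its fields with underscores, so it is easy to mistype. The sketch below builds that string and the full train.sh argument list from the parameters listed above. Both function names are hypothetical, and the check on the dataset name simply mirrors the no-underscore naming rule for dataset directories.

```python
import shlex
from typing import List


def ltl_sampler_name(dataset: str, kernel: str) -> str:
    """Build the --ltl-sampler value: Dataset_<dataset>_<kernel>_curriculum."""
    if "_" in dataset:
        raise ValueError("dataset names must not contain underscores")
    return f"Dataset_{dataset}_{kernel}_curriculum"


def train_command(mode: str, dataset: str, kernel: str, seed: int) -> List[str]:
    """Build the argument list for train.sh (hypothetical helper)."""
    if mode not in ("kernel", "dfa_naive"):
        raise ValueError(f"unknown progression mode: {mode!r}")
    return [
        "./train.sh",
        "--progression-mode", mode,
        "--ltl-sampler", ltl_sampler_name(dataset, kernel),
        "--seed", str(seed),
    ]


if __name__ == "__main__":
    # "letters" and "k1" are placeholder dataset and kernel names.
    cmd = train_command("kernel", "letters", "k1", seed=0)
    print(" ".join(shlex.quote(a) for a in cmd))
```

Looping over several seeds with the same helper keeps the sampler string consistent across runs.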

To override parameters of the RL algorithm, use src/train_agent.py directly. The available parameters are documented in the script itself. By default, results, logs and checkpoints are saved in a directory inside src/storage whose name is derived automatically from the parameters given on the command line. This directory will contain a txt log and a csv log, the best and last model states, the tensorboard log files, and the pickled configuration dictionary.

Evaluating trained agents

Evaluating the best model saved during training can be done with the src/eval.py script:

python eval.py <RUN_DIR>

where <RUN_DIR> is the directory generated by the training script above.