This is the repository for "Transfer Learning between non-Markovian RL Tasks through Semantic Representations of Temporal States". Several parts of the codebase are slightly modified versions of the code from LTL2Action.
We recommend using Python 3.6 to run this code. Required libraries can be installed with `pip install -r requirements.txt`. Additionally, the MONA tool must be installed for `ltl2dfa` to work properly. Make sure to execute all the commands from the `src` directory.
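For example, a typical setup might look like the following. The `apt-get` line is an assumption (MONA is packaged for Debian/Ubuntu); any installation method that puts `mona` on the `PATH` should work:

```
# Install the Python dependencies
pip install -r requirements.txt

# Install the MONA tool (assumed Debian/Ubuntu; adjust for your system)
sudo apt-get install mona

# All commands below are run from the src directory
cd src
```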
Datasets can be generated from YAML config files. These can contain either a set of reference automata, or a set of formulas and the corresponding kernel representations. The `dataset` directory contains the YAML files used to generate the datasets used in the paper.
To generate a dataset, use the command

```
python dataset.py <component> DS_DIR
```

where `DS_DIR` is the directory in which the YAML config resides. The generated objects will be saved in that directory. `<component>` can be either:
- `references` to generate reference automata
- `formulas` to generate formulas and the corresponding automata
- `kernel` to generate the kernel representations
To ensure correct interaction with the dataset loading functions, the dataset directories must not contain underscores in their names.
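For example, generating the formulas and kernel representations for a hypothetical dataset directory `train1` might look like this (the directory name and the relative-path form of `DS_DIR` are assumptions):

```
# Run from the src directory; "train1" is a hypothetical dataset
# directory containing a YAML config
python dataset.py formulas datasets/train1   # formulas + automata
python dataset.py kernel datasets/train1     # kernel representations
```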
To compute and plot a histogram of the intra-task variation distribution, use the script `src/analysis_intra.py` as

```
python analysis_intra.py <SPECS...> -x quality --bins <BINS> --alpha <ALPHA>
```
where:
- `<SPECS...>` is any number of dataset-kernel specifications, each in the form `<dataset>-<kernel>`
- `<BINS>` is the number of bins for the histogram
- `<ALPHA>` is the alpha value for the histogram
By default, for each dataset given, this script also loads the dataset with the same name and a `test` prefix. This can be disabled with the `--exclude-test` flag.
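For instance, with hypothetical datasets `train1` and `train2` and a kernel named `k1`, the invocation could be:

```
# "train1", "train2" and "k1" are hypothetical dataset and kernel names
python analysis_intra.py train1-k1 train2-k1 -x quality --bins 50 --alpha 0.5
```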
To compute and store on disk the inter-task variation between two datasets, use the script `src/analysis_inter.py` as

```
python analysis_inter.py <DS1> <DS2> <KERNEL> <OUT_PATH>
```
where:
- `<DS1>` and `<DS2>` are the names of dataset directories (assumed to be contained in `src/datasets/`)
- `<KERNEL>` is the name of the kernel representation to use for both datasets
- `<OUT_PATH>` is the path to the output file, which will be stored in the pickle binary format
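As an illustration (dataset and kernel names are hypothetical), the output can then be loaded back with the standard `pickle` module:

```
# "train1", "train2" and "k1" are hypothetical names
python analysis_inter.py train1 train2 k1 inter-variation.pkl

# Inspect the pickled result (its exact structure depends on the script)
python -c "import pickle; print(pickle.load(open('inter-variation.pkl', 'rb')))"
```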
To use the default RL algorithm parameters and only change the state representation used, use the `src/train.sh` script as

```
./train.sh --progression-mode <mode> --ltl-sampler "Dataset_<dataset>_<kernel>_curriculum" --seed <SEED>
```

where:
- `<mode>` is either `kernel` or `dfa_naive` (vanilla reward machines)
- `<dataset>` is the name of a dataset (i.e. a directory inside of `src/datasets`)
- `<kernel>` is the name of a kernel representation for the dataset (ignored for vanilla reward machines)
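For example, a kernel-based run on the hypothetical dataset `train1` with kernel `k1` could be launched as:

```
# "train1" and "k1" are hypothetical dataset and kernel names
./train.sh --progression-mode kernel --ltl-sampler "Dataset_train1_k1_curriculum" --seed 0
```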
To override parameters of the RL algorithm, use `src/train_agent.py` directly. Available parameters are documented in the script itself. By default, results, logs and checkpoints are saved in a directory inside `src/storage`, named automatically from the parameters given on the command line. This directory will contain a txt and a csv log, the best and last model states, the TensorBoard log files, and the pickled configuration dictionary.
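Since TensorBoard event files are written there, training curves can be inspected with the standard TensorBoard CLI:

```
# Run from the src directory
tensorboard --logdir storage
```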
Evaluating the best model saved during training can be done with the `src/eval.py` script:

```
python eval.py <RUN_DIR>
```

where `<RUN_DIR>` is the directory generated by the training script above.
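For instance, since run directories are named automatically from the training parameters, evaluation of a finished run looks like:

```
# Substitute the auto-generated run directory under src/storage
python eval.py storage/<some-auto-generated-run-dir>
```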