This repository contains the code for the experiments conducted for my master thesis, carried out at the Amsterdam Machine Learning Lab and TenneT.
Please see the paper Hierarchical Reinforcement Learning for Power Network Topology Control for more details. If this code is useful, please cite our paper:
@misc{manczak2023hierarchical,
title={Hierarchical Reinforcement Learning for Power Network Topology Control},
author={Blazej Manczak and Jan Viebahn and Herke van Hoof},
year={2023},
eprint={2311.02129},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
The env.yml file can be used to install the dependencies.
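Assuming env.yml is a conda environment specification (an assumption based on the file name; adapt the command if you use a different tool), the environment can be created with:
conda env create -f env.yml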
We run our experiments with the rte_case14_realistic environment modeled in the Grid2Op package. The 1000 chronics (= episodes) are divided into a train, validation, and test set. The chronic numbers used for each split are saved as NumPy arrays in the grid2op_env/train_val_test_split folder.
The following has to be run only once to download and set up the environment:
import grid2op
import numpy as np

env_name = "rte_case14_realistic"
env = grid2op.make(env_name)  # downloads the environment data on first use

# Load the chronic (episode) ids reserved for validation and testing
val_chron, test_chron = np.load("grid2op_env/train_val_test_split/val_chronics.npy"), \
                        np.load("grid2op_env/train_val_test_split/test_chronics.npy")

# Split the chronics into dedicated train/val/test environments
nm_env_train, nm_env_val, nm_env_test = env.train_val_split(
    test_scen_id=test_chron,
    add_for_test="test",
    val_scen_id=val_chron,
)

env_train = grid2op.make(env_name + "_train")
env_val = grid2op.make(env_name + "_val")
env_test = grid2op.make(env_name + "_test")
Note that wandb is used for monitoring the progress of the experiments.
If you wish to use wandb, make sure to specify the WANDB_API_KEY in the .env file. Alternatively, comment out the WandbLoggerCallback in the train.py file.
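For example, a minimal .env file could contain a single line (the value below is a placeholder for your own API key):
WANDB_API_KEY=your_api_key_here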
We train and benchmark the models in environments with and without outages. The environment setting is controlled by the boolean --with_opponent keyword argument of the train.py script.
By default, the 5 best checkpoints in terms of mean episode reward will be saved in the log_files directory.
These agents support training with the PPO and SAC algorithms.
To train these agents, go to the main branch and run the train.py file with the desired keyword arguments. The choice of hyperparameters is specified in a .yaml file. The specifications used in the paper can be found in the experiments folder under the corresponding algorithm name.
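Such a specification can be inspected programmatically, for instance with PyYAML (a generic sketch, not part of the training pipeline; the exact keys depend on the chosen file):

import yaml

# Load one of the hyperparameter specifications shipped with the repository
with open("experiments/ppo/ppo_run_threshold.yaml") as f:
    config = yaml.safe_load(f)
print(config)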
For instance, to train a hybrid PPO agent in the setting with outages for 1000 iterations and over 10 different seeds, run:
python train.py --algorithm ppo \
--algorithm_config_path experiments/ppo/ppo_run_threshold.yaml \
--with_opponent True \
--num_iters 1000 \
--num_samples 10
See the argparse help for more details on keyword arguments.
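For example, the full list of supported keyword arguments can be printed with:
python train.py --help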
To train a fully hierarchical agent, go to the hierarchical_approach branch and run the train_hierarchical.py file with the desired keyword arguments. As for the native and hybrid agents, the choice of hyperparameters is specified in a .yaml file.
To train the hierarchical agent in the setting with outages for 1000 iterations and over 16 different seeds, run:
python train_hierarchical.py --algorithm ppo \
--algorithm_config_path experiments/hierarchical/full_mlp_share_critic.yaml \
--use_tune True \
--num_iters 1000 \
--num_samples 16 \
--checkpoint_freq 10 \
--with_opponent True
To run a trained agent on the set of test chronics, run:
python evaluation/run_eval.py --agent_type X \
--checkpoint_path Y \
--checkpoint_num Z \
--use_split test
If the agent being evaluated is fully hierarchical (i.e. non-hybrid), add the keyword argument --hierarchical True.
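For instance, evaluating a fully hierarchical agent on the test split would look as follows (X, Y, and Z are placeholders, as above):
python evaluation/run_eval.py --agent_type X \
--checkpoint_path Y \
--checkpoint_num Z \
--use_split test \
--hierarchical True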
Besides printing the mean episode length, this script collects the data needed for further analysis and saves it in the evaluation/eval_results folder.
The functionality for further analysis is implemented in the evaluation/results_analysis.py file. Given the path to the evaluation results, it is easy to obtain a table with the statistics:
from evaluation.result_analysis import process_eval_data_multiple_agents, \
    get_analysis_objects, \
    compile_table_df

# Map a label for each agent to the path of its evaluation results and its wandb run number
EVAL_PATHS = {"Agent Type 1": (path_to_eval_results, "wandb_num"),
              "Agent Type 2": (path_to_eval_results, "wandb_num"), ...}

data_per_algorithm = process_eval_data_multiple_agents(EVAL_PATHS)

# Compile the data frame from which we will later plot the results
df = compile_table_df(data_per_algorithm)
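Assuming compile_table_df returns a pandas DataFrame (as the variable name df suggests; this is not verified here), the table can then be inspected or exported, for example:
print(df.to_string())           # inspect the full table in the console
df.to_csv("results_table.csv")  # the file name is arbitrary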
- evaluation contains the code for benchmarking trained agents
- experiments contains the specification of the model hyperparameters and custom callbacks
- grid2op_env contains the environment wrappers, the train/test/val split, and the data used to scale the observations
- models contains the code for the torch models used in the experiments
- notebooks contains miscellaneous notebooks used in the course of development and evaluation. Notably, sub_node_model.ipynb contains an alpha version of a Graph Neural Network (GNN) based policy.