This repsitory contains a PyTorch implementation of an MVC environment, graph convolutional networks (using DGL) and an actor-critic algorithm. At each episode the algorithm is presented with a random Erdős-Rényi graph, with a specified number of nodes and probability of edge connection, and the neural network is trained using a simple actor-critic algorithm. This code requires installing DGL (Deep Graph Library).
I have also written a Medium article on the subject of reinforcement learning for combinatorial optimization, feel free to check it out.
Below are some comparisons of solutions created by the neural network policy (upper ones) and those created by a greedy heuristic (lower):
The yellow nodes are those chosen as part of the solution, and the darker ones were left out. During solution construction, nodes are added sequentialy until all edges are covered.