This repository implements algorithms for graph sampling with Determinantal Point Processes (DPPs), following the paper Graph sampling with determinantal processes, N. Tremblay et al. (2017). The project was conducted in the framework of the course Graphs in Machine Learning of the MVA Master's degree (Mathematics, Computer Vision, Machine Learning) at École Normale Supérieure Paris-Saclay. More details can be found on the webpage of the course.
The course is taught by Michal Valko and this project was supervised by Guillaume Gautier.
The main functions are implemented in graphSamplingWithDPP.py. Specific tests are written in the test_xxx.py files.
The Propp-Wilson algorithm samples a random spanning tree of a directed graph (see How to Get a Perfectly Random Sample from a Generic Markov Chain and Generate a Random Spanning Tree of a Directed Graph, J.G. Propp and D.B. Wilson (1998)). The original algorithm can be modified to sample a Determinantal Point Process over the vertices of the graph, with respect to a certain kernel.
The algorithm is implemented in the function wilson_algorithm.
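For illustration, here is a minimal, self-contained sketch of Wilson's loop-erased random-walk procedure on an undirected, unweighted graph; the function and variable names are ours and do not reflect the interface of wilson_algorithm.

```python
import random

def wilson_spanning_tree(adjacency, root=0):
    """Return a dict mapping each non-root node to its parent in a random
    spanning tree rooted at `root` (last-exit / cycle-popping implementation).

    `adjacency` is a list of neighbour lists for nodes 0..n-1 of a connected,
    undirected, unweighted graph.
    """
    n = len(adjacency)
    in_tree = [False] * n
    parent = {}
    in_tree[root] = True
    for start in range(n):
        # Random walk from `start` until the current tree is hit; overwriting
        # the exit pointer at each revisit implicitly erases loops.
        u = start
        while not in_tree[u]:
            parent[u] = random.choice(adjacency[u])
            u = parent[u]
        # Add the loop-erased path from `start` to the tree.
        u = start
        while not in_tree[u]:
            in_tree[u] = True
            u = parent[u]
    return parent

if __name__ == "__main__":
    # Small example: a 4-cycle 0-1-2-3-0 with an extra chord 0-2.
    adjacency = [[1, 2, 3], [0, 2], [0, 1, 3], [0, 2]]
    print(wilson_spanning_tree(adjacency))
```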
A recreational application of random spanning tree generation is automated maze generation: interpret the walls of the maze as the branches of a spanning tree, with the border of the maze as the root; then the maze is guaranteed to be solvable between any two points, and there is exactly one path connecting them that does not involve backtracking. This application is implemented in the function generate_maze, and a minimal working example with a plotting script is provided in test_generate_maze.py.
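The following toy sketch builds a maze from the spanning tree sampled by the sketch above. Note that it uses the equivalent dual view, in which the tree edges are the carved passages between cells of a grid, rather than the walls-as-tree interpretation used by generate_maze; all names are hypothetical.

```python
def grid_adjacency(width, height):
    """Neighbour lists of the width x height grid graph (one node per cell)."""
    def node(x, y):
        return y * width + x
    adjacency = [[] for _ in range(width * height)]
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                adjacency[node(x, y)].append(node(x + 1, y))
                adjacency[node(x + 1, y)].append(node(x, y))
            if y + 1 < height:
                adjacency[node(x, y)].append(node(x, y + 1))
                adjacency[node(x, y + 1)].append(node(x, y))
    return adjacency

def maze_passages(width, height):
    """Carved passages of the maze: the edges of a random spanning tree of the
    cell grid. Any two cells are then connected by exactly one simple path."""
    parent = wilson_spanning_tree(grid_adjacency(width, height))
    return {frozenset(edge) for edge in parent.items()}

print(sorted(tuple(sorted(edge)) for edge in maze_passages(3, 3)))
```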
The stochastic block model is a community-structured graph model. Given N nodes belonging to k different communities, two nodes share an edge with probability q1 if they belong to the same community and with probability q2 otherwise. One can show that the model is equivalently parameterised by epsilon = q2/q1 and the average degree c. The generation of such graphs is implemented in the function generate_graph_from_stochastic_block_model.
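A minimal sketch of such a generator is shown below, assuming (roughly) equal-size communities and the (N, k, q1, q2) parameterisation; the actual signature of generate_graph_from_stochastic_block_model may differ.

```python
import numpy as np

def sbm_adjacency(N, k, q1, q2, seed=None):
    """Adjacency matrix and community labels of a stochastic block model with
    k (roughly) equal-size communities, within-community edge probability q1
    and between-community edge probability q2."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(k), N // k)
    labels = np.concatenate([labels, rng.integers(k, size=N - labels.size)])
    same_community = labels[:, None] == labels[None, :]
    probs = np.where(same_community, q1, q2)
    upper = np.triu(rng.random((N, N)) < probs, k=1)  # i < j only, no self-loops
    return (upper | upper.T).astype(int), labels

# The paper's parameterisation (epsilon = q2 / q1 and the average degree c)
# determines q1 and q2 given N and k.
A, labels = sbm_adjacency(N=100, k=4, q1=0.2, q2=0.02, seed=0)
print(A.sum(axis=1).mean())  # empirical average degree
```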
We implemented an algorithm for sampling from the DPP with a given kernel K (or rather its eigendecomposition) in the function sample_from_DPP. We use wilson_algorithm to sample from the kernel K_q for some W and q.
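For reference, here is a sketch of the standard spectral DPP sampler (Hough et al.) that works from the eigendecomposition of K; it only illustrates the two-phase procedure and is not necessarily how sample_from_DPP is implemented.

```python
import numpy as np

def sample_dpp_from_eigendecomposition(dpp_eigenvalues, eigenvectors, seed=None):
    """Sample a set of item indices from the DPP with marginal kernel
    K = eigenvectors @ diag(dpp_eigenvalues) @ eigenvectors.T."""
    rng = np.random.default_rng(seed)
    # Phase 1: keep eigenvector i independently with probability lambda_i.
    keep = rng.random(dpp_eigenvalues.shape[0]) < dpp_eigenvalues
    V = eigenvectors[:, keep]
    sample = []
    # Phase 2: pick one item per kept eigenvector, with probability
    # proportional to the squared row norms of V, then project V onto the
    # subspace orthogonal to the selected coordinate.
    while V.shape[1] > 0:
        probs = np.sum(V ** 2, axis=1)
        i = rng.choice(V.shape[0], p=probs / probs.sum())
        sample.append(i)
        j = np.argmax(np.abs(V[i, :]))         # column with V[i, j] != 0
        Vj = V[:, j] / V[i, j]
        V = np.delete(V, j, axis=1)
        V = V - np.outer(Vj, V[i, :])          # row i of V becomes zero
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)             # re-orthonormalise the columns
    return sorted(sample)
```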
Tests for assessing the correctness of our algorithms are presented in the files test_sample_from_Kk.py and test_sample_from_Kq.py.
This pipeline is implemented in test_sample_from_Kk.py. We generate a graph using the Stochastic Block Model, then generate a k-bandlimited signal on this graph using the first k eigenmodes. We then sample a DPP from K_k and use it to measure the signal on the graph. Finally, we use a direct formula (which involves inverting a matrix) to compute the reconstructed signal.
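A compact sketch of this pipeline, reusing the sbm_adjacency and sample_dpp_from_eigendecomposition sketches above, could look as follows; the reconstruction here is a plain least-squares fit of the first k eigenmodes, whereas the formula used in test_sample_from_Kk.py may include probability weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 100, 4
A, _ = sbm_adjacency(N=N, k=k, q1=0.3, q2=0.02, seed=0)
L = np.diag(A.sum(axis=1)) - A                 # combinatorial Laplacian
eigvals, U = np.linalg.eigh(L)                 # eigenvalues in ascending order
Uk = U[:, :k]                                  # first k eigenmodes (low frequencies)

# k-bandlimited signal: a random combination of the first k eigenmodes.
x = Uk @ rng.standard_normal(k)

# Sample from the projection DPP K_k = Uk @ Uk.T (all k eigenvalues equal 1),
# so the sample always contains exactly k nodes.
sample = sample_dpp_from_eigendecomposition(np.ones(k), Uk, seed=1)

# Measure on the sampled nodes and reconstruct by (unweighted) least squares.
y = x[sample]
alpha_hat, *_ = np.linalg.lstsq(Uk[sample, :], y, rcond=None)
x_rec = Uk @ alpha_hat
print(np.linalg.norm(x - x_rec))               # should be close to zero
```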
This pipeline is implemented in test_sample_from_Kq.py. We generate a graph using the Stochastic Block Model, then generate a k-bandlimited signal on this graph using the first k eigenmodes. We then sample a DPP of an adequate size from K_q using the Propp-Wilson algorithm and use it to measure the signal on the graph. Finally, we use a direct formula (which involves inverting a matrix) to compute the reconstructed signal.
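Continuing the sketch above, and assuming the kernel K_q = q (q I + L)^{-1} (eigenvectors U, eigenvalues q / (q + lambda_i)), a rough version of this pipeline is shown below. For brevity it samples K_q with the spectral sampler instead of the Propp-Wilson variant, and the weighting by the marginals pi_i = K_q[i, i] is our reading of the direct reconstruction formula, not necessarily the repository's exact implementation.

```python
# Kernel K_q = q * (q I + L)^{-1}: eigenvectors U, eigenvalues q / (q + lambda_i).
q = 1.0
kq_eigvals = q / (q + eigvals)
sample_q = sample_dpp_from_eigendecomposition(kq_eigvals, U, seed=2)

# Marginal probabilities pi_i = K_q[i, i]; measurements are weighted by
# 1 / sqrt(pi_i) before the least-squares fit of the first k eigenmodes.
pi = (U ** 2) @ kq_eigvals
w = 1.0 / np.sqrt(pi[sample_q])
alpha_hat, *_ = np.linalg.lstsq(w[:, None] * Uk[sample_q, :], w * x[sample_q],
                                rcond=None)
x_rec_q = Uk @ alpha_hat
# Perfect recovery is only expected when the (random-size) sample contains at
# least k nodes.
print(len(sample_q), np.linalg.norm(x - x_rec_q))
```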
This pipeline is implemented in test_recovery_with_Kq_unknown_Uk.py. We generate a graph using the Stochastic Block Model, then generate a k-bandlimited signal on this graph using the first k eigenmodes. We then sample a DPP of an adequate size from K_q using the Propp-Wilson algorithm and use it to measure the signal on the graph. Finally, we use a regularized formula to reconstruct the signal, i.e. a formula which does not make use of U_k but penalizes high frequencies in the reconstructed signal.
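Continuing the previous sketch, one possible form of such a regularized recovery is the closed-form solution of a Laplacian-penalized weighted least-squares problem; using L itself (rather than a power of L) and the value of gamma below are our assumptions, not the repository's exact choices.

```python
# Regularised recovery without U_k: solve
#   min_z || P^{-1/2} (z[sample_q] - x[sample_q]) ||^2 + gamma * z.T @ L @ z,
# whose closed form is (M.T P^{-1} M + gamma L)^{-1} M.T P^{-1} y.
gamma = 1e-3
M = np.zeros((len(sample_q), N))
M[np.arange(len(sample_q)), sample_q] = 1.0    # measurement (node-selection) matrix
P_inv = np.diag(1.0 / pi[sample_q])
z = np.linalg.solve(M.T @ P_inv @ M + gamma * L,
                    M.T @ P_inv @ x[sample_q])
print(np.linalg.norm(x - z))
```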
An issue is that we still need to compute the diagonal matrix P (with coefficients P_{i, i} being the marginal probability that node i belongs to a sample). Computing P exactly requires knowledge of K.