Proof-of-concept EM algorithm implementation that uses prior knowledge of per-point probabilities on 2D points to train a multivariate Gaussian Mixture Model (GMM).
The known probability of each point is used as a weight during normalization in the maximization step. When the sample count is low, the square root of the probability p can be used instead of p as an optimization.
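The weighted maximization step described above can be sketched roughly as follows. This is a minimal NumPy sketch, not the code from main.py; the function and argument names (`weighted_m_step`, `prior_p`, `use_sqrt`) are illustrative:

```python
import numpy as np

def weighted_m_step(points, resp, prior_p, use_sqrt=False):
    """Sketch of an M-step where each point's known probability acts as a weight.

    points:  (n, 2) samples
    resp:    (n, k) responsibilities from the E-step
    prior_p: (n,)   known probability of each point (the prior knowledge)
    """
    w = np.sqrt(prior_p) if use_sqrt else prior_p   # optional sqrt for low sample counts
    rw = resp * w[:, None]                          # responsibilities scaled by the weights
    nk = rw.sum(axis=0)                             # effective mass per component
    means = (rw.T @ points) / nk[:, None]
    covs = []
    for k in range(resp.shape[1]):
        d = points - means[k]
        covs.append((rw[:, k, None] * d).T @ d / nk[k])
    weights = nk / nk.sum()                         # mixture weights sum to 1
    return weights, means, np.array(covs)
```

The only change from a standard M-step is the extra factor `w` in the normalization; with `prior_p` all ones it reduces to the usual update.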
The expectation step is unchanged. Test setup:
- K-Means with random point initialization
- Low maximum iteration count (default: 5)
- Low number of training points (default: 20)
- Comparison against the reference EM implementation in OpenCV
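The K-Means initialization with random points can be sketched like this. A self-contained NumPy sketch under the assumptions above (random samples as starting centers, few iterations); the name `kmeans_init` and the `seed` parameter are illustrative:

```python
import numpy as np

def kmeans_init(points, k, iters=5, seed=None):
    """Random-point K-Means: pick k samples as starting centers, refine briefly."""
    rng = np.random.default_rng(seed)
    # random point initialization: k distinct samples become the initial centers
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):  # low max-iterations, matching the default setup
        # assign each point to its nearest center
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = points[mask].mean(axis=0)
    return centers, labels
```

The resulting centers would serve as the initial component means before the first E-step.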
Run:
./main.py

Store:
./main.py --save test.json

Replay:
./main.py --load test.json

Run large test:
./compare.py
Because more information is used to approximate the incomplete data, the weighted variant gives slightly better results than the reference algorithm, especially in a sparsely sampled environment.
Keep in mind, however, that with a low iteration count the initial guess from K-Means plays a big role.
- Initial: the desired distribution that was used to sample the red dots.
- OpenCV-EM: the reference EM algorithm from OpenCV.
- Weighted-EM: the enhanced version that uses the probabilities during normalization.
At a low sampling rate some Gaussian components can become faint. Taking the square root of the probability can bring them back to the front.
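A toy illustration of why the square root helps (the probability values here are made up): sqrt compresses the dynamic range of the weights, so points belonging to faint components still contribute noticeably in the M-step.

```python
import numpy as np

p = np.array([0.02, 0.5, 1.0])  # example point probabilities (illustrative values)
w = np.sqrt(p)                  # sqrt compresses the dynamic range
# the ratio between strongest and weakest weight shrinks from 50x to ~7x
print(p.max() / p.min(), w.max() / w.min())
```

With plain p, the faint component's points are down-weighted by a factor of 50 relative to the strongest; after the square root that factor drops to about 7.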