This is an implementation of the following paper:
Arnab Bhattacharyya, Constantinos Daskalakis, Themis Gouleakis, Thanh Vinh Vo, Yuhao Wang.
"Learning High-dimensional Gaussians from Censored Data." arXiv preprint, 2022.
The missingness mechanisms are as follows (see the sketch after the table below):
- Missing Completely At Random (MCAR): a value is missing with some probability $\alpha$;
- Missing At Random (MAR): a fully observed variable leads to the missingness of another variable;
- Missing Not At Random (MNAR): hidden variable(s) lead to the missingness of a fully observed variable.
MCAR | MAR
---|---
Self-masking MNAR | General MNAR
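For concreteness, here is a minimal numpy sketch (not part of the repository) of how the three mechanisms mask an n x d data matrix; thresholds and probabilities are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha = 1000, 5, 0.2
X = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n)

# MCAR: every entry is dropped independently with probability alpha.
X_mcar = np.where(rng.random(X.shape) < alpha, np.nan, X)

# MAR: a fully observed column (here column 0) drives missingness of column 1.
X_mar = X.copy()
X_mar[X[:, 0] > 0, 1] = np.nan

# Self-masking MNAR: a value censors itself when it crosses a threshold.
X_mnar = X.copy()
X_mnar[X[:, 1] > 1.0, 1] = np.nan
```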
Assuming the censoring model is MNAR, we study two settings:
- [Self-censoring]: Under the self-censoring mechanism, we develop a distribution learning algorithm (Algorithm 1 below) that learns $N(\mu^*, \Sigma^*)$ up to TV distance $\varepsilon$.
- [Convex masking]: When the missingness mechanism is general, we design an efficient mean estimation algorithm for a d-dimensional Gaussian $N(\mu^*, \Sigma)$, assuming that the observed missingness pattern is not very rare conditioned on the values of the observed coordinates, and that any small subset of coordinates is observed with sufficiently high probability.
- Recent Advances in Algorithmic High-Dimensional Robust Statistics
- Robustly Learning a Gaussian: Getting Optimal Error, Efficiently
- Workshop https://github.com/YohannaWANG/Missing-Data-Literature
- Python 3.6+
- seaborn
- argparse
- numpy
- pandas
- scipy
- sklearn
- matplotlib
- torch
- cvxpylayers
- tqdm
- `data.py` - generate synthetic data; load real data.
- `config.py` - simulation parameters.
- `utils.py` - different missingness mechanisms, such as self-censoring MNAR, general MNAR, MAR, and MCAR.
- `truncationPSGD` - the implementation of Algorithm 1 in our paper.
- `main.py` - main algorithm.
- `demo.ipynb` - demo of our implementation.
Parameter | Type | Description | Options
---|---|---|---
`n` | int | number of samples | -
`d` | int | number of variables | -
`plot` | bool | whether to plot the chain graph | -
`algorithm` | str | which algorithm to run | `self-censoring`, `convex-masking`
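For illustration, a hypothetical argparse wiring consistent with this table (the actual flag handling lives in `main.py`/`config.py`) might look like:

```python
import argparse

# Hypothetical sketch: flag names and defaults mirror the table above,
# not necessarily the repository's exact definitions.
parser = argparse.ArgumentParser(description="MissingDescent")
parser.add_argument("--n", type=int, default=1000, help="number of samples")
parser.add_argument("--d", type=int, default=50, help="number of variables")
parser.add_argument("--plot", action="store_true", help="plot chain graph or not")
parser.add_argument("--algorithm", type=str, default="self-censoring",
                    choices=["self-censoring", "convex-masking"],
                    help="which algorithm to run")
args = parser.parse_args()
```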
The simplest way to try out MissingDescent is to run a simple example:
$ git clone https://github.com/YohannaWANG/MissingDescent.git
$ cd MissingDescent/
$ python main.py
Alternatively, if you have a CSV data file `X.csv`, you can install the package and run the algorithm as a command:
$ pip install git+git://github.com/YohannaWANG/MissingDescent
$ cd MissingDescent
$ python main.py --algorithm self-censoring --d 50 --n 1000
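The repository's loaders live in `data.py`; if you are preparing your own `X.csv`, a minimal (hypothetical) loading step that marks empty cells as missing would be:

```python
import pandas as pd

# Empty cells become np.nan, matching how missing entries are marked
# in the synthetic generators; shape is (n, d).
X = pd.read_csv("X.csv").to_numpy()
```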
- Algorithm 1 [Truncation_PSGD] Distribution recovery given access to an oracle that generates samples with incomplete data;
- Algorithm 2 [MissingDescent] Mean recovery given access to an oracle that generates samples with incomplete data.
- Algorithm 3 [Initialize] Initialization for the main algorithm.
- Algorithm 4 [SampleGradient] Sampler for $\nabla \ell(\bm{\mu})$.
- Algorithm 5 [ProjectToDomain] The function that projects the current guess back onto the $\mathcal{B}_{\bm{\Sigma}}$ ball.
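For intuition, here is a minimal numpy sketch of the projected-SGD loop that Algorithms 2-5 describe; `sample_gradient` stands in for the paper's SampleGradient oracle, and the step-size schedule and radius are assumptions, not the repository's exact implementation:

```python
import numpy as np

def project_to_domain(mu, mu0, Sigma, radius):
    """ProjectToDomain: pull mu back onto the Sigma-ball around mu0."""
    diff = mu - mu0
    dist = np.sqrt(diff @ np.linalg.solve(Sigma, diff))  # Mahalanobis norm
    return mu if dist <= radius else mu0 + diff * (radius / dist)

def missing_descent(sample_gradient, mu0, Sigma, radius, steps=5000, eta0=0.1):
    """MissingDescent skeleton: PSGD started from Algorithm 3's initializer mu0."""
    mu = mu0.copy()
    for t in range(1, steps + 1):
        g = sample_gradient(mu)            # unbiased estimate of grad l(mu)
        mu = mu - (eta0 / np.sqrt(t)) * g  # SGD step with decaying rate
        mu = project_to_domain(mu, mu0, Sigma, radius)
    return mu
```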
[Truncation_PSGD] Mean absolute percentage error (MAPE) and KL divergence. We fixed N=20,000 and varied the percentage of missing data from 10% to 80%.
[Truncation_PSGD] Running time on synthetic data.
[Truncation_PSGD] Semi-synthetic dataset.
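For reference, the two reported metrics can be computed as follows; this is a standard-formula sketch (function names are ours, not the repository's):

```python
import numpy as np

def mape(true, est):
    """Mean absolute percentage error; assumes nonzero true entries."""
    return np.mean(np.abs((true - est) / true)) * 100

def gaussian_kl(mu1, S1, mu2, S2):
    """Closed-form KL( N(mu1,S1) || N(mu2,S2) ) between two Gaussians."""
    d = mu1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    logdet = np.linalg.slogdet(S2)[1] - np.linalg.slogdet(S1)[1]
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - d + logdet)
```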
One paragraph in our related work section gives an almost complete history of the work done on these problems. We summarize most of the related work below; the list will be updated accordingly.
https://github.com/YohannaWANG/Missing-Data-Literature
Please feel free to contact us if you encounter any problems when using this code. We are glad to hear advice and update our work. We are also open to collaboration if you think you are working on a problem we might be interested in. Please do not hesitate to contact us!