This repository demonstrates several machine learning algorithms applied in a privacy-preserving manner to vertically partitioned data. The following algorithms have been applied:
- Linear Regression
- Naive Bayes
- Secure multiparty computation
Execute the control.ipynb notebook in this repository. This is pre-configured on two data stations with complementary datasets.
Privacy_preserving_linear_regression.ipynb shows how we implement linear regression in a privacy-preserving manner; the notebook itself contains more details. We have also dockerized PPLR using PyTaskManager (https://github.com/PersonalHealthTrain/PyTaskManager). Please have a look at the folder <PPLR_Dockerized>.
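To illustrate why vertically partitioned linear regression needs cross-party computation at all, here is a minimal sketch based on the normal equations. It is an assumption for illustration, not a description of what the notebook does: the function name `fit_vertically_partitioned` and the variables `X_a`, `X_b`, and `y` are hypothetical, and the cross-party blocks are computed in plaintext where a real deployment would use a secure protocol.

```python
import numpy as np

def fit_vertically_partitioned(X_a, X_b, y):
    """Solve the normal equations for features split column-wise over two parties.

    X_a: features held by party A, X_b: features held by party B (same rows).
    y:   target, assumed here to be held by party B.
    """
    # Blocks each party can compute locally from its own columns.
    aa = X_a.T @ X_a
    bb = X_b.T @ X_b
    # Cross-party block: every entry is a scalar product between a column of A
    # and a column of B. Computed in plaintext here as a placeholder for a
    # secure scalar product protocol.
    ab = X_a.T @ X_b
    gram = np.block([[aa, ab], [ab.T, bb]])
    # X_a.T @ y is also cross-party when only party B holds y.
    xty = np.concatenate([X_a.T @ y, X_b.T @ y])
    return np.linalg.solve(gram, xty)
```

Because the Gram matrix is assembled from exactly the same numbers as in the pooled case, the coefficients should match an ordinary least-squares fit on the joined data; only the way the cross-party blocks are obtained would differ.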
In privacy-preserving machine learning and data mining on distributed data, a major challenge is that existing solutions are not feasible or scalable enough to apply to real-world datasets. The most popular method, secure multiparty computation, is very costly in time, communication, and computation. To address this problem, I suggest classifying the privacy level of features before the analysis, which might decrease the cost. For more details and code, please check Data pre-processing (privacy-preserving).ipynb
Please go to the Privacy-preserving bayesians folder. Privacy-preserving Naive Bayes (Numercial features).ipynb shows how Naive Bayes classification can be done in a vertically partitioned data scenario. We take two situations into account: 1) both parties know the target class; 2) only one party knows the target class. We also apply a secure scalar product to calculate the mean and variance values.
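As a rough illustration of how a secure scalar product can yield class-conditional means and variances when only one party knows the target class, here is a minimal single-process sketch. It assumes one well-known variant of the protocol that relies on a semi-trusted commodity server handing out correlated random masks; the notebook may use a different protocol. All function names are hypothetical, the three roles are simulated in one function, and the small mask range is for readability only, not for security.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def secure_scalar_product(a, b, rng):
    """Return additive shares (share_a, share_b) with share_a + share_b == a . b.

    a is private to party A, b is private to party B.
    """
    n = len(a)
    # Commodity server: random masks satisfying r_a + r_b == R_a . R_b.
    R_a = [rng.randint(-1000, 1000) for _ in range(n)]
    R_b = [rng.randint(-1000, 1000) for _ in range(n)]
    r_a = rng.randint(-1000, 1000)
    r_b = dot(R_a, R_b) - r_a
    # A sends its masked vector to B; B sends its masked vector to A.
    a_masked = [x + r for x, r in zip(a, R_a)]
    b_masked = [y + r for y, r in zip(b, R_b)]
    # B picks a random share for itself and sends u to A.
    share_b = rng.randint(-1000, 1000)
    u = dot(a_masked, b) + r_b - share_b
    # A removes the mask terms; what remains is a . b - share_b.
    share_a = u - dot(R_a, b_masked) + r_a
    return share_a, share_b

def class_mean_variance(indicator, feature, rng):
    """Mean and (population) variance of `feature` within one class.

    indicator: 0/1 class membership per row, held by the party that knows
               the target class. feature: values held by the other party.
    """
    n_c = sum(indicator)  # class count, known to the label holder
    sum_x = sum(secure_scalar_product(indicator, feature, rng))
    sum_x2 = sum(secure_scalar_product(indicator, [x * x for x in feature], rng))
    mean = sum_x / n_c
    return mean, sum_x2 / n_c - mean ** 2
```

For example, `class_mean_variance([1, 0, 1, 1, 0], [2.0, 5.0, 4.0, 6.0, 1.0], random.Random(0))` combines the shares to recover the statistics of the values 2, 4, and 6 without either party sending its raw vector to the other.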
A Bayesian network is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). A very good package for structure learning is pgmpy: https://github.com/pgmpy/pgmpy. In this notebook, only the K2 algorithm has been applied for structure learning so far. Other algorithms and parameter learning are still work in progress.
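For readers unfamiliar with K2, the following is a minimal plain-Python sketch of the greedy search with the K2 (Cooper and Herskovits) score on discrete data. It is centralized and not privacy-preserving, it does not use pgmpy, and it is not the notebook's code; the function names, the list-of-dicts data layout, and the `max_parents` default are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import lgamma

def k2_log_score(data, node, parents, arity):
    """Log K2 score of `node` given a parent set, for discrete data.

    data: list of dicts mapping variable name -> value.
    arity: dict mapping variable name -> number of possible values.
    """
    counts = defaultdict(Counter)
    for row in data:
        counts[tuple(row[p] for p in parents)][row[node]] += 1
    r = arity[node]
    score = 0.0
    # Unobserved parent configurations contribute zero, so only observed
    # configurations need to be visited.
    for child_counts in counts.values():
        n_ij = sum(child_counts.values())
        score += lgamma(r) - lgamma(n_ij + r)
        score += sum(lgamma(n + 1) for n in child_counts.values())
    return score

def k2_search(data, order, arity, max_parents=2):
    """Greedy K2: each node may only take parents that precede it in `order`."""
    parents = {node: [] for node in order}
    for idx, node in enumerate(order):
        best = k2_log_score(data, node, parents[node], arity)
        candidates = set(order[:idx])
        while candidates and len(parents[node]) < max_parents:
            scored = [
                (k2_log_score(data, node, parents[node] + [c], arity), c)
                for c in sorted(candidates)
            ]
            top_score, top = max(scored)
            if top_score <= best:
                break  # no candidate improves the score
            best = top_score
            parents[node].append(top)
            candidates.remove(top)
    return parents
```

The result depends on the node ordering supplied by the caller, which is a known property of K2: an edge can only point from an earlier node to a later one.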