This GitHub repository includes the implementation of the Logistic Regression Approximation (LRA) and Chi-Square GWAS protocols described in "Secure large-scale genome-wide association studies using homomorphic encryption" by Marcelo Blatt, Alexander Gusev, Yuriy Polyakov, and Shafi Goldwasser.
The code was originally written using PALISADE (https://gitlab.com/duality-technologies-public/palisade-gwas-demos/) and later migrated to OpenFHE.
The repo includes the following files:
- demo-logistic.cpp - research prototype for the LRA protocol.
- demo-chi2.cpp - research prototype for the Chi-Square protocol.
- data/random_sample.csv - an artificial random data set including 3 features, 200 individuals, and 16,384 SNPs (provided solely for demonstration purposes).
- Install OpenFHE from the OpenFHE Development Repository. Follow the instructions provided in https://github.com/openfheorg/openfhe-development/blob/main/README.md.
- Clone this repository to a local directory and switch to this directory.
- Create a directory where the binaries will be built. The typical choice is a subfolder "build". In this case, run the following commands:
mkdir build
cd build
cmake ..
make
- Run the following command to execute the LRA prototype (change sample size and number of SNPs as needed):
./demo-logistic --SNPdir "../data" --SNPfilename "random_sample" --pvalue "pvalue.txt" --runtime "result.txt" --samplesize="200" --snps="16384"
Alternatively, run the following command to execute the Chi-Square prototype (change the sample size and number of SNPs as needed):
./demo-chi2 --SNPdir "../data" --SNPfilename "random_sample" --pvalue "pvalue.txt" --runtime "result.txt" --samplesize="200" --snps="16384"
- The results will be written to the "data" folder. The following output files will be created for both prototypes:
  - pvalue.txt - p-values for each SNP
  - result.txt - runtime metrics
Additional files with outputs of protocol-specific statistics will also be created.
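
Both run commands above use the same output names (pvalue.txt and result.txt), so running one prototype after the other would presumably overwrite the earlier results. The sketch below is one way to keep them apart. It is not part of the repository: the helper name gwas_cmd and the per-prototype file names (pvalue-logistic.txt, result-chi2.txt, and so on) are hypothetical, and it assumes that --pvalue and --runtime accept any file name. It only prints the command lines, using the same flags as the commands above, so it can be checked before anything is executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Prints the command line for one prototype with per-prototype output names.
#   $1 = prototype suffix: "logistic" or "chi2"
#   $2 = sample size
#   $3 = number of SNPs
gwas_cmd() {
    proto=$1
    samples=$2
    snps=$3
    # Same flags as in the run commands above; only the output names differ.
    printf './demo-%s --SNPdir "../data" --SNPfilename "random_sample"' "$proto"
    printf ' --pvalue "pvalue-%s.txt" --runtime "result-%s.txt"' "$proto" "$proto"
    printf ' --samplesize="%s" --snps="%s"\n' "$samples" "$snps"
}

# Dry run: show both commands without executing them.
gwas_cmd logistic 200 16384
gwas_cmd chi2 200 16384
```

To actually run a prototype, the printed line can be pasted into a shell inside the build directory, or evaluated directly with eval "$(gwas_cmd chi2 200 16384)".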