Script to compute independent SNPs and loci based on GWAS data.
- Mac OS X or a UNIX operating system (Microsoft Windows is not supported)
- Python version 2.7 (or higher)
- pip (used to install Python libraries)
sudo easy_install pip
- bx-python (on Mac OS X you may be prompted to install Xcode)
sudo pip install bx-python
- Pandas (version 0.15.2 or higher)
sudo pip install pandas
- PLINK version 1 or preferably PLINK version 2
- Make sure you have 1000 Genomes Project genotype data in PLINK binary format (European individuals, that is GBR, FIN, IBS, TSI and CEU; see the SNPsnap documentation). You can download preformatted data here.
- Download the SNPsnap collection file, which provides precomputed LD r2 boundaries for each 1000 Genomes Project phase 3 SNP. Files can be downloaded here.
- Set `label` to the name of your GWAS summary statistics file (leave out the file extension).
- Set `plink_genotype_data_plink_prefix` to the path and prefix of your 1000 Genomes Project genotype data.
- Set `collection_file` to the filename of the SNPsnap collection file.
- Run the script. The default settings will compute:
  * Independent SNPs, that is, SNPs with low linkage disequilibrium (r2 < 0.1) to a more significantly associated SNP within a 500-kb window.
  * Independent loci, that is, loci containing all SNPs correlated at e.g. r2 > 0.5 with any other associated SNP. Associated loci closer than 250 kb to each other are merged.
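
To make the two default rules above concrete, here is a minimal Python sketch. It is not the script's actual code: the function names, the dictionary-based SNP records and the `r2` lookup table are hypothetical, and the 500-kb window is assumed to mean "at most 500 kb away from a more significant SNP". The first function greedily keeps SNPs that are not in LD (r2 >= 0.1) with a more significant SNP already kept; the second merges loci on the same chromosome that lie closer than 250 kb to each other.

```python
def select_independent_snps(snps, r2, r2_max=0.1, window_bp=500000):
    """Greedy selection of independent SNPs (illustrative only).

    snps: list of dicts with keys "id", "chrom", "pos", "p" (hypothetical layout).
    r2:   dict mapping frozenset((id_a, id_b)) -> LD r2; missing pairs count as 0.
    """
    kept = []
    # Visit SNPs from most to least significant.
    for snp in sorted(snps, key=lambda s: s["p"]):
        in_ld = False
        for lead in kept:
            if lead["chrom"] != snp["chrom"]:
                continue
            if abs(lead["pos"] - snp["pos"]) > window_bp:
                continue
            if r2.get(frozenset((lead["id"], snp["id"])), 0.0) >= r2_max:
                in_ld = True
                break
        if not in_ld:
            kept.append(snp)
    return kept


def merge_loci(loci, max_gap_bp=250000):
    """Merge (chrom, start, end) loci whose gap is smaller than max_gap_bp."""
    merged = []
    for chrom, start, end in sorted(loci):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] < max_gap_bp:
            # Close enough to the previous locus: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(locus) for locus in merged]


# Toy data, invented for illustration.
example_snps = [
    {"id": "rs1", "chrom": "1", "pos": 1000, "p": 1e-10},
    {"id": "rs2", "chrom": "1", "pos": 2000, "p": 1e-8},
    {"id": "rs3", "chrom": "1", "pos": 3000, "p": 1e-6},
]
example_r2 = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs1", "rs3")): 0.05,
    frozenset(("rs2", "rs3")): 0.9,
}
example_loci = [
    ("1", 100000, 200000),
    ("1", 300000, 400000),
    ("1", 900000, 950000),
    ("2", 100000, 150000),
]

independent = select_independent_snps(example_snps, example_r2)
merged_loci = merge_loci(example_loci)
```

In this toy example `rs2` is dropped because it is in LD with the more significant `rs1`, and the first two loci on chromosome 1 are merged because the gap between them is 100 kb.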