Summary Statistics File Format
Brendan Bulik-Sullivan edited this page Mar 5, 2015 · 3 revisions
This page describes all new file formats introduced for use with the --h2 and --rg flags.
NOTE: chromosomes are assumed to be integers. We haven't yet implemented LD Score regression for sex chromosomes.
The .sumstats format is for GWAS data: whitespace-delimited text, one row per SNP, with a header row. Column order does not matter.
We recommend that you convert your summary statistics to the .sumstats format using the munge_sumstats.py program included with ldsc, because munge_sumstats.py checks all the gotchas that we've run into over the course of developing this software and applying it to a lot of data.
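As an illustration of the layout described above (whitespace-delimited, header row, column order free), here is a minimal Python sketch that reads such a table into one dictionary per SNP. The function name parse_table and the toy rows are made up for this example; this is not how ldsc or munge_sumstats.py reads files, and it performs none of the checks that munge_sumstats.py does.

```python
import io


def parse_table(handle):
    """Read whitespace-delimited text with a header row into a list of dicts.

    Hypothetical helper for illustration only; not part of ldsc.
    """
    header = handle.readline().split()
    records = []
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # ignore blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(fields)}"
            )
        records.append(dict(zip(header, fields)))
    return records


# Two toy files holding the same (made-up) SNP with columns in a different order.
file_a = io.StringIO("SNP N Z A1 A2\nrs1 1000 1.5 A G\n")
file_b = io.StringIO("A2\tA1\tZ\tN\tSNP\nG\tA\t1.5\t1000\trs1\n")

records_a = parse_table(file_a)
records_b = parse_table(file_b)
```

Because each row is keyed by column name rather than position, both toy files yield the same record.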
Required Columns
- SNP -- SNP identifier (e.g., rs number)
- N -- sample size (which may vary from SNP to SNP).
- Z -- z-score. Sign with respect to A1 (warning, possible gotcha)
- A1 -- first allele (effect allele)
- A2 -- second allele (other allele)
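The sign convention for Z is the easiest of these to get wrong, so a small sketch may help. It assumes you want to express a z-score relative to a chosen reference allele pair: if A1 and A2 are swapped relative to the reference, the sign of Z flips. The helpers missing_columns and align_z are hypothetical names for this example, not ldsc functions, and the sketch deliberately ignores strand flips.

```python
REQUIRED_COLUMNS = ("SNP", "N", "Z", "A1", "A2")


def missing_columns(header):
    """Return the required column names absent from a header (any order)."""
    return [name for name in REQUIRED_COLUMNS if name not in header]


def align_z(z, a1, a2, ref_a1, ref_a2):
    """Express z relative to ref_a1 as the effect allele.

    Returns z unchanged if the alleles already match, -z if A1 and A2 are
    swapped, and None if the allele pairs do not correspond at all.
    Strand flips are not handled in this sketch.
    """
    a1, a2 = a1.upper(), a2.upper()
    ref_a1, ref_a2 = ref_a1.upper(), ref_a2.upper()
    if (a1, a2) == (ref_a1, ref_a2):
        return z
    if (a1, a2) == (ref_a2, ref_a1):
        return -z
    return None


# Made-up values: the same association reported with the alleles swapped.
same = align_z(2.0, "A", "G", "A", "G")
flipped = align_z(2.0, "G", "A", "A", "G")
absent = missing_columns(["SNP", "Z", "A1"])
```

Here same keeps the value 2.0, flipped becomes -2.0, and absent lists the two column names missing from the toy header.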
Note that ldsc filters out all variants that are not SNPs, as well as all strand-ambiguous SNPs.
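To make the filtering rule concrete: a SNP is strand-ambiguous when its two alleles are complementary bases (A/T or C/G), so the pair reads the same on either strand. The sketch below shows one way such a filter could be written; it assumes a SNP means two different single-base alleles from A, C, G, T. The function names are hypothetical and this is not ldsc's own implementation.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_snp(a1, a2):
    """True if both alleles are single, distinct bases from A, C, G, T."""
    a1, a2 = a1.upper(), a2.upper()
    return a1 in COMPLEMENT and a2 in COMPLEMENT and a1 != a2


def is_strand_ambiguous(a1, a2):
    """True if the alleles are complementary bases (A/T or C/G)."""
    return is_snp(a1, a2) and COMPLEMENT[a1.upper()] == a2.upper()


def keep_variant(a1, a2):
    """Mimic the rule in the text: keep only SNPs that are not strand-ambiguous."""
    return is_snp(a1, a2) and not is_strand_ambiguous(a1, a2)


# Made-up allele pairs: a regular SNP, two ambiguous SNPs, and an indel.
pairs = [("A", "G"), ("A", "T"), ("C", "G"), ("AT", "A")]
kept = [pair for pair in pairs if keep_variant(*pair)]
```

Of the four toy pairs, only A/G survives: A/T and C/G are strand-ambiguous, and AT/A is not a SNP.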