This project contains the steps undertaken to perform association analyses of spirometry measures in the UK Biobank (UKBB) with variants around the SLC26A9 gene. The analyses cover all participants passing QC and the subset with spirometry-defined COPD under modified GOLD 2-4 criteria (moderate to very severe airflow limitation). The COPD definition here is relaxed because measurements are not required to be post-bronchodilator. According to Mannino and Buist (2007), using pre-bronchodilator lung function may overestimate airflow obstruction.
- 01-extract_phenos_of_interest.sh
  - input: ukb24727.tab, which contains all phenotypic information from UKBB
  - output: ukb24727_spirometry.tab, a smaller file containing only the required variables
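
Step 01 keeps a handful of columns from a very wide table. The function below is a rough Python sketch of that idea. It is not the contents of 01-extract_phenos_of_interest.sh. It assumes the ukbconv-style header (f.eid, f.<field>.<instance>.<array>) and uses UKBB field IDs 3062 (FVC), 3063 (FEV1) and 3064 (PEF) as example spirometry fields. The function names and the field list are illustrative assumptions.

```python
import csv
import io
from typing import Iterable, TextIO

# Assumption: example UKBB field IDs (3062 = FVC, 3063 = FEV1, 3064 = PEF);
# the real script may select a different set of variables.
SPIRO_FIELDS = ("3062", "3063", "3064")


def keep_column(name: str, fields: Iterable[str]) -> bool:
    """Keep f.eid and any f.<field>.<instance>.<array> column for the wanted fields."""
    if name == "f.eid":
        return True
    parts = name.split(".")
    return len(parts) == 4 and parts[0] == "f" and parts[1] in fields


def subset_tab(src: TextIO, dst: TextIO, fields: Iterable[str] = SPIRO_FIELDS) -> list[str]:
    """Stream a tab-delimited table from src to dst, keeping only selected columns."""
    wanted = set(fields)
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    header = next(reader)
    keep = [i for i, name in enumerate(header) if keep_column(name, wanted)]
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])
    return [header[i] for i in keep]


if __name__ == "__main__":
    # Toy stand-in for ukb24727.tab; values are invented for illustration only.
    demo = io.StringIO(
        "f.eid\tf.31.0.0\tf.3063.0.0\tf.3063.0.1\tf.3062.0.0\n"
        "1000001\t0\t2.91\t3.02\t3.88\n"
    )
    out = io.StringIO()
    print(subset_tab(demo, out))
    print(out.getvalue(), end="")
```

On the real file, both paths would be opened with `newline=""` and passed as `src` and `dst`. Streaming row by row avoids loading the full phenotype table into memory.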
- 02-subset_qc_copd_individuals.R
  - input: ukb24727_spirometry.tab
  - output:
    - ukbb_spiro_and_geno_qc_v2.csv, which contains all individuals passing spirometry and genotyping QC, and their spirometry measures (best FEV1, best FVC, and FEV1pp)
    - GOLD2-4_copd_ukbb_spirodata.csv, which is the subset of individuals from ukbb_spiro_and_geno_qc.csv that meet the GOLD 2-4 lung function criteria (i.e. FEV1/FVC ratio < 0.7 and FEV1pp < 80%)

This step removes individuals who did not pass spirometry and genotyping QC, removes related and non-European individuals, and calculates FEV1pp using the GLI calculator (Global Lung Function Initiative 2021, version 2.0). The procedure is similar to Shrine et al. (2019), with two key differences. First, related individuals were removed using KING's --unrelated option (v2.0) to obtain the unrelated set. This removes 36,004 participants, versus 1,165 in Shrine et al. Second, non-Europeans were removed using UKBB's VariableID 22006, whereas Shrine et al. opted for a K-means clustering method. Taken together, Shrine et al.'s method yields 321,047 participants, whereas ours yields 263,461.
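
The GOLD 2-4 rule applied in step 02 can be written as a small predicate. This is a hedged sketch. It assumes "best" means the highest value across blows. It also takes the GLI-predicted FEV1 as a precomputed input, because the GLI reference equations are not reproduced here. Spirometry and gold_2_4 are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Spirometry:
    fev1_blows: list[float]  # FEV1 (L) from each accepted blow
    fvc_blows: list[float]   # FVC (L) from each accepted blow
    fev1_predicted: float    # GLI-predicted FEV1 (L), assumed precomputed elsewhere


def gold_2_4(s: Spirometry) -> bool:
    """True when FEV1/FVC < 0.7 and FEV1 percent predicted < 80%."""
    # Assumption: "best" is the highest value across the accepted blows.
    best_fev1 = max(s.fev1_blows)
    best_fvc = max(s.fvc_blows)
    ratio = best_fev1 / best_fvc
    fev1pp = 100.0 * best_fev1 / s.fev1_predicted
    return ratio < 0.7 and fev1pp < 80.0


if __name__ == "__main__":
    print(gold_2_4(Spirometry([1.8, 2.0], [3.1, 3.2], 3.0)))  # True
```

In the example, the best FEV1/FVC ratio is 2.0/3.2 = 0.625 and FEV1pp is about 66.7%, so both criteria hold. Both thresholds use strict inequalities, as written in the output description above.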

-
  - input: genotype array data for:
    - all individuals defined in GOLD2-4_copd_ukbb_spirodata.csv
    - all individuals defined in ukbb_spiro_and_geno_qc.csv
  - output:
    - 15-ukbb_copd_pcair_eigenvectors.txt, for individuals corresponding to GOLD2-4_copd_ukbb_spirodata.csv
    - 18-ukbb_ukbbspiro_flashpca2_eigenvectors.txt, for individuals corresponding to ukbb_spiro_and_geno_qc.csv

FlashPCA v2.1 was used to calculate the principal components.
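
FlashPCA uses an iterative solver to compute only the leading principal components of a standardized genotype matrix. The sketch below is a conceptual stand-in, not FlashPCA's algorithm. It uses a dense NumPy SVD, which works for toy matrices but not at UKBB scale. The standardization (centre by 2p, scale by sqrt(2p(1-p))) is an assumption, not a record of the options used in this project.

```python
import numpy as np


def genotype_pcs(genotypes: np.ndarray, k: int = 2) -> np.ndarray:
    """Top-k sample eigenvectors of a standardized genotype matrix.

    genotypes: (n_samples, n_snps) allele counts coded 0, 1, 2.
    Dense SVD is used here for clarity; it does not scale to biobank data.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0            # allele frequency per SNP
    poly = (p > 0.0) & (p < 1.0)        # drop monomorphic SNPs (zero variance)
    g, p = g[:, poly], p[poly]
    # Assumed standardization: centre by 2p, scale by sqrt(2p(1 - p)).
    x = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :k]                     # column j = eigenvector j of X X^T


if __name__ == "__main__":
    # Two toy groups with opposite genotypes; PC1 should split them by sign.
    toy = np.array([
        [0, 0, 0, 1],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [2, 2, 2, 1],
        [2, 2, 1, 2],
        [2, 1, 2, 2],
    ])
    print(np.round(genotype_pcs(toy, k=1)[:, 0], 3))
```

The sign of each eigenvector is arbitrary. Only the relative placement of samples along a component is meaningful when the components are used as covariates.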

-
  - inputs:
    - ukbb_spiro_and_geno_qc.csv
    - 15-ukbb_copd_pcair_eigenvectors.txt, derived from genotyped array data from UKBB using flashPCA2 (v2.1)
    - 18-ukbb_ukbbspiro_flashpca2_eigenvectors.txt, derived from genotyped array data from UKBB using flashPCA2 (v2.1)
    - ukb_imp_chr1_v3.bgen, which is the imputation dataset provided by UKBB
  - output:
    - ratio.irnt.assoc_v3.csv, association results for FEV1/FVC ratio among all 263,461 UKBB participants
    - fev1pp.irnt.assoc_v3.csv, association results for FEV1pp among all 263,461 UKBB participants
    - pef.irnt.assoc_v3.csv, association results for PEF among all 263,461 UKBB participants
    - ratio.irnt.assoc_copd_only.csv, association results for FEV1/FVC ratio among UKBB participants with spirometrically-defined COPD as per GOLD 2-4 (N=22,071)
    - fev1pp.irnt.assoc_copd_only.csv, association results for FEV1pp among UKBB participants with spirometrically-defined COPD as per GOLD 2-4 (N=22,071)
    - pef.irnt.assoc_copd_only.csv, association results for PEF among UKBB participants with spirometrically-defined COPD as per GOLD 2-4 (N=22,071)
    - hasCOPD.assoc.csv, case-control association analysis of spirometrically-defined COPD cases (GOLD 2-4) against those with healthy lung function (22,071 cases versus 242,097 controls)
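
The .irnt suffix on these files suggests that each phenotype was inverse rank-normal transformed before association testing. A rank-based inverse normal transformation can be sketched in pure Python as below. Two details are assumptions, since the settings used in this project are not stated here: the Blom offset c = 3/8, and average ranks for ties.

```python
from statistics import NormalDist


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def irnt(values: list[float], c: float = 3.0 / 8.0) -> list[float]:
    """Rank-based inverse normal transform: Phi^-1((r - c) / (n - 2c + 1))."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in average_ranks(values)]
```

For example, `irnt([3.0, 1.0, 2.0])` maps the median value to 0 and the two extremes to values symmetric about zero. Only ranks matter, so outliers and skew in the raw spirometry measures no longer drive the test statistic.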

-
  - inputs (the association files):
    - ratio.irnt.assoc.tsv
    - fev1pp.irnt.assoc.tsv
    - ratio.irnt.assoc_copd_only.tsv
    - fev1pp.irnt.assoc_copd_only.tsv
    - hasCOPD.assoc.tsv
  - output:
    - spirometry_in_ukbb.html, for use in LocusFocus as secondary datasets to test colocalization

This step prepares the association results for loading into LocusFocus as secondary datasets.
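
Here, several secondary datasets are bundled into a single HTML file of tables (spirometry_in_ukbb.html). The sketch below builds such a file with Python's standard library. Several details are assumptions for illustration: the column set (CHROM, BP, SNP, P), the use of a caption for each dataset name, and the function names. Check the LocusFocus documentation for the exact layout it expects.

```python
import html

# Assumption: column layout expected for each secondary dataset table.
COLUMNS = ("CHROM", "BP", "SNP", "P")


def table_html(name: str, rows: list[dict]) -> str:
    """Render one association result set as an HTML table with a caption."""
    head = "".join(f"<th>{c}</th>" for c in COLUMNS)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(r[c]))}</td>" for c in COLUMNS) + "</tr>"
        for r in rows
    )
    return f"<table>\n<caption>{html.escape(name)}</caption>\n<tr>{head}</tr>\n{body}\n</table>"


def datasets_to_html(datasets: dict[str, list[dict]]) -> str:
    """Concatenate one table per dataset into a single HTML document."""
    tables = "\n".join(table_html(name, rows) for name, rows in datasets.items())
    return f"<html>\n<body>\n{tables}\n</body>\n</html>\n"
```

Each association file (for example ratio.irnt.assoc.tsv) would be read into a list of row dictionaries keyed by those column names before being passed in.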

The Meconium Ileus (MI) GWAS results at chr1:205,780,000-205,940,000 were tested for colocalization against the lung function associations derived above. The aim was to assess pleiotropic effects of this Cystic Fibrosis (CF) modifier locus on lung function.
Colocalization was observed when the genome-wide associated peak was tested:
LocusFocus plot testing colocalization of PEF (peak expiratory flow) from Shrine et al. (2019), shown as points on the left y-axis, against the following secondary datasets:
- GTEx V7 lung eQTL for the SLC26A9 gene
- the Meconium Ileus GWAS
- FEV1pp calculated from UKBB spirometry measures in all participants after QC, as explained above
- FEV1/FVC ratio, also from Shrine et al. (2019)
- the COPD case-control study calculated in this project after QC (cases=22,071; controls=241,390)

All secondary datasets are shown on the right y-axis as lines traversing the lowest p-value in each window (window=6.67 kbp). The Simple Sum colocalization region tested (gray area) was selected to match the observed peak at chr1:205,899,000-205,925,000. A total of 85 SNPs in this region were used to test for colocalization with the Simple Sum method.
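
In the plot, each secondary dataset is drawn as a line through the lowest p-value in each 6.67 kbp window. The sketch below shows that binning, plus the selection of SNPs inside the Simple Sum region. Two details are assumptions: windows are anchored at the start of the plotted region, and 6.67 kbp is rounded to 6,670 bp. LocusFocus's own implementation may differ, and the function names are hypothetical.

```python
Snp = tuple[int, float]  # (base-pair position, p-value)


def min_p_per_window(snps: list[Snp], start: int, end: int, window: int = 6670) -> dict[int, float]:
    """Lowest p-value in each fixed-width window across [start, end].

    Assumption: windows are anchored at `start`; 6.67 kbp is rounded to 6,670 bp.
    """
    best: dict[int, float] = {}
    for pos, p in snps:
        if start <= pos <= end:
            w = start + ((pos - start) // window) * window  # left edge of the window
            if w not in best or p < best[w]:
                best[w] = p
    return dict(sorted(best.items()))


def snps_in_region(snps: list[Snp], start: int, end: int) -> list[Snp]:
    """SNPs within the closed interval [start, end], e.g. the Simple Sum region."""
    return [(pos, p) for pos, p in snps if start <= pos <= end]
```

Applied to real data, `snps_in_region(snps, 205_899_000, 205_925_000)` would select the SNPs in the gray region. The README reports 85 such SNPs for this analysis.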
The colocalization results are summarized in UKBB_spirometry_SS_pvalues.csv. In short, the MI GWAS colocalizes with the FEV1/FVC ratio and PEF association studies. This colocalization remains statistically significant after multiple testing correction (-log10 P > 1.78).
Dataset | Simple Sum colocalization -log10(P-value) |
---|---|
Meconium Ileus GWAS (N=6,770) | 8.1 |
FEV1/FVC in Shrine et al. (N~396,686) | 8.65 |
FEV1pp in UKBB (N=263,461) | Did not pass first stage test |
COPD case-control (N=263,461) | Did not pass first stage test |
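
On the -log10 scale, a Bonferroni-corrected threshold at family-wise level α over m tests is -log10(α/m). With α = 0.05, m = 3 gives 1.78, which matches the threshold quoted above. Whether the threshold was in fact derived from three tests is an assumption, not something stated in this README. The helper names below are hypothetical.

```python
import math


def bonferroni_neglog10(alpha: float, n_tests: int) -> float:
    """-log10 of the Bonferroni-corrected significance level alpha / n_tests."""
    return -math.log10(alpha / n_tests)


def is_significant(neglog10_p: float, alpha: float = 0.05, n_tests: int = 3) -> bool:
    """Compare a -log10(P) value against the Bonferroni threshold (n_tests is assumed)."""
    return neglog10_p > bonferroni_neglog10(alpha, n_tests)


if __name__ == "__main__":
    print(round(bonferroni_neglog10(0.05, 3), 2))  # 1.78
    # Simple Sum -log10(P) values taken from the table above.
    for name, value in [("Meconium Ileus GWAS", 8.1), ("FEV1/FVC (Shrine et al.)", 8.65)]:
        print(name, is_significant(value))
```

Both reported Simple Sum values clear this threshold by a wide margin. The conclusion would therefore not change for any plausible small number of tests.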