This repository contains scripts to perform Brain-Wide Association Study (BWAS) to find the association between brain imaging-derived phenotypes and idiopathic pulmonary fibrosis genetic risk, as described in this paper: A. R. Mohammadi-Nejad, R. J. Allen, L. Kraven, O. C. Leavy, R. G. Jenkins, D. Auer, L. V. Wain, S. N. Sotiropoulos, “Mapping brain endophenotypes associated with idiopathic pulmonary fibrosis genetic predisposition”, eBioMedicine, 2022.
Figures in our paper can be regenerated using the codesand the data that have been provided for each figure (Figures 2, 3, 4, and 5) in the following sections.
Figure 2. Manhattan plot of the BWAS outcomes of 17 known IPF variants using UK Biobank imaging cohort (S=32,431)
A jupyter notebook has been provided to generate Figure 2 (Figure_2.ipynb
). Inside the data folder, there are 17 files and each of them is assigned to each IPF genetic variant (IPF_XX.txt
). The list of all the IPF genetic variants is available here.
These files are the output of the BWAS generated by PLINK v2.0. The dimensions of each file is the Number of IDPs (1,248) x 10. The columns are 'BETA', 'T_STAT', 'UNADJ', 'GC', 'BONF', 'HOLM', 'SIDAK_SS', 'SIDAK_SD', 'FDR_BH', and 'FDR_BY' respectively. You can find more information about each column on PLINK's website. The list of the IDPs is available here. By running this code, the results will be a figure like this:
To generate Figure 3a, the user needs to run the jupyter notebook provided for this figure (Figure_3.ipynb
). The user should specify the IPF variant number by changing the VAR
variable in line 12. Same as Figure 2, the data folder contains 17 files (IPF_XX.txt
) and each of them contains the summary statistics of the BWAS for each single IPF variant. These files are the same as the files that have been described in Figure 2. In addition, for each variant, there is a file known as coloc_XX.txt
. The coloc_XX.txt
file is a vector and its dimension is the number of IDPs (1,248) x 1. The value in each row can take '1' or '4'. Here, '4' means that IDP has been co-localised against IPF risk which has been described in Figure 4. In addition, the data folder of this figure contains another file known as ind_type.txt
. This file shows each IDP is categorised in which group IDP. By running this code, the results will be a figure like this:
We can use the same code to generate the Manhattan plot of chromosome 17 IPF variant presented in Figure 3a of the paper.
Figure 4. Co-localisation of the two most significantly associated IDPs with chromosome 8 (region around the DEPTOR gene), against IPF risk.
The Miami plots of this paper have been generated by coloc_plot.R
code. The user needs to change two lines of this code to generate the coloc plot for different chromosomes IPF variants and also different IDPs. In line 6 of this code, the user can determine the chromosome number by changing the CHR
variable. The chromosome number of each IPF variant is in accordance of this file. In addition, the user can change the FID
variable in line 7 to change the IDP number. The list of the most significant IDPs in both chromosome 8 and chromosome 17 IPF variants has been provided at the end of this code. Here, in the data folder, we have two folders for Chr 8 and Chr 17 IPF variants. The coloc_plot.R
code will read the myData_XXX.csv
file based on the pre-specified chromosome number and the IDP number. This file has 6 columns which are chr, pos, rsid, trait1_p, trait2_p, and r2. The number of rows can vary chromosome by chromosome based on the discussion we had in the paper. Here, trait 1 and trait 2 are the variables that have been discussed in the paper.
The second code that we should run to generate the COLOC posterior probability is Colocalisation.R
. Same as the coloc_plot.R
code, you should specify the chromosome number of the IPF variant and also the IDP number by changing the CHR
and FID
numbers in lines 6 and 7. This code will read the GWAS summary statistics of trait 1 (GWAS.csv
) and trait 2 (BWAS_XXX.csv
) and will calculate the PP.H0, PP.H1, PP.H2, PP.H3, and PP.H4. By running these two codes (by setting CHR = 8
and FID = 488
(Figure 4a) and FID = 1218
(Figure 4b)), the results will be the following figures:
and
A Matlab code (Figure_5.m
) has been provided to discuss the procedure we used to generate the mediation analysis results. To do the mediation analysis, we used M3 Toolbox.
By running this code, we estimated the weights of each path (paths a, b, axb, c, and c') and put them in a path diagram like this: