DataMed Admixture
Download and install Docker community edition
Input files are genotypes in plink text format (cohortname.ped and cohortname.map), mapped to the GRCh37 human reference genome (no "chr" prefix in the chromosome ID). Please refer to the plink or plink 1.9 documentation to obtain such files from other formats, including VCF.
The analysis produces two files named after the input file:
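As a rough sketch of the conversion, plink 1.9 can write text-format files from a VCF with `--vcf` and `--recode`. If the chromosome column of the resulting .map file still carries a "chr" prefix, it can be stripped with sed. The file names and the two variants in demo.map below are made up for illustration and are not part of this repository.

```shell
# Convert a VCF to plink text format (requires plink 1.9; file names are placeholders):
#   plink --vcf cohortname.vcf --recode --out cohortname

# Hypothetical .map content with "chr"-prefixed chromosome IDs (made-up variants).
printf 'chr1\trs0001\t0\t10000\nchrX\trs0002\t0\t20000\n' > demo.map

# Strip the leading "chr" from the chromosome column (column 1).
sed 's/^chr//' demo.map > demo.nochr.map

cat demo.nochr.map
```

Writing to a new file keeps the original .map untouched, so the result can be compared before it replaces the input.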
- output_cohortname.txt provides the admixture level of each subject for each reference population
- output_summary_cohortname.txt provides the cohort-wide cumulative admixture fraction as well as the diversity score.
The admixture and diversity scores can be calculated from 2 reference cohorts, HapMap3 (8 populations) and 1000G (5 continental super-populations), and we provide 3 scripts to do so:
- run_hapmap3.sh: uses the HapMap3 cohort.
- run_1000g_withLDpruning.sh: uses the pruned version of the 1000G cohort; ideal for dense genotypes, such as whole-genome data or genotyping arrays.
- run_1000g_withoutLDpruning.sh: uses the unpruned version of the 1000G cohort; ideal for small targeted genotypes (RNA-Seq, ChIP-Seq, exomes).
To test the analysis, we provide an accessory script, 'prepare_example.sh', which downloads a public dataset and reformats it to plink ped and map format.
Rankinen et al. PLoS One 2016
- No Evidence of a Common DNA Variant Profile Specific to World Class Endurance Athletes
- Article on PubMed
- Data on Figshare
Commands for an example run with published data. Set MY_LOCAL_DIR to the absolute path of your own local directory, where the output will be created.
Run the following commands to download the example data and convert them in your 'mydata' directory:
export MY_LOCAL_DIR="/usr/john/mydata"
docker run -d -v ${MY_LOCAL_DIR}:/results j5kim/datamed-admixture:latest bash /opt/DataMed-Admixture/example/prepare_example.sh
Call HapMap3-based admixture on the test data by executing the following command:
docker run -d -v ${MY_LOCAL_DIR}:/results j5kim/datamed-admixture:latest bash /opt/DataMed-Admixture/scripts/run_hapmap3.sh /results/rankinen
The run succeeded if you see the 2 files output_rankinen.txt and output_summary_rankinen.txt.
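Because `docker run -d` starts the container in the background, the output files only appear once the container has finished. A minimal sketch of a check is shown here; the helper name `check_outputs` is hypothetical, and it assumes the naming pattern output_cohortname.txt and output_summary_cohortname.txt described earlier.

```shell
# Check that both expected output files exist and are non-empty for a cohort.
# Usage: check_outputs <directory> <cohortname>
# (hypothetical helper, not part of the DataMed-Admixture image)
check_outputs() {
    dir="$1"
    cohort="$2"
    status=0
    for f in "output_${cohort}.txt" "output_summary_${cohort}.txt"; do
        if [ -s "${dir}/${f}" ]; then
            echo "found:   ${f}"
        else
            echo "missing: ${f}"
            status=1
        fi
    done
    return "$status"
}

# Example call for the test data:
#   check_outputs "${MY_LOCAL_DIR}" rankinen
```

The function returns a non-zero status when either file is missing, so it can be used in a script to decide whether to continue.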
Provide your data in plink text format (cohortname.ped and cohortname.map) and run the following commands to calculate admixture and diversity scores from the HapMap3 reference:
export MY_LOCAL_DIR="/usr/john/mydata"
docker run -d -v ${MY_LOCAL_DIR}:/results j5kim/datamed-admixture:latest bash /opt/DataMed-Admixture/scripts/run_hapmap3.sh /results/cohortname
Or run the following to use the 1000G cohort (without pruning):
export MY_LOCAL_DIR="/usr/john/mydata"
docker run -d -v ${MY_LOCAL_DIR}:/results j5kim/datamed-admixture:latest bash /opt/DataMed-Admixture/scripts/run_1000g_withoutLDpruning.sh /results/cohortname
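The commands above differ only in the script name and the cohort prefix, so they can be generated from a small helper. The sketch below only prints the command so it can be reviewed before running. The function name `admixture_cmd` is hypothetical, and the example assumes run_1000g_withLDpruning.sh is located in the same scripts directory as the other two scripts.

```shell
# Print the docker command for a given reference script and cohort prefix.
# Usage: admixture_cmd <script_name> <cohortname>
# (hypothetical helper; it prints the command and does not run it)
admixture_cmd() {
    script="$1"
    cohort="$2"
    echo "docker run -d -v ${MY_LOCAL_DIR}:/results j5kim/datamed-admixture:latest bash /opt/DataMed-Admixture/scripts/${script} /results/${cohort}"
}

export MY_LOCAL_DIR="/usr/john/mydata"

# Assumed path for the pruned 1000G script, by analogy with the other two.
admixture_cmd run_1000g_withLDpruning.sh cohortname
```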