Imputation Workflow

This repository contains the imputation workflow being developed as part of the H3 Africa BioNet Hackathon in Pretoria, South Africa.

Open tasks

We track our open tasks using GitHub issues: https://github.com/h3abionet/chipimputation/issues

The high-level (1000 ft) view is on our Trello board.

Using this imputation workflow

To use this workflow, follow these steps:

1. Deploy a small node to configure the shared filesystem and download reference material.
   - Configure a persistent, shared filesystem which will store the results of the imputation as well as the reference material.
   - Download the imputation reference panels (see instructions below).
2. Destroy the configuration node (optional).
3. Deploy the VMs to run the imputation calculations.
4. Log into the manager node of the deployed VMs and run the imputation workflow.
5. Wait.
6. Examine the results of the workflow.

Deploying a configuration node

Deploy a small node (configuration node) which will be used to configure the shared filesystem and download reference material before deploying the nodes.

This can be done on OpenStack using:

nova boot --flavor m1.small --key-name 'yourkey' configurationnode

after you have loaded the OpenStack RC file into your shell's environment and registered an appropriate SSH key with OpenStack. Depending on your deployment, you may also need to specify an image with --image.
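Forgetting to load the RC file is an easy mistake to make before calling nova. The following is a minimal sketch of a pre-flight check, not part of this repository: the function name check_openstack_env is hypothetical, and the list of variables is an assumption based on commonly distributed RC files, so adjust it to match the file from your provider.

```shell
# Sketch: verify that the OpenStack RC file has been sourced before calling nova.
# The variable list is an assumption; your RC file may define different names.
check_openstack_env() {
  local missing=0 var value
  for var in OS_AUTH_URL OS_USERNAME OS_PASSWORD; do
    # Look up the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use (not run here; the RC file name is a placeholder):
#   . ./your-openrc.sh && check_openstack_env && \
#     nova boot --flavor m1.small --key-name 'yourkey' configurationnode
```

The function returns a non-zero status if any variable is unset or empty, so it can gate the nova boot call with &&.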

Persistent, shared filesystem

The persistent, shared filesystem will store the results of the imputation as well as the reference material. This should be a block store which can be mounted on one of the imputation nodes.

If you are using OpenStack, this persistent filesystem can be created using nova volume-create 100 (the size is given in GB). You can then attach the block store to the previously spawned node (for example with nova volume-attach), ssh into that node (nova ssh configurationnode), run sudo mkfs.ext4 /dev/vdb (or similar) to format the block store, and run sudo mkdir -p /srv/imputation; sudo mount /dev/vdb /srv/imputation to mount the newly formatted block store.

NB: running mkfs.ext4 will destroy all of the information on this block store; be careful before running this command.

Make note of the id of the shared blockstore so that it can be mounted on the head node of the cluster that you will deploy later.
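Because mkfs.ext4 is destructive, it can help to wrap the format and mount steps in a small guard. The sketch below is an illustration only: the function names is_mounted and format_and_mount are hypothetical, the device name /dev/vdb is taken from the example above and may differ on your node, and the guard only catches the case where the device is already mounted (it does not detect an unmounted device that holds data).

```shell
# Return 0 if the device appears in a mounts table (default: /proc/mounts).
# The optional second argument makes the check testable against a sample file.
is_mounted() {
  local device=$1 mounts=${2:-/proc/mounts}
  awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$mounts"
}

# Format a block device with ext4 and mount it, refusing if it is already mounted.
format_and_mount() {
  local device=$1 mountpoint=$2
  if is_mounted "$device"; then
    echo "refusing to format $device: it is currently mounted" >&2
    return 1
  fi
  sudo mkfs.ext4 "$device" &&
    sudo mkdir -p "$mountpoint" &&
    sudo mount "$device" "$mountpoint"
}

# Example use on the configuration node (not run here):
#   format_and_mount /dev/vdb /srv/imputation
```

When the same volume is later attached to the head node of the cluster, only the mkdir and mount steps are needed; formatting it again would erase the reference material.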

Imputation Reference Material

The built-in IMPUTE2 reference panels can be obtained by SSHing into the configuration node created above and running:

mkdir -p /srv/imputation/refdata;
(cd /srv/imputation/refdata;
 # download the autosomal and chrX reference panels
 wget https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3{,_chrX}.tgz;
 for a in 1000GP_Phase3{,_chrX}.tgz; do tar -zxf "$a"; done;
 rm -f 1000GP_Phase3{,_chrX}.tgz;
 # move the chrX files in alongside the autosomal panels
 mv *chrX* 1000GP_Phase3/;
);

If you are using another directory location, you will need to change the paths above and in the nextflow configuration file accordingly.
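Before destroying the configuration node, it may be worth checking that the download and extraction completed. The sketch below is an assumption-laden illustration: the function name check_refdata is hypothetical, and the file naming pattern (1000GP_Phase3_chrN.hap.gz and 1000GP_Phase3_chrN.legend.gz for autosomes 1 to 22) is assumed from the usual layout of the 1000 Genomes Phase 3 panel; verify it against the extracted directory before relying on it.

```shell
# Sketch: check that per-chromosome haplotype and legend files exist and are
# non-empty. The file naming pattern is an assumption; adjust it if your
# extracted panel uses different names.
check_refdata() {
  local refdir=${1:-/srv/imputation/refdata/1000GP_Phase3}
  local missing=0 chr f
  for chr in $(seq 1 22); do
    for f in "1000GP_Phase3_chr${chr}.hap.gz" "1000GP_Phase3_chr${chr}.legend.gz"; do
      if [ ! -s "$refdir/$f" ]; then
        echo "missing or empty: $refdir/$f" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}

# Example use (not run here):
#   check_refdata /srv/imputation/refdata/1000GP_Phase3 && echo "reference panels look complete"
```

The check covers only the autosomal haplotype and legend files; genetic maps and the chrX files would need their own patterns.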

Destroy the configuration node

You can remove the configuration node at this point with nova delete configurationnode.

Deploy VMs for imputation

At this point, you should deploy the VMs that will be used for imputation.

If you are using OpenStack, you can simply cd openstack; ./generate_openstack to generate a fleet of 5 computational nodes and 3 smaller management nodes.

If not, you can use the cloud-init configuration scripts to generate the necessary configuration on AWS or another cloud provider.

Run Imputation

You can log into the manager0 node (nova ssh manager0) and run the imputation (or any other Nextflow-based workflow) with

 NXF_EXECUTOR_CPUS=100 /srv/imputation/nextflow/nextflow run -qs 1000 -ps 1000 run.nf \
 -with-docker 'quay.io/h3abionet_org/impute2:latest' -c ../docker_nextflow \
 -with-timeline foo.html -with-dag foo.png

Run imputation locally

You can alternatively run the imputation locally with something like this:

 nextflow run ./impute2.nf --chr chr7 --begin 0 --end 2e6 --refdir /srv/imputation/refdata
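The --begin and --end parameters select a window on the chromosome, so covering a larger region locally means running one window after another. The sketch below shows one way to generate such windows. The function name chunk_ranges is hypothetical, the 10 Mb region is an arbitrary example, and it is an assumption that impute2.nf accepts plain integers for --begin and --end and that adjacent windows should share a boundary; check the workflow script before using it.

```shell
# Sketch: print "begin end" pairs that split [start, end) into windows of at
# most `size` bases. The last window is truncated at `end`.
chunk_ranges() {
  local start=$1 end=$2 size=$3
  local begin=$start stop
  while [ "$begin" -lt "$end" ]; do
    stop=$((begin + size))
    if [ "$stop" -gt "$end" ]; then
      stop=$end
    fi
    echo "$begin $stop"
    begin=$stop
  done
}

# Print (rather than run) one nextflow command per window of a region.
print_local_runs() {
  local chr=$1 start=$2 end=$3 size=$4 begin stop
  chunk_ranges "$start" "$end" "$size" | while read -r begin stop; do
    echo "nextflow run ./impute2.nf --chr $chr --begin $begin --end $stop --refdir /srv/imputation/refdata"
  done
}

# Example use (prints five commands for a hypothetical 10 Mb region):
#   print_local_runs chr7 0 10000000 2000000
```

Printing the commands first makes it easy to inspect the windows before committing to a long run; piping the output to sh would execute them in sequence.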

Sample data

Sample data for imputation can be generated with the generate_testdata.pl script in the data subdirectory, which subsets the imputation reference panels; the subset can then be imputed against the full panels.

Examine results

TODO
