This GitHub repository provides a Snakemake workflow for population genomics analysis of whole-genome resequencing data. The Snakefile parallelizes the analysis steps, which makes the workflow more efficient and shortens the overall run time.
Current Workflow: The workflow currently covers every step from short-read data processing to inferring population structure, i.e. plotting a PCA (Principal Component Analysis). Snakemake is used to streamline and automate these tasks.
Future Updates: Stay tuned for future updates, which will include additional analyses such as Fst analysis, admixture analysis, and more. Feel free to explore the repositories, and don't hesitate to reach out if you have any questions or suggestions!
This Snakemake workflow was created using commands from Elahe Parvizi's GitHub repositories.
Create a project folder and give it a meaningful Project_ID.
mkdir <project-id>
Copy the following files into the project folder:
Snakefile
Stats.R
config.yaml
Inside the project folder, create a sub-folder named 01_Data.
mkdir <project-id>/01_Data
Copy the sample files into the 01_Data folder.
cp *.fastq <project-id>/01_Data
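Before moving on, it can help to confirm that the project folder contains everything listed in the setup steps. The following is a minimal sketch of such a check, not part of the workflow itself; the function name `missing_project_items` is hypothetical, and the file and folder names are taken from the steps above.

```python
from pathlib import Path

# Names taken from the setup steps in this README.
REQUIRED_FILES = ("Snakefile", "Stats.R", "config.yaml")
DATA_DIR = "01_Data"


def missing_project_items(project_dir):
    """Return the required files or folders that are missing from project_dir."""
    root = Path(project_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not (root / DATA_DIR).is_dir():
        missing.append(DATA_DIR)
    return missing
```

Calling `missing_project_items("<project-id>")` with your own project folder returns an empty list when the layout is complete.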
Ensure that the FASTQ files follow this naming pattern, e.g. Featherston_01.fastq, Featherston_02.fastq, Mosburn_01.fastq.
To label the files correctly:
Each file should have a name followed by an underscore and a two-digit number.
The name represents a specific population or sample, such as "Featherston" or "Mosburn".
The two-digit number distinguishes different files from the same population or sample.
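The naming rule above can be checked before starting the workflow. This is a minimal sketch under stated assumptions: the population name is assumed to contain only letters and digits, the extension is assumed to be `.fastq` as in the examples, and `parse_sample_name` is a hypothetical helper, not something the Snakefile provides.

```python
import re

# Assumed pattern: <population>_<two digits>.fastq, e.g. Featherston_01.fastq.
# Restricting the population name to letters and digits is an assumption.
SAMPLE_PATTERN = re.compile(r"(?P<population>[A-Za-z0-9]+)_(?P<replicate>\d{2})\.fastq")


def parse_sample_name(filename):
    """Return (population, replicate) if the name fits the pattern, else None."""
    match = SAMPLE_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return match.group("population"), match.group("replicate")


# Example: list the names that would need renaming.
names = ["Featherston_01.fastq", "Featherston_02.fastq", "Mosburn_01.fastq", "Mosburn_1.fastq"]
badly_named = [name for name in names if parse_sample_name(name) is None]
print(badly_named)  # ['Mosburn_1.fastq']
```

Here `Mosburn_1.fastq` is flagged because its number has one digit instead of two.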
Use the config.yaml file to supply any additional information the workflow requires.
##### Sequencing platform info (mostly keep this constant)
PL: "Illumina"
PM: "HISEQ"
##### Assign threads
THREADS: 16
##### Provide path to reference file (Ensure reference is indexed using BWA index command and available in path provided)
fasta_path: /path/to/.fasta
######## Variant calling filter parameters ########
min_MQ: 20
min_BQ: 20
vcf_name: 'M_aethio_MOSS_LIN_FEA' #Used to assign names to output files generated in most steps
MAF: 'MAF > 0.05'
######## PLINK parameters #################
GENO: 0.1
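The configuration asks for a reference that has already been indexed with the BWA index command. As a rough pre-flight check, the sketch below looks for the index files that classic `bwa index` writes next to the FASTA (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). It assumes the default output prefix (no `-p` option) and classic BWA rather than bwa-mem2; `missing_bwa_index_files` is a hypothetical helper name.

```python
from pathlib import Path

# Suffixes appended to the FASTA name by classic `bwa index` (default prefix assumed).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(fasta_path):
    """Return the expected BWA index files that do not exist for fasta_path."""
    fasta = Path(fasta_path)
    expected = [Path(str(fasta) + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in expected if not path.is_file()]
```

Pass the same value you set for `fasta_path` in config.yaml; an empty list suggests the index is in place, while a non-empty list names the files that still need to be generated.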
Navigate to the project folder in your terminal.
Perform a dry run with the "-n" flag to test the workflow without executing it:
snakemake --configfile=config.yaml --cores 8 -n
For the actual run, use the following:
snakemake --configfile=config.yaml --cores 8
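The two commands above can also be chained so that the real run only starts if the dry run succeeds. The following is a minimal sketch, assuming `snakemake` is on your PATH and that it is called from the project folder; the helper names `snakemake_command` and `run_workflow` are hypothetical, and only the flags shown in this README are used.

```python
import subprocess


def snakemake_command(cores=8, dry_run=False, configfile="config.yaml"):
    """Build the snakemake call shown in this README as an argument list."""
    cmd = ["snakemake", f"--configfile={configfile}", "--cores", str(cores)]
    if dry_run:
        cmd.append("-n")  # dry run: list the planned jobs without executing them
    return cmd


def run_workflow(cores=8):
    """Dry-run first; check=True raises if it fails, so the real run is skipped."""
    subprocess.run(snakemake_command(cores, dry_run=True), check=True)
    subprocess.run(snakemake_command(cores), check=True)
```

Calling `run_workflow(cores=8)` from the project folder mirrors running the dry-run command followed by the full command by hand.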