SeniorProject

Identification and Characterization of Butyrate-Producing Species in the Human Gut Microbiome

Introduction

In this project, we examine reference genomes from the Human Microbiome Project phase 1 (HMP1) to identify species that may be capable of producing butyrate, based on the presence of genes encoding enzymes in the butyrate production pathway(s). This repository contains the code and methods used for data formatting, protein alignments, and data analysis.

DATA:

Species:
http://downloads.ihmpdcc.org/data/reference_genomes/body_sites/Gastrointestinal_tract.cds.fsa (Save as "Gastrointestinal_tract.cds.fsa")

Pathway Proteins:
https://www.ncbi.nlm.nih.gov/nuccore/DQ987697.1?report=fasta ** (Save as "butyrate_genes_BTC&BK.fasta")
https://www.ncbi.nlm.nih.gov/nuccore/AY796317.2?report=fasta ** (Save as "butyryl_COA_transferase_BCT.fasta")
https://www.ncbi.nlm.nih.gov/protein/WP_003420701.1?report=fasta (Save as "phosphate_butyryltransferase_BK.fasta")
https://www.ncbi.nlm.nih.gov/protein/WP_003722496.1?report=fasta (Save as "butyrate_kinase_BK.fasta")

** Partial CDS (nucleotide) records; these are translated to protein sequences during formatting.
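The translation of a partial CDS into a protein sequence can be sketched as follows. This is only an illustration using the standard genetic code and the Python standard library; it is not the code in DataFormatting.py, and the function name `translate_cds` and the `frame` parameter are hypothetical. Because a partial CDS may not begin at a codon boundary, the reading frame is left as an explicit argument.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (last base varying fastest), matching the usual compact table layout.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(nucleotides, frame=0):
    """Translate a nucleotide CDS to protein, stopping at the first stop codon.

    `frame` (0, 1, or 2) is a hypothetical parameter for partial CDS records
    that do not start on a codon boundary. Codons containing ambiguous bases
    are translated to 'X'. A trailing incomplete codon is ignored.
    """
    seq = nucleotides.upper().replace("U", "T")[frame:]
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("ATGGCCAAATAA"))
```

The bacterial translation table assigns the same amino acids to codons as the standard table used here and differs only in which codons may act as start codons, so the sketch does not distinguish between them.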

HOW TO USE:

Step 1 - Data formatting

Collect data from HMP
Collect data from NCBI
Run DataFormatting.py --> creates FASTA files for all protein sequences and all species
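As a rough illustration of the kind of FASTA handling this step involves, the sketch below reads records from FASTA-formatted lines and writes a record back out with wrapped sequence lines. The helper names `read_fasta` and `format_fasta` are hypothetical and are not taken from DataFormatting.py; how that script groups sequences by species is not described here, so that part is left out.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    The header is returned without the leading '>'. Sequence lines that
    appear before the first header are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def format_fasta(header, sequence, width=60):
    """Return one FASTA record as text, wrapping the sequence at `width`."""
    wrapped = "\n".join(
        sequence[i:i + width] for i in range(0, len(sequence), width)
    )
    return f">{header}\n{wrapped}\n"


if __name__ == "__main__":
    example = [">seq1 example", "ACGT", "ACGT", ">seq2", "GGCC"]
    for name, seq in read_fasta(example):
        print(format_fasta(name, seq), end="")
```

Since `read_fasta` accepts any iterable of lines, an open file handle can be passed directly, which avoids loading a large multi-FASTA file into memory at once.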

Step 2 - Alignments

Run RunAlignments.py --> creates XML files for each alignment, CSV files for unprocessed results, and CSV files for scored results
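How RunAlignments.py scores its results is not stated here. One common approach, shown below purely as an assumption, is to compute the percent identity of each alignment hit and keep the hits that meet a cutoff. The field names (`species`, `protein`, `identities`, `align_length`), the function names, and the default cutoff are all hypothetical.

```python
import csv
import io

FIELDNAMES = ["species", "protein", "percent_identity"]


def score_hits(hits, min_percent_identity=60.0):
    """Keep hits whose percent identity meets the (hypothetical) cutoff.

    Each hit is a dict with 'species', 'protein', 'identities' (number of
    identical positions) and 'align_length' (alignment length).
    """
    scored = []
    for hit in hits:
        if hit["align_length"] <= 0:
            continue  # skip empty alignments to avoid dividing by zero
        percent = 100.0 * hit["identities"] / hit["align_length"]
        if percent >= min_percent_identity:
            scored.append({
                "species": hit["species"],
                "protein": hit["protein"],
                "percent_identity": round(percent, 1),
            })
    return scored


def to_csv(rows):
    """Serialize scored rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    example_hits = [
        {"species": "A", "protein": "BK", "identities": 80, "align_length": 100},
        {"species": "B", "protein": "BK", "identities": 30, "align_length": 100},
    ]
    print(to_csv(score_hits(example_hits)), end="")
```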

Step 3 - Analyses

Pull 16S rRNA data and taxon info from the SILVA database (batch download)
Run Analyze_Data.py --> creates CSV output files for labeling taxon info and for formatting the ANOVA input, and a FASTA file for running the multiple sequence alignment on Clustal Omega
Run ANOVA.R with the output files from step 2 above (Analyze_Data.py)
Run the Clustal Omega multiple sequence alignment with the output files from step 2 above
Interpret the results using the output taxon files from step 2 above, the phylogenetic analysis, and the ANOVA.
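The ANOVA itself is run in R with ANOVA.R, which is not reproduced here. To make clear what a one-way ANOVA computes, the sketch below calculates the F statistic by hand in Python. The grouping (for example, alignment scores grouped by taxon) and the function name `one_way_anova_f` are assumptions for illustration only.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of numbers, one inner list per group.
    F is the between-group mean square divided by the within-group
    mean square.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (mean - grand_mean) ** 2
        for g, mean in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for g, mean in zip(groups, group_means)
        for x in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A large F value indicates that the group means differ more than would be expected from the spread within groups; the corresponding p-value comes from the F distribution with the two returned degrees of freedom.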
