The goal of this project was to examine the effect of an innate type 2 immune response on the microbial composition of tuft cells. Tuft cells are specialized chemosensory cells whose functions vary with their location in the body. They are found mainly in the respiratory, gastrointestinal, and reproductive tracts. Previous research has shown that tuft cells are involved in immune responses and play a large role in innate immunity in particular. However, it is not known exactly how tuft cells are integrated into the immune system, or how environmental stimuli affect their response or mechanism. By comparing the microbial composition of tuft cells that have been exposed to a foreign pathogen (in this project, black mold) with that of untreated cells, we can gain a better understanding of the mechanisms behind their role in the immune system.
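Install Windows Subsystem for Linux (WSL), then download and install Miniconda.
Then run the following commands in order: 'conda update conda', 'conda install wget', 'conda activate qiime2-2023.2'.
The last command assumes the qiime2-2023.2 environment has already been created by following the QIIME 2 installation instructions.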
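Before going further, it can help to confirm that the tools named above are actually on the PATH. The following is only a minimal sketch; the function name `missing_tools` is made up for illustration and is not part of conda or QIIME 2.

```shell
# Print each named command that cannot be found on the PATH.
# (missing_tools is a hypothetical helper, not a conda or QIIME 2 command.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools this walkthrough relies on; qiime is only expected to be found
# once the qiime2-2023.2 environment is active.
missing=$(missing_tools conda wget qiime)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All tools found."
fi
```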
Data was downloaded from the European Nucleotide Archive (ENA).
ENA is convenient for QIIME 2 because its gzipped FASTQ files are already demultiplexed, with separate forward and reverse read files for each sample. If data is not already demultiplexed, a few additional steps are required before importing it into QIIME 2. Visit the QIIME 2 website for more information.
The data were already demultiplexed and paired-end, so the following import command was used:
'''shell
# Import the paired-end reads listed in the manifest; later steps read demux.qza
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest \
--output-path demux.qza \
--input-format PairedEndFastqManifestPhred33V2
'''
The --input-path parameter should point to a manifest file that lists the sample ID for each pair of gzipped FASTQ files,
along with the absolute path of the forward read and the absolute path of the reverse read.
Additional metadata columns can be added. For the PairedEndFastqManifestPhred33V2 format the file is tab-separated and uses these column headers:
'''
sample-id	forward-absolute-filepath	reverse-absolute-filepath
Sample1	/path/to/forward_read_1.fastq.gz	/path/to/reverse_read_1.fastq.gz
Sample2	/path/to/forward_read_2.fastq.gz	/path/to/reverse_read_2.fastq.gz
'''
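If the data being imported is single-end, the manifest has just one filepath column (absolute-filepath) in place of the forward and reverse columns, and the import uses --type 'SampleData[SequencesWithQuality]' with --input-format SingleEndFastqManifestPhred33V2.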
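Writing the manifest by hand gets tedious with many samples. The sketch below builds a paired-end manifest from a directory of reads, using the tab-separated column headers that the PairedEndFastqManifestPhred33V2 format expects. It assumes files are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`; adjust the suffixes if your downloads are named differently. The function name `make_manifest` and the directory name in the usage comment are hypothetical.

```shell
# Write a paired-end manifest (V2, tab-separated) for every read pair in a directory.
# ASSUMPTION: files are named <sample>_1.fastq.gz and <sample>_2.fastq.gz.
make_manifest() {
    reads_dir=$(cd "$1" && pwd)    # the manifest needs absolute paths
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
    for fwd in "$reads_dir"/*_1.fastq.gz; do
        [ -e "$fwd" ] || continue  # glob matched nothing
        sample=$(basename "$fwd" _1.fastq.gz)
        rev="$reads_dir/${sample}_2.fastq.gz"
        if [ -e "$rev" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
        else
            echo "warning: no reverse read for $sample, skipping" >&2
        fi
    done
}

# Usage (hypothetical directory name):
#   make_manifest fastq > manifest
```

Samples without a matching reverse file are skipped with a warning rather than written as incomplete rows, since the import would fail on them anyway.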
After successfully importing data into Qiime2, it can be useful to generate a visualization of the demultiplexed files.
This allows you to determine how many sequences were obtained per sample and also get a summary of the distribution of sequence qualities at each position in your sequence data.
'''shell
qiime demux summarize \
--i-data demux.qza \
--o-visualization demux.qzv
'''
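The per-sample sequence counts in the visualization can be cross-checked against the raw files. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function name `count_reads` and the file name in the usage comment are hypothetical.

```shell
# Count reads in a gzipped FASTQ file.
# ASSUMPTION: each record occupies exactly four lines.
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Usage (hypothetical file name):
#   count_reads fastq/Sample1_1.fastq.gz
```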
## DADA2
DADA2 is a pipeline for detecting and correcting (where possible) Illumina amplicon sequence data.
As implemented in the q2-dada2 plugin, this quality control process will additionally filter any phiX reads
(commonly present in marker gene Illumina sequence data) that are identified in the sequencing data and will filter chimeric sequences.
The dada2 denoise-single method requires two parameters that are used in quality
filtering: --p-trim-left m, which trims off the first m bases of each sequence,
and --p-trunc-len n, which truncates each sequence at position n.
This allows the user to remove low-quality regions of the sequences.
To determine what values to pass for these two parameters, review the
Interactive Quality Plot tab in the demux.qzv file that was generated by qiime demux summarize above.
Note that when denoise-single is given paired-end data, as here, it uses only the forward reads; denoise-paired is the method that denoises and merges both reads.
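'''shell
qiime dada2 denoise-single \
--i-demultiplexed-seqs demux.qza \
--p-trim-left 0 \
--p-trunc-len 120 \
--o-representative-sequences rep-seqs-dada2.qza \
--o-table table-dada2.qza \
--o-denoising-stats stats-dada2.qza

# Rename the outputs to the file names used by the later steps
mv rep-seqs-dada2.qza rep-seqs.qza
mv table-dada2.qza table.qza
'''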
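To make the two parameters concrete, the sketch below applies the same trimming and truncation to a single toy read. It is an illustration only: DADA2 does this internally along with its other quality filters, and the function name `trim_and_truncate` is hypothetical.

```shell
# Mimic --p-trim-left m and --p-trunc-len n on one read (illustration only).
trim_and_truncate() {
    seq=$1; m=$2; n=$3
    # A truncation length of 0 means no truncation and no length filtering.
    if [ "$n" -eq 0 ]; then
        printf '%s\n' "$seq" | cut -c "$((m + 1))-"
        return 0
    fi
    # Reads shorter than the truncation length are discarded.
    if [ "${#seq}" -lt "$n" ]; then
        return 1
    fi
    # Keep bases m+1 through n, so the result is n - m bases long.
    printf '%s\n' "$seq" | cut -c "$((m + 1))-$n"
}

trim_and_truncate ACGTACGTAC 2 8    # prints GTACGT
```

With the values used above (trim-left 0, trunc-len 120), every retained read is exactly 120 bases long and any read shorter than 120 bases is dropped.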
## Feature Table Summary
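After quality control and filtering with DADA2 or Deblur, the next step is to summarize the resulting feature table and representative sequences.
The feature-table summarize command produces histograms and tables showing how many sequences are associated with each sample and with each feature, and tabulate-seqs provides a mapping of feature IDs to sequences.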
'''shell
qiime feature-table summarize \
--i-table table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file sample-metadata.tsv
qiime feature-table tabulate-seqs \
--i-data rep-seqs.qza \
--o-visualization rep-seqs.qzv
'''
## Generate phylogenetic tree
QIIME 2 supports several phylogenetic diversity metrics, including Faith’s Phylogenetic Diversity and weighted and unweighted UniFrac.
In addition to counts of features per sample (i.e., the data in the FeatureTable[Frequency] QIIME 2 artifact),
these metrics require a rooted phylogenetic tree relating the features to one another. This information will be stored in a Phylogeny[Rooted] QIIME 2 artifact.
To generate a phylogenetic tree we will use the align-to-tree-mafft-fasttree pipeline from the q2-phylogeny plugin.
First, the pipeline uses the mafft program to perform a multiple sequence alignment of the sequences in our
FeatureData[Sequence] to create a FeatureData[AlignedSequence] QIIME 2 artifact. Next, the pipeline masks (or filters)
the alignment to remove positions that are highly variable. These positions are generally considered to add noise to a resulting phylogenetic tree.
Following that, the pipeline applies FastTree to generate a phylogenetic tree from the masked alignment.
The FastTree program creates an unrooted tree, so in the final step in this section midpoint rooting is applied to place the root of the tree at
the midpoint of the longest tip-to-tip distance in the unrooted tree.
'''shell
qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences rep-seqs.qza \
--o-alignment aligned-rep-seqs.qza \
--o-masked-alignment masked-aligned-rep-seqs.qza \
--o-tree unrooted-tree.qza \
--o-rooted-tree rooted-tree.qza
'''
## Alpha and Beta diversity plots
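Alpha and beta diversity metrics were computed with the core-metrics-phylogenetic pipeline.
The pipeline rarefies the feature table to the depth given by --p-sampling-depth (1103 here), computes metrics including Faith’s Phylogenetic Diversity and weighted and unweighted UniFrac, and generates principal coordinates plots for the beta diversity metrics.
Samples with fewer sequences than the sampling depth are dropped from the analysis, so the value should be chosen by reviewing table.qzv.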
'''shell
qiime diversity core-metrics-phylogenetic \
--i-phylogeny rooted-tree.qza \
--i-table table.qza \
--p-sampling-depth 1103 \
--m-metadata-file sample-metadata.tsv \
--output-dir core-metrics-results
'''
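Because samples with fewer sequences than the sampling depth are dropped, it is worth checking which samples survive a candidate depth before running the pipeline. A minimal sketch, assuming a tab-separated listing of sample IDs and sequence counts; the sample names and counts below are hypothetical and only for illustration, and `samples_retained` is a made-up helper name.

```shell
# List samples that would be kept at a given sampling depth.
# Input on stdin: tab-separated lines of <sample-id> <sequence count>.
samples_retained() {
    awk -F'\t' -v depth="$1" '$2 >= depth { print $1 }'
}

# Hypothetical per-sample counts, for illustration only.
printf 'treated-1\t4521\ntreated-2\t1103\ncontrol-1\t980\n' | samples_retained 1103
# prints treated-1 and treated-2; control-1 falls below the depth and is dropped
```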
## Taxonomic analysis
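To explore the taxonomic composition of the data, a pre-trained feature classifier is used to assign taxonomy to the representative sequences.
The classifier used here (gg-13-8-99-515-806-nb-classifier.qza) is a Naive Bayes classifier trained on the Greengenes 13_8 99% OTUs, trimmed to the region of the 16S rRNA gene bound by the 515F/806R primer pair.
The resulting taxonomy is then tabulated and visualized as interactive bar plots.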
'''shell
qiime feature-classifier classify-sklearn \
--i-classifier gg-13-8-99-515-806-nb-classifier.qza \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza
'''
'''
qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv
'''
'''
qiime taxa barplot \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file sample-metadata.tsv \
--o-visualization taxa-bar-plots.qzv
'''
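Greengenes taxonomy strings list the ranks from kingdom to species, separated by a semicolon and a space (for example `k__Bacteria; p__Firmicutes; c__Clostridia`). The sketch below pulls one rank out of such a string, which can be handy when working with exported taxonomy outside the bar plot viewer. The function name `taxon_at_level` and the `unassigned` placeholder are choices made for this illustration, not QIIME 2 conventions.

```shell
# Extract one rank (1 = kingdom ... 7 = species) from a Greengenes-style
# taxonomy string. Prints "unassigned" if the string stops above that rank.
taxon_at_level() {
    printf '%s\n' "$1" |
        awk -F'; ' -v level="$2" '{ print ((level <= NF) ? $level : "unassigned") }'
}

taxon_at_level 'k__Bacteria; p__Firmicutes; c__Clostridia' 2    # prints p__Firmicutes
```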
## Differential Abundance using ANCOM
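Differential abundance analysis tests whether individual features differ in abundance
between groups of samples; ANCOM is the method used here. The following qiime commands filter the feature table
to one group of samples based on the metadata file provided (here, samples whose body-site is gut), add a pseudocount so that no count is zero (ANCOM cannot tolerate frequencies of zero), and then run ANCOM on the chosen metadata column.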
'''
qiime feature-table filter-samples \
--i-table table.qza \
--m-metadata-file sample-metadata.tsv \
--p-where "[body-site]='gut'" \
--o-filtered-table gut-table.qza
'''
'''
qiime composition add-pseudocount \
--i-table gut-table.qza \
--o-composition-table comp-gut-table.qza
'''
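The pseudocount step exists because ANCOM compares log-ratios of feature abundances, and a logarithm is undefined for a count of zero. The sketch below shows the idea on a small tab-separated table. It assumes a pseudocount of 1 (the plugin's default) and a table with a header row and a feature ID column; the counts and the helper name `add_pseudocount` are hypothetical.

```shell
# Add a pseudocount to every count in a tab-separated feature table
# (first row is a header, first column is the feature ID).
add_pseudocount() {
    awk -F'\t' -v OFS='\t' -v pc="${1:-1}" '
        NR == 1 { print; next }                        # header row unchanged
        { for (i = 2; i <= NF; i++) $i += pc; print }  # shift every count
    '
}

# Hypothetical counts, for illustration only.
printf 'feature\tS1\tS2\nf1\t0\t12\nf2\t5\t0\n' | add_pseudocount
# f1 becomes 1 and 13; f2 becomes 6 and 1
```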
'''
qiime composition ancom \
--i-table comp-gut-table.qza \
--m-metadata-file sample-metadata.tsv \
--m-metadata-column subject \
--o-visualization ancom-subject.qzv
'''
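## Gneiss
Gneiss is used here to build a hierarchical clustering of features based on how they co-occur across samples (correlation-clustering),
and then to display the feature table as a heatmap with samples grouped by a categorical metadata column and features arranged by that clustering (dendrogram-heatmap).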
'''
qiime gneiss correlation-clustering \
--i-table table.qza \
--o-clustering hierarchy.qza
'''
'''
qiime gneiss dendrogram-heatmap \
--i-table table.qza \
--i-tree hierarchy.qza \
--m-metadata-file sample-metadata.tsv \
--m-metadata-column subject \
--p-color-map seismic \
--o-visualization heatmap.qzv
'''