A curated cyanobacterial 16S rRNA gene reference package for sequence placement and de novo phylogenetic analysis

Daniel Roush, Ana Giraldo-Silva, and Ferran Garcia-Pichel

¹School of Life Sciences, Arizona State University, 85282 Tempe, Arizona, USA

²Center for Fundamental and Applied Microbiomics, Biodesign Institute, Arizona State University, 85282 Tempe, Arizona, USA

About Cydrasil

Cydrasil is a comprehensive, curated, and community-built cyanobacterial reference package containing over 1300 16S rRNA gene sequences with lengths exceeding 1100 base pairs. Cydrasil offers a curated alignment and a maximum-likelihood phylogenetic tree that can be used for sequence placement or de novo phylogenetic reconstructions. Cydrasil utilizes phylogenetic placement to give you a complete phylogenetic perspective of your new isolate or amplicon survey.

Current Stats:

Total Sequences:	1327
Cyanobacteria:	1288
Sibling Clades:	34
Plastids:	5

Package Contents

File/Folder Name	Description
cydrasil-v3-reference-sequence-list.fasta	FASTA file containing the entire Cydrasil reference sequence set used as input for phylogenetic analysis
cydrasil-v3-database.json	JSON formatted database containing metadata about Cydrasil reference sequences. The file includes a link to the source repository and any warnings about the sequence.
cydrasil-v3-database.tsv	TSV formatted database containing metadata about Cydrasil reference sequences. The file includes a link to the source repository and any warnings about the sequence.
cydrasil-v3-reference-alignment.afa	Aligned FASTA formatted version of the Cydrasil reference alignment.
cydrasil-v3-reference-alignment.phy	Aligned relaxed PHYLIP formatted version of the Cydrasil reference alignment.
cydrasil-v3-ssu-align.bacteria.mask	SSU-Align mask file for use with the SSU-Align version of the pipeline
cydrasil-v3-reference-tree.nwk	Newick formatted Cydrasil reference phylogenetic tree.
cydrasil-v3.bestModel	File containing model parameters for the generation of the reference tree. Needed for the model parameter specification (--model) for EPA-ng.
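
Before starting a placement run, it can be worth confirming that the downloaded files agree with each other. The following is a minimal sketch and not part of the Cydrasil package. The helper names `read_fasta` and `check_alignment` are hypothetical, and the check assumes that the unaligned FASTA file and the aligned FASTA file share the same sequence IDs (the header text up to the first whitespace).

```python
def read_fasta(text):
    """Parse FASTA (or aligned FASTA) text into {sequence id: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""  # id = header up to first whitespace
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def check_alignment(aligned, unaligned):
    """Return a list of problems found; an empty list means the files agree.

    Assumption: both files use identical sequence ids.
    """
    problems = []
    widths = {len(seq) for seq in aligned.values()}
    if len(widths) > 1:
        problems.append(f"alignment rows differ in width: {sorted(widths)}")
    missing = sorted(set(unaligned) - set(aligned))
    if missing:
        problems.append(f"ids missing from alignment: {missing}")
    return problems
```

To use it, read the text of the `.afa` and `.fasta` files from the package (for example with `pathlib.Path(...).read_text()`), pass each through `read_fasta`, and hand both dictionaries to `check_alignment`.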

Methodology

Cydrasil was originally created as a comprehensive cyanobacterial 16S rRNA gene reference package for use with phylogenetic placement algorithms such as pplacer, RAxML EPA, and EPA-ng. When we first populated the database, we applied strict inclusion criteria for reference sequences:

- The sequence must come from an isolated strain or a single-cell genome. Exceptions were made for metagenome-assembled genomes on a case-by-case basis after manual review of the genome (this was needed for most sibling clade sequences).
- Each reference sequence must be 1100 base pairs or longer. This minimum length was chosen as a compromise between strain coverage and sequence information for phylogenetic reconstruction. Note that it excludes all cyanobacterial sequences obtained with the Nübel et al. 1997 cyanobacteria-specific primers.
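
The length criterion above can be checked on a candidate sequence before submitting it. This is only an illustrative sketch: the function names are hypothetical, and it assumes that gap characters (`-` and `.`) and whitespace should not count toward the length.

```python
MIN_LENGTH = 1100  # minimum reference sequence length stated in the criteria


def ungapped_length(seq):
    """Count residues, ignoring gap characters and whitespace (an assumption)."""
    return sum(1 for ch in seq if ch not in "-. \t\r\n")


def meets_length_criterion(seq, min_length=MIN_LENGTH):
    """True if the sequence is min_length base pairs or longer."""
    return ungapped_length(seq) >= min_length
```

The comparison is inclusive, matching the wording "1100 base pairs or longer".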

Cydrasil is populated with sequences collected from NCBI, JGI-IMG/M, and user submissions. The reference alignment was generated using SSU-Align¹ with default parameters. This aligner uses a profile-based strategy in which each target sequence is aligned independently to a covariance model that incorporates the secondary structure of the 16S rRNA gene. The alignment was then masked with ssu-mask, using the automatically computed alignment confidence values (posterior probabilities). A maximum-likelihood phylogenetic tree was then generated using the RAxML-HPC2² Workflow on XSEDE (8.2.12) on the CIPRES Science Gateway³. The ML + thorough bootstrap workflow was used with the following modified parameters: 1000 bootstraps (-N 1000) and the GTRGAMMA model. All other parameters were left at default values.
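
To make the masking step concrete, the sketch below applies a column mask to one aligned sequence. It assumes the mask is a string of `0` and `1` characters with one character per alignment column, where `1` means "keep". This illustrates the idea only: the real ssu-mask program operates on Stockholm alignments and should be used for actual runs, and `apply_mask` is a hypothetical helper name.

```python
def apply_mask(aligned_seq, mask):
    """Keep only the columns of aligned_seq whose mask character is '1'.

    Assumption: mask has exactly one '0' or '1' per alignment column.
    """
    if len(aligned_seq) != len(mask):
        raise ValueError("sequence and mask must have the same length")
    return "".join(base for base, keep in zip(aligned_seq, mask) if keep == "1")
```

Because the same mask is applied to every row, reference and query sequences masked this way remain column-compatible, which is what placement requires.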

How to Conduct a Local Sequence Placement Run

Alignment of query sequences to Cydrasil reference alignment

A FASTA file containing the sequences of interest (typically cyanobacterial sequences extracted from the reference sequences file output by QIIME 1/2) is aligned to the reference alignment.
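
One way to build that query file is to filter a larger FASTA file by sequence ID. The sketch below assumes you already have a list of IDs that your taxonomy assignment labelled as Cyanobacteria; how that list is obtained depends on your pipeline and is not covered here. `extract_by_id` is a hypothetical helper, not a QIIME function.

```python
def extract_by_id(fasta_text, wanted_ids):
    """Return FASTA text containing only the records whose id is in wanted_ids.

    The id is taken as the header text up to the first whitespace.
    """
    wanted = set(wanted_ids)
    out, keep = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

The returned text can be written to a file such as `query-seqs.fasta` and used as the query input in the alignment steps that follow.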

You can use either PaPaRa⁴ or SSU-Align¹ to align your query sequences to the Cydrasil reference alignment. For PaPaRa, the reference package contains the Cydrasil reference alignment in PHYLIP (.phy) format.

Using PaPaRa

PaPaRa Information
PaPaRa GitHub

PaPaRa Arguments

Flag	Description	Cydrasil Filename
-t	Reference tree in newick (.nwk) format	cydrasil-v3-reference-tree.nwk
-s	Reference alignment in phylip (.phy) format	cydrasil-v3-reference-alignment.phy
-q	FASTA file containing sequences to be aligned (query sequences) to the reference alignment
-n	Name of output alignment
-r	Prevent PaPaRa from adding gaps in the reference alignment

Example Command
papara -t cydrasil-v3-reference-tree.nwk -s cydrasil-v3-reference-alignment.phy -q query-seqs.fasta -r -n combined-aln

Using SSU-Align

SSU-Align Information

SSU-Align Use

SSU-Align uses a multi-step workflow instead of a single command, but has the advantage of using the same alignment model as the Cydrasil reference alignment.
Importantly, SSU-Align identifies its inputs by argument order rather than by flags (such as -t).
The command ssu-align aligns your query sequences to the SSU-Align 16S model.
ssu-align query-seqs.fasta output-directory-name

Example ssu-align Workflow

ssu-align query-seqs.fasta aln
Mask the alignment using the mask file in the Cydrasil reference package
ssu-mask -s cydrasil-v3-ssu-align.bacteria.mask aln/
Reformat the SSU-Align alignment from Stockholm (.stk) to aligned FASTA (.afa)
ssu-esl-reformat -o aligned-query-sequences-masked.afa afa aln/aln.bacteria.mask.stk

Alternatively, the command ssu-prep -x allows for a multi-threaded alignment run on a desktop computer, but requires a few extra steps. This is recommended for large query sequence files.
ssu-prep -x query-seqs.fasta output-directory-name number-of-threads

Example ssu-prep Workflow

ssu-prep -x query-seqs.fasta aln 1
The output is a bash script: aln.ssu-align.sh
Running the script executes the SSU-Align alignment
bash aln.ssu-align.sh
Clean up the bash script
rm aln.ssu-align.sh
Mask the alignment using the mask file in the Cydrasil reference package
ssu-mask -s cydrasil-v3-ssu-align.bacteria.mask aln/
Reformat the SSU-Align alignment from Stockholm (.stk) to aligned FASTA (.afa)
ssu-esl-reformat -o aligned-query-sequences-masked.afa afa aln/aln.bacteria.mask.stk
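
If you run the ssu-align workflow repeatedly, the three steps can be chained from a small script. The sketch below only assembles the same commands shown above and runs them in order. It assumes the SSU-Align programs are on your PATH and that the output directory is a simple name (not a nested path), so that the masked Stockholm file follows the `aln/aln.bacteria.mask.stk` pattern. The function names are hypothetical.

```python
import subprocess


def ssu_align_commands(query_fasta, out_dir, mask_file, output_afa):
    """Return the three ssu-align workflow commands as argument lists."""
    # Name pattern taken from the example workflow: <dir>/<dir>.bacteria.mask.stk
    masked_stk = f"{out_dir}/{out_dir}.bacteria.mask.stk"
    return [
        ["ssu-align", query_fasta, out_dir],
        ["ssu-mask", "-s", mask_file, f"{out_dir}/"],
        ["ssu-esl-reformat", "-o", output_afa, "afa", masked_stk],
    ]


def run_workflow(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to print and inspect the commands before anything is executed.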

Conduct a sequence placement run using EPA-ng

Next, we use the aligned query sequences as the input into a sequence placement algorithm. We recommend EPA-ng.

EPA-ng Arguments

Flag	Description	Filename
-s	Cydrasil reference alignment file in afa format	cydrasil-v3-reference-alignment.afa
-t	Reference tree file	cydrasil-v3-reference-tree.nwk
-q	Query sequence alignment	SSU-Align: aligned-query-sequences-masked.afa; PaPaRa: query.fasta
--model	Cydrasil model parameter specification file	cydrasil-v3.bestModel
-w	Output directory
-T	Number of threads for multithreading

IMPORTANT NOTE: If you used PaPaRa to align your query sequences, you need to run an extra command before placement to remove the reference sequences from the query alignment. This is necessary because PaPaRa writes the reference and query sequences together into a single alignment file. If you skip this step, your output file will contain both query and reference sequences as placements.
epa-ng --split cydrasil-v3-reference-alignment.phy papara_alignment.combined-aln
The output file from this command is query.fasta.
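
To show what the split step accomplishes, the sketch below removes every row of a combined alignment whose name also appears in the reference alignment and returns the remainder as FASTA text. It mirrors the idea only and is not how EPA-ng implements `--split`. It assumes both files are sequential relaxed PHYLIP (a `ntax nchar` header followed by one `name sequence` row per sequence), and the function names are hypothetical.

```python
def read_relaxed_phylip(text):
    """Parse sequential relaxed PHYLIP text into a list of (name, sequence)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(x) for x in lines[0].split())
    records = []
    for ln in lines[1:]:
        name, seq = ln.split(None, 1)
        records.append((name, "".join(seq.split())))
    if len(records) != ntax or any(len(seq) != nchar for _, seq in records):
        raise ValueError("alignment does not match its PHYLIP header")
    return records


def split_queries(combined_text, reference_text):
    """Return FASTA text of the combined rows that are not reference sequences."""
    reference_ids = {name for name, _ in read_relaxed_phylip(reference_text)}
    queries = [(name, seq) for name, seq in read_relaxed_phylip(combined_text)
               if name not in reference_ids]
    return "".join(f">{name}\n{seq}\n" for name, seq in queries)
```

For real runs, prefer the `epa-ng --split` command, which produces the query.fasta file expected by the placement step.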

Example EPA-ng Command

epa-ng -s cydrasil-v3-reference-alignment.afa -t cydrasil-v3-reference-tree.nwk -q aligned-query-sequences-masked.afa --model cydrasil-v3.bestModel
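
The optional flags from the arguments table (-w and -T) can be added only when needed. The sketch below assembles an EPA-ng argument list from the flags and Cydrasil filenames listed in that table. `epa_ng_command` is a hypothetical helper, and running the result assumes an `epa-ng` executable on your PATH.

```python
def epa_ng_command(query_alignment, out_dir=None, threads=None,
                   ref_alignment="cydrasil-v3-reference-alignment.afa",
                   tree="cydrasil-v3-reference-tree.nwk",
                   model="cydrasil-v3.bestModel"):
    """Build an EPA-ng argument list using the flags from the arguments table."""
    cmd = ["epa-ng", "-s", ref_alignment, "-t", tree,
           "-q", query_alignment, "--model", model]
    if out_dir is not None:
        cmd += ["-w", out_dir]        # output directory
    if threads is not None:
        cmd += ["-T", str(threads)]   # number of threads
    return cmd
```

Pass `aligned-query-sequences-masked.afa` as the query for the SSU-Align route or `query.fasta` for the PaPaRa route, then hand the list to `subprocess.run(cmd, check=True)`.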

Output

EPA-ng outputs a .jplace file named placements.jplace.
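
A .jplace file is JSON, so it can also be inspected outside a tree viewer. The sketch below reports, for each query, the edge with the highest `like_weight_ratio`. It assumes the standard jplace layout (a `fields` list naming the columns of each placement row, and query names under `n` or `nm`), and it treats `like_weight_ratio` as the confidence value. `best_placements` is a hypothetical helper.

```python
import json


def best_placements(jplace_text):
    """Map each query name to (edge_num, like_weight_ratio) of its top placement."""
    data = json.loads(jplace_text)
    fields = data["fields"]
    edge_i = fields.index("edge_num")
    lwr_i = fields.index("like_weight_ratio")
    best = {}
    for placement in data["placements"]:
        top = max(placement["p"], key=lambda row: row[lwr_i])
        # Names are stored under "n" (plain names) or "nm" (name, multiplicity pairs).
        names = placement.get("n") or [pair[0] for pair in placement.get("nm", [])]
        for name in names:
            best[name] = (top[edge_i], top[lwr_i])
    return best
```

The edge numbers refer to the `{n}` edge labels embedded in the tree string of the same file, so mapping them back to clades still requires the reference tree.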

Visualize and analyze your results

Tree Visualization

To visualize the .jplace file containing the sequences of interest, upload the placement file placements.jplace to iTOL (https://itol.embl.de/)⁵. Placements act as a dataset within iTOL and can be toggled on and off. Nodes with sequences of interest are marked with red circles, and clicking a node shows a breakdown of sequence IDs and the corresponding confidence values for that node.

Steps to visualize tree in iTOL:
1. Create a user account with iTOL
2. Upload (drag and drop) .jplace file into an iTOL project
3. Reroot the tree by searching for WOR1 and then clicking on its parent node (I2504) and navigating to Tree structure > Re-root the tree here
4. From the Controls box set parameters as follows:
Basic
- Display mode: Normal
- Parameters: 0 degree rotation
- Invert: No
- Slanted: No
- Branch lengths: Use
- Labels: At tips NOTE: You can toggle labels to off if the tree slows your computer.
- Label rotation: On
- Label alignment: Left
Advanced
- No changes. If you are having trouble reading your results, you can use the scaling factors to separate leaves.
Datasets
- Turn on phylogenetic placements
- Use “Show query form” to search for the placement of individual query sequences
- Insert the query sequence ID and click on the sequence ID name to display red circles indicating the phylogenetic placement of that query sequence.

References

1. Nawrocki E. 2009. Structural RNA Homology Search and Alignment Using Covariance Models. Washington University School of Medicine.
2. Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.
3. Miller MA, Pfeiffer W, Schwartz T. 2010. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. 2010 Gatew Comput Environ Work GCE 2010.
4. Berger SA, Stamatakis A. 2011. Aligning short reads to reference alignments and trees. Bioinformatics 27:2068–2075.
5. Letunic I, Bork P. 2016. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res 44:W242–W245.
