Tutorial Curation: Running breseq
Follow the breseq installation instructions. Most likely, you'll want to install breseq in its own Conda environment.
Make a new directory for this analysis and change into it:
mkdir breseq_run
cd breseq_run
Download the Illumina sequencing read files (FASTQ format). Use the private Box link until the data is public.
Download the annotated reference genome (GenBank format) for the LTEE ancestor strain REL606:
REL606.gbk
You can copy this to your current working directory using wget:
wget https://raw.githubusercontent.com/barricklab/LTEE/7da91974eafac0c5a8f903ae57275795d4395af2/reference/REL606.gbk
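Before running breseq, it can be worth a quick check that REL606.gbk is a complete GenBank record and not, say, a truncated transfer or an HTML page saved from a browser. The shell function below is a minimal sketch; check_genbank is a hypothetical helper name, and it assumes only the standard GenBank layout (a LOCUS line first, feature keys indented by five spaces, an ORIGIN sequence section, and a // terminator).

```shell
# Hypothetical helper: sanity-check that a file looks like a complete GenBank record.
check_genbank() {
  local gbk="$1"
  # A GenBank record begins with a LOCUS line.
  if ! head -n 1 "$gbk" | grep -q '^LOCUS'; then
    echo "ERROR: $gbk does not start with a LOCUS line" >&2
    return 1
  fi
  # The sequence section starts with ORIGIN and the record ends with //.
  if ! grep -q '^ORIGIN' "$gbk" || ! grep -q '^//' "$gbk"; then
    echo "ERROR: $gbk is missing its ORIGIN section or // terminator" >&2
    return 1
  fi
  # Feature keys start in column 6 (five leading spaces); count the CDS features.
  # grep -c exits non-zero when the count is 0, so ignore its exit status.
  local n_cds
  n_cds=$(grep -c '^     CDS ' "$gbk" || true)
  echo "$gbk: looks like GenBank, $n_cds CDS features"
}
```

For example, run check_genbank REL606.gbk right after the wget step; a non-zero exit status means the file should be downloaded again.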
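Run the breseq command, replacing read_file_1.fastq.gz and read_file_2.fastq.gz with the names of your downloaded read files:
breseq -j 8 -l 80 -o output -r REL606.gbk read_file_1.fastq.gz read_file_2.fastq.gz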
The -j 8 option tells breseq to use 8 computational threads for the multi-threaded parts of the analysis (such as mapping reads). You can set it to the number of cores on your machine's processor, though going higher than about 8 doesn't give much of a speed improvement because some parts of the analysis can still use only one processor.
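If you run breseq on different machines, you can pick the -j value automatically. This is a small sketch that uses the number of available cores, capped at 8 following the note above; nproc comes from GNU coreutils (Linux), and sysctl -n hw.ncpu is the usual fallback on macOS. The variable names are illustrative only.

```shell
# Count the available cores (GNU coreutils nproc on Linux, sysctl on macOS).
if command -v nproc >/dev/null 2>&1; then
  CORES=$(nproc)
else
  CORES=$(sysctl -n hw.ncpu)
fi
# Cap the thread count: beyond about 8, extra threads add little speed.
MAX_THREADS=8
THREADS=$(( CORES < MAX_THREADS ? CORES : MAX_THREADS ))
echo "Using -j $THREADS"
# breseq -j "$THREADS" -l 80 -o output -r REL606.gbk read_file_1.fastq.gz read_file_2.fastq.gz
```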
The -l 80 option tells breseq to examine only enough reads to give a nominal coverage of 80x at each position in the reference genome. Coverage would reach 80x only if every read mapped; because some reads don't map, the actual average is usually around 90-95% of this value. An average read-depth coverage of >40x is usually more than sufficient for calling mutations in clones. Using this option greatly speeds up the analysis when you have very large FASTQ files that provide 100x to 1000x or more coverage, because the extra reads don't help.
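To see how much the -l limit matters for your data, you can estimate the nominal coverage of your read files yourself: divide the total number of sequenced bases by the reference genome length. For REL606 (about 4.6 Mb), 80x corresponds to roughly 370 million bases of read sequence. The sketch below assumes gzip-compressed FASTQ files with standard four-line records; genome_length, fastq_bases, and estimate_coverage are hypothetical helper names. It decompresses every read file in full, so it can take a while on very large files.

```shell
# Total genome length: count the bases in the ORIGIN sequence section(s) of a GenBank file.
genome_length() {
  awk '/^ORIGIN/ { in_seq = 1; next }
       /^\/\//   { in_seq = 0; next }
       in_seq    { gsub(/[^A-Za-z]/, ""); n += length($0) }
       END       { print n + 0 }' "$1"
}

# Total sequenced bases: line 2 of every four-line FASTQ record is the read sequence.
fastq_bases() {
  gzip -dc "$@" | awk 'NR % 4 == 2 { n += length($0) } END { print n + 0 }'
}

# Usage: estimate_coverage REFERENCE.gbk READS_1.fastq.gz [READS_2.fastq.gz ...]
# Prints the nominal fold coverage (total read bases / genome length).
estimate_coverage() {
  local ref="$1"; shift
  local glen bases
  glen=$(genome_length "$ref")
  if [ "$glen" -eq 0 ]; then
    echo "ERROR: no sequence found in $ref" >&2
    return 1
  fi
  bases=$(fastq_bases "$@")
  awk -v b="$bases" -v g="$glen" 'BEGIN { printf "%.1f\n", b / g }'
}
```

For example, estimate_coverage REL606.gbk read_file_1.fastq.gz read_file_2.fastq.gz prints the estimated fold coverage; a value well above 80 means that with -l 80 breseq will examine only a fraction of the reads.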
The breseq usage page in the manual describes additional options.
If necessary, copy the output folder (breseq_run/output) back to your local computer. Packing it into a single tar/gzip archive before transferring can speed things up.
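One way to do the tar/gzip step is sketched below. archive_output is a hypothetical helper name, and user@server and the remote path in the comments are placeholders for your own machine; scp is shown only as one common way to copy the archive.

```shell
# Hypothetical helper: pack a breseq output folder into one gzip-compressed tar archive,
# which usually transfers much faster than thousands of small HTML and image files.
archive_output() {
  local dir="${1:-output}"
  dir="${dir%/}"   # strip a trailing slash so the archive is output.tar.gz, not output/.tar.gz
  tar -czf "$dir.tar.gz" "$dir"
}

# On the machine that ran breseq, from inside breseq_run/:
#   archive_output output
# On your local computer (user@server and the path are placeholders):
#   scp user@server:/path/to/breseq_run/output.tar.gz .
#   tar -xzf output.tar.gz
```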
Open the output/index.html file in your favorite web browser.
The Output page in the manual describes what you are seeing.
Notice that there are "mutation predictions" in the table at the top and some "unassigned evidence" rows in tables lower down the page. Our goals for curation are to (1) eliminate the "unassigned evidence" predictions by converting them into full "mutation predictions", (2) make sure that all of the mutation predictions are complete and entirely correct, and (3) look for other evidence of mutations that breseq can't directly detect and incorporate that information.
The underlying information displayed in the HTML file is contained in the GenomeDiff file generated by breseq, output/output.gd. Open it in a text editor and examine it.
The GenomeDiff File Format page in the manual describes what you are seeing.
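For a quick command-line overview of output.gd, you can count its entries by type and list evidence that no mutation cites. The sketch below assumes the GenomeDiff layout described in the manual: tab-delimited lines whose first three columns are the entry type, its ID, and the comma-separated IDs of the evidence supporting it, with mutations using three-letter types (SNP, SUB, DEL, INS, MOB, AMP, CON, INV) and read-alignment, missing-coverage, and new-junction evidence using RA, MC, and JC. gd_summary is a hypothetical helper name, and its list of unreferenced evidence only approximates the "unassigned evidence" tables in index.html, which breseq builds with its own filtering.

```shell
# Hypothetical helper: summarize a breseq GenomeDiff file (default: output/output.gd).
gd_summary() {
  local gd="${1:-output/output.gd}"

  echo "Entry counts by type:"
  # Skip header/comment lines (starting with #) and blank lines, then tally column 1.
  grep -v -e '^#' -e '^[[:space:]]*$' "$gd" | cut -f1 | sort | uniq -c

  echo "Evidence not referenced by any mutation (approximate unassigned evidence):"
  awk -F'\t' '
    # Pass 1: record every evidence ID listed in the parent-ID column of a mutation.
    FNR == NR {
      if ($1 ~ /^(SNP|SUB|DEL|INS|MOB|AMP|CON|INV)$/) {
        n = split($3, ids, ",")
        for (i = 1; i <= n; i++) used[ids[i]] = 1
      }
      next
    }
    # Pass 2: print RA, MC, and JC evidence that no mutation cites, skipping rejected items.
    $1 ~ /^(RA|MC|JC)$/ && !($2 in used) && $0 !~ /(^|\t)reject=/ { print }
  ' "$gd" "$gd"
}
```

Running gd_summary from inside breseq_run/ prints both lists; the second one is a rough guide to the evidence you will need to account for during curation.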
Next: Editing GenomeDiff Files