Tutorial Curation Running breseq

Jeffrey Barrick edited this page Jun 12, 2024 · 8 revisions

Back to the Main Curation Tutorial Page

Install breseq

Follow the breseq installation instructions. Most likely, you'll want to install breseq in its own Conda environment.

Download tutorial data

Make a new directory for this analysis and change into it:

mkdir breseq_run
cd breseq_run

Download the Illumina sequencing read files (FASTQ format). (Use the private Box link until the data is public.)

Download the annotated reference genome (GenBank format) for the LTEE ancestor strain REL606: REL606.gbk

You can copy this to your current working directory using wget:

wget https://raw.githubusercontent.com/barricklab/LTEE/7da91974eafac0c5a8f903ae57275795d4395af2/reference/REL606.gbk
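Before running breseq, it can be worth confirming that the download is a real GenBank file and not an error page. The sketch below relies only on the fact that GenBank flat files begin with a LOCUS line; the helper name looks_like_genbank is made up for this illustration and is not part of breseq.

```shell
# Hypothetical helper: succeed if the file is non-empty and its first
# line starts with "LOCUS", as GenBank flat files do.
looks_like_genbank() {
  [ -s "$1" ] && head -n 1 "$1" | grep -q '^LOCUS'
}

if looks_like_genbank REL606.gbk; then
  echo "REL606.gbk looks like a GenBank file"
else
  echo "REL606.gbk is missing or is not a GenBank file" >&2
fi
```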

Run breseq

Run the breseq command:

breseq -j 8 -l 80 -o output -r REL606.gbk read_file_1.fastq.gz read_file_2.fastq.gz

The -j 8 option tells breseq to use 8 threads for the multi-threaded parts of the analysis (such as mapping reads). You can set it to the number of cores on your machine's processor, though going higher than ~8 doesn't improve speed much because some parts of the analysis can still use only one processor.
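One way to choose the -j value automatically is to detect the core count and cap it at 8. This is only a sketch: the helper name pick_threads and the fallback chain for counting cores are assumptions for illustration, and nproc may not be available on every system.

```shell
# Hypothetical helper: print the smaller of the core count ($1)
# and a cap ($2, defaulting to 8).
pick_threads() {
  cores="$1"
  cap="${2:-8}"
  if [ "$cores" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$cores"
  fi
}

# Detect cores; fall back to getconf, then to 1, if nproc is unavailable.
CORES="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
THREADS="$(pick_threads "$CORES")"
echo "Using $THREADS threads"

# The value could then be passed to breseq as: -j "$THREADS"
```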

The -l 80 option tells breseq to examine only enough reads to give a nominal coverage of 80x at each position in the reference genome. Coverage would be 80x if every read mapped, but some won't, so the actual value is usually around 90-95% of this. An average read-depth coverage of >40x is usually more than sufficient for calling mutations in clones. This option greatly speeds up the analysis if you have very large FASTQ files with 100x to 1000x or more coverage; the extra reads don't help.
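To get a feel for what a nominal 80x means, here is a back-of-the-envelope calculation of how many reads that is. It is only a sketch: the genome length of 4,629,812 bp for REL606 and the 150-base read length are assumptions for illustration (check the LOCUS line of REL606.gbk and your own FASTQ files), and the helper name reads_for_coverage is hypothetical. It does not reproduce how breseq itself counts reads.

```shell
# Hypothetical helper: number of reads needed to reach a nominal coverage.
#   $1 = genome length (bp), $2 = target coverage, $3 = read length (bases)
# Rounds up, since a partial read still has to be read in full.
reads_for_coverage() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

GENOME_BP=4629812   # assumed length of REL606
READ_LEN=150        # assumed read length

echo "Reads for a nominal 80x: $(reads_for_coverage "$GENOME_BP" 80 "$READ_LEN")"
```

Under these assumptions the answer is roughly 2.5 million reads, and with 90-95% of reads mapping the realized coverage would be about 72-76x.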

The breseq usage page in the manual describes additional options.

Open the HTML output

If necessary, copy the breseq_run/output folder back to your local computer. You may want to tar/gzip it before transferring to speed things up.
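The tar/gzip step might look like the sketch below. The helper name pack_dir is made up for this example, and the remote host and path in the comments are placeholders, not real locations.

```shell
# Hypothetical helper: archive a directory as <dir>.tar.gz next to it,
# storing paths relative to the directory's parent.
pack_dir() {
  dir="${1%/}"
  tar -czf "$dir.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# On the machine where breseq ran, from inside the analysis directory:
if [ -d output ]; then
  pack_dir output   # creates output.tar.gz
fi

# Then, on your local computer (user@remote and the path are placeholders):
#   scp user@remote:breseq_run/output.tar.gz .
#   tar -xzf output.tar.gz
```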

Open the breseq_run/output/index.html file in your favorite web browser.

The Output page in the manual describes what you are seeing.

Notice that there are "mutation predictions" in the table at the top and some "unassigned evidence" rows in tables lower down the page. Our goals for curation are to (1) eliminate the "unassigned evidence" predictions by converting them into full "mutation predictions", (2) make sure that all of the mutation predictions are complete and entirely correct, and (3) look for other evidence of mutations that breseq can't directly detect and incorporate that information.

Open the GenomeDiff output

The underlying information displayed in the HTML file is contained in the GenomeDiff output file generated by breseq. There's a copy of this at breseq_run/output/output.gd. Open it in a text editor and examine it.
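As a quick first look from the command line, you can tally the entries in the file by type. This sketch assumes only that a GenomeDiff file is tab-delimited, that header lines start with #, and that the first column of every other line is the entry type (for example SNP, DEL, RA, or JC). The helper name gd_type_counts is hypothetical.

```shell
# Hypothetical helper: print "TYPE<TAB>COUNT" for each entry type in a
# GenomeDiff file, skipping header lines (starting with #) and blank lines.
gd_type_counts() {
  awk -F'\t' '!/^#/ && NF { n[$1]++ }
              END { for (t in n) printf "%s\t%d\n", t, n[t] }' "$1" | sort
}

# From inside the analysis directory:
if [ -f output/output.gd ]; then
  gd_type_counts output/output.gd
fi
```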

The GenomeDiff File Format page in the manual describes what you are seeing.

Next: Editing GenomeDiff Files