from a Science issue containing Levene et al. 2003.
PacBio was the second (after Helicos) single molecule sequencing (SMS) technology on the market. Today it is an important technology enabling assembly of complex genomes and transcriptomes.
The era of Pacific Biosciences begins with a publication by Eid et al. 2009. Yet, as was the case with other technologies, it did not appear out of thin air. One of the key publications predating the birth of PacBio (written by some of the same authors) is Levene et al. 2003. In particular, it had the following figure:
Figure 1 from Levene et al. 2003. An apparatus for single-molecule enzymology using zero-mode waveguides.
The overall idea of this device is that it can detect fluorescent ligands (for example, labeled dNTPs) in a very small volume: 10⁻²¹ litre. Specifically, envision a glass slide fused to a metal (aluminum) film with tiny holes. The diameter of the holes is smaller than the wavelength of the light used to illuminate the slide. As a result, only molecules at the bottom of a "hole" are detectable, which makes it possible to track single-molecule dynamics. Now imagine that you put a single DNA polymerase molecule at the bottom of such a "hole". The polymerase will pull dNTPs close to the bottom of the "hole" as it performs the polymerization reaction. Thus, only nucleotides that are being added to the DNA strand will be detected by the device at any given time. If you now make a movie of this process, you are recording polymerization kinetics in real time. And this is exactly what the PacBio process does:
Figure 1 from Eid et al. 2009
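To get a feel for how small the 10⁻²¹ litre observation volume described above is, one can estimate the average number of labeled molecules it holds at a given concentration using N = C × V × N_A. The sketch below is only a back-of-the-envelope calculation; the concentrations are illustrative assumptions, not values taken from the papers.

```python
# Average number of molecules in a tiny observation volume: N = C * V * N_A
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def mean_molecules(concentration_molar: float, volume_litres: float) -> float:
    """Expected number of molecules in the volume at the given concentration."""
    return concentration_molar * volume_litres * AVOGADRO


zmw_volume = 1e-21  # litres, the figure quoted in the text

# 1 nM, 1 uM and 10 uM are assumed, illustrative concentrations
for conc in (1e-9, 1e-6, 1e-5):
    n = mean_molecules(conc, zmw_volume)
    print(f"{conc:.0e} M -> {n:.2e} molecules on average")
```

Even at micromolar concentrations the expected count stays far below one, so a fluorescent signal from the bottom of the "hole" most likely comes from a single molecule at a time.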
Because the "movie" recorded by the machine contains a temporal component of the process (how long it takes for each base to be incorporated into the nascent chain), this information can be used directly to identify modified bases in the template, as was shown by Flusberg et al. 2010:
Figure 1 from Flusberg et al. 2010.
One of the major drawbacks of SMS technologies is a relatively high error rate. In the case of PacBio it is somewhere between 10 and 25%. However, because PacBio uses a library produced by ligating hairpin (SMRTbell) adapters to both ends of each DNA molecule, the resulting single circular molecule can be sequenced multiple times, allowing for error correction:
Figure 1a from Wenger et al. 2019.
Performing multiple passes allows for high accuracy:
Figure 2b from Wenger et al. 2019.
with reads of approximately 13 kb:
Figure 1c from Wenger et al. 2019.
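Why repeated passes help can be illustrated with a deliberately simplified model. The sketch below assumes that every pass miscalls a given base independently with the same probability and that the consensus is a plain majority vote. This is an assumption made for illustration only and not the actual consensus algorithm used by PacBio; the 15% per-pass error is simply a value inside the range quoted above.

```python
from math import comb


def majority_error(p: float, n_passes: int) -> float:
    """Probability that more than half of n independent passes are wrong at a base."""
    need = n_passes // 2 + 1  # smallest number of wrong passes that wins the vote
    return sum(
        comb(n_passes, k) * p**k * (1 - p) ** (n_passes - k)
        for k in range(need, n_passes + 1)
    )


per_pass_error = 0.15  # assumed value, within the 10-25% range
for n in (1, 3, 5, 9, 15):  # odd pass counts avoid ties
    print(n, f"{majority_error(per_pass_error, n):.2e}")
```

Under these assumptions the consensus error falls quickly as the number of passes grows, which is the qualitative behaviour the figure above shows for real data.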
Note
Note that it is also possible to read a longer insert by simply generating what are called Continuous Long Reads (CLRs). These are much longer but less accurate. Thus current PacBio systems produce two types of reads: Circular Consensus Sequencing (CCS) reads and Continuous Long Reads (CLRs). A subset of high quality CCS reads (with base
In addition to sequencing data and base qualities PacBio machines produce kinetics data. In particular:
- Inter-Pulse Duration values (IPDs)
- Pulse width values
Older machines reported all this information in Hierarchical Data Format (HDF5) files. Newer machines use a special subset of the BAM format instead. Because of all these additional data, PacBio datasets tend to be quite large:
You can look at a small fragment of the *.hifi_reads.bam file in its textual (SAM) representation here.
Kinetogram of PacBio read (from here).
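In the SAM text form, per-base kinetics appear as array-valued optional fields of the form `TAG:B:subtype,v1,v2,...`. The sketch below pulls such arrays out of one record using only the standard library. The record itself is made up for illustration, and the tag names `ip` (inter-pulse duration) and `pw` (pulse width) are assumptions based on the PacBio BAM conventions; check which tags your own file actually carries, and note that the stored values may be encoded rather than raw frame counts.

```python
def parse_array_tags(sam_line: str) -> dict:
    """Return {tag: [ints]} for every integer array (B-type) optional field."""
    fields = sam_line.rstrip("\n").split("\t")
    arrays = {}
    for field in fields[11:]:  # optional fields start at column 12
        tag, typ, value = field.split(":", 2)
        if typ == "B":
            subtype, *items = value.split(",")
            if subtype != "f":  # skip float arrays, keep integer ones
                arrays[tag] = [int(x) for x in items]
    return arrays


# Hypothetical record, invented for this example only
record = "\t".join([
    "demo/1/ccs", "4", "*", "0", "255", "*", "*", "0", "0",
    "ACGT", "~~~~",
    "RG:Z:demo",
    "ip:B:C,14,3,27,9",  # assumed IPD tag, one value per base
    "pw:B:C,6,11,4,8",   # assumed pulse-width tag, one value per base
])

tags = parse_array_tags(record)
seq = record.split("\t")[9]
for base, ipd, pw in zip(seq, tags["ip"], tags["pw"]):
    print(base, ipd, pw)
```

For real BAM files a dedicated library is the more practical route; the manual parsing here is only meant to show where the kinetic values sit in a record.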
Recently PacBio added an interesting short-read sequencing-by-binding (SBB) technology to its portfolio by buying Omniome. This technology (only remotely related to PacBio's own) is based on the following principles:
Figure 2 from Cetin et al. 2018.
Figure 3 from Cetin et al. 2018.
The machine, the Onso system, will allow reads of up to 200 bp with the possibility of paired-end sequencing. I have not yet seen the data.
A brief summary of other sequencing technologies worth mentioning.