Merge pull request #103 from jlancaster95/master
added 25 tools & 2 benchmarks
QGouil authored Nov 5, 2024
2 parents c437dc8 + d2936c8 commit d933581
Showing 30 changed files with 6,677 additions and 3,160 deletions.
4 changes: 3 additions & 1 deletion benchmark_studies.csv
@@ -21,4 +21,6 @@ Benchmarking of computational methods for m6A profiling with Nanopore direct RNA
Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data,"Michael B. Hall, Ryan R. Wick, Louise M. Judd, An N. T. Nguyen, Eike J. Steinig, Ouli Xie, Mark R. Davies, Torsten Seemann, Timothy P. Stinear, Lachlan J. M. Coin",2024,bioRxiv,NA,NA,"Variant calling is fundamental in bacterial genomics, underpinning the identification of disease transmission clusters, the construction of phylogenetic trees, and antimicrobial resistance prediction. This study presents a comprehensive benchmarking of SNP and indel variant calling accuracy across 14 diverse bacterial species using Oxford Nanopore Technologies (ONT) and Illumina sequencing. We generate gold standard reference genomes and project variations from closely related strains onto them, creating biologically realistic distributions of SNPs and indels. Our results demonstrate that ONT variant calls from deep learning-based tools delivered higher SNP and indel accuracy than traditional methods and Illumina, with Clair3 providing the most accurate results overall. We investigate the causes of missed and false calls, highlighting the limitations inherent in short reads and discover that ONT's traditional limitations with homopolymer-induced indel errors are absent with high-accuracy basecalling models and deep learning-based variant calls. Furthermore, our findings on the impact of read depth on variant calling offer valuable insights for sequencing projects with limited resources, showing that 10x depth is sufficient to achieve variant calls that match or exceed Illumina. In conclusion, our research highlights the superior accuracy of deep learning tools in SNP and indel detection with ONT sequencing, challenging the primacy of short-read sequencing. 
The reduction of systematic errors and the ability to attain high accuracy at lower read depths enhance the viability of ONT for widespread use in clinical and public health bacterial genomics.",https://doi.org/10.1101/2024.03.15.585313,Oxford Nanopore,SNPAndVariantAnalysis,"BCFtools, Clair3, DeepVariant, FreeBayes, Longshot, NanoCaller","Bacterial WGS on nanopore with variant truthset, and complementary Illumina data",Long read data analysed with Clair3 or DeepVariant were evaluated as the best tools compared to others
Assembly Arena: Benchmarking RNA isoform reconstruction algorithms for nanopore sequencing,"Melanie Sagniez, Anshul Budhraja, Bastien Pare, Shawn M. Simpson, Clement Vinet-Ouellette, Marieke Rozendaal, Martin A. Smith",2024,bioRxiv,NA,NA,"Resolving the transcriptomes of higher eukaryotes is more tangible with the advent of long read sequencing, which greatly facilitates the identification of new transcripts and their splicing isoforms. However, the computational analysis of long read RNA sequencing data remains challenging as it is difficult to disentangle technical artifacts from bona fide biological information. To address this, we evaluated the performance of multiple leading transcriptome assembly algorithms on their ability to accurately reconstruct RNA transcript isoforms. We specifically focused on deep nanopore sequencing of synthetic RNA spike-in controls (Sequins and SIRVs) across different chemistries, including cDNA and direct RNA protocols. Our systematic comparative benchmarking exposes the strengths and limitations of the different surveyed strategies. We also highlight conceptual and technical challenges with the annotation of transcriptomes and the formalization of assembly quality metrics. Our results complement similar recent endeavors, helping forge a path towards a gold standard analytical pipeline for long read transcriptome assembly.",https://doi.org/10.1101/2024.03.21.586080,Oxford Nanopore,IsoformDetection,"Bambu, FLAIR, FLAMES, IsONclust, IsONclust2, IsoQuant, Mandalorion, RATTLE, RNAbloom, RNAbloom2, Stringtie2, TALON, TALON_reco",Nanopore ligation & dRNA sequencing data of synthetic RNA spike-ins. Sequins & SIRV module 4,"They conclude there is no 'one-size fits-all' solution when it comes to transcriptome assembly, and that further development is required"
Analysis and benchmarking of small and large genomic variants across tandem repeats,"Adam C. English, Egor Dolzhenko, Helyaneh Ziaei Jam, Sean K. McKenzie, Nathan D. Olson, Wouter De Coster, Jonghun Park, Bida Gu, Justin Wagner, Michael A. Eberle, Melissa Gymrek, Mark J. P. Chaisson, Justin M. Zook & Fritz J. Sedlazeck",2024,Nature Biotechnology,NA,NA,"Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies. ",https://doi.org/10.1038/s41587-024-02225-z,Oxford Nanopore,"EvaluatingExistingMethods, AnalysisPipelines","Rtg-tools, Truvari, HipSTR, GangSTR, Medaka, TRGT, DeepVariant, BioGraph, Sniffles","TR catalogue, HG002","They caution against the overinterpretation of their results as they are not representative of any particular pipeline's limitations. Instead, the aim in these experiments was only to exemplify the use of the benchmark in hopes that developers of TR discovery pipelines may fully explore the relative strengths and weaknesses of their technologies."
Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline,"Sara Wattansombat, Siripong Tongjai",2024,f1000Research,13,556,"Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources.",https://doi.org/10.12688/f1000research.149577.1,Oxford Nanopore,"EvaluatingExistingMethods, DenovoAssembly","Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, RVHaplo","Simulated nanopore data, HIV-1 genome mixtures","The assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection influences the size of contigs, with minimum read length of 2kb required for quality assembly. A 4kb read length improves quality further. Each assembler had their own advantages, they do not disclose which is best overall but suggest Strainline, MetaFlye and HaploDMF deliver successful assemblies on their simulated data."
Performance of somatic structural variant calling in lung cancer using Oxford Nanopore sequencing technology,"Lingchen Liu, Jia Zhang, Scott Wood, Felicity Newell, Conrad Leonard, Lambros T. Koufariotis, Katia Nones, Andrew J. Dalley, Haarika Chittoory, Farzad Bashirzadeh, Jung Hwa Son, Daniel Steinfort, Jonathan P. Williamson, Michael Bint, Carl Pahoff, Phan T. Nguyen, Scott Twaddell, David Arnold, Christopher Grainge, Peter T. Simpson, David Fielding, Nicola Waddell & John V. Pearson",2024,BMC Genomics,25,898,"Lung cancer is a heterogeneous disease and the primary cause of cancer-related mortality worldwide. Somatic mutations, including large structural variants, are important biomarkers in lung cancer for selecting targeted therapy. Genomic studies in lung cancer have been conducted using short-read sequencing. Emerging long-read sequencing technologies are a promising alternative to study somatic structural variants, however there is no current consensus on how to process data and call somatic events. In this study, we performed whole genome sequencing of lung cancer and matched non-tumour samples using long and short read sequencing to comprehensively benchmark three sequence aligners and seven structural variant callers comprised of generic callers (SVIM, Sniffles2, DELLY in generic mode and cuteSV) and somatic callers (Severus, SAVANA, nanomonsv and DELLY in somatic modes).",https://doi.org/10.1186/s12864-024-10792-3,Oxford Nanopore,"EvaluatingExistingMethods, SNPAndVariantAnalysis","minimap2, Winnowmap, NGMLR, SVIM, Sniffles2, DELLY, cuteSV, Severus, SAVANA, nanomonsv",Seven SCLC cell lines sequenced on Oxford Nanopore's PromethION,"Aligner choice had minimal impact on SV calling, but minimap2 was best due to speed. Somatic callers detect more high-confidence SV events compared to generic approaches"
Predicting RNA modifications by nanopore sequencing: The RMaP challenge,"Nicolo Alagna, Jannes Spangenberg, Stefan Mundnich, Anne Busch, Stefan Pastore, Anna Wierczeiko, Winfried Goettsch, Vincent Dietrich, Leszek Pryszcz, Sonia Cruciani, Eva Maria Novoa, Kanarp Joshi, Ranjan Perera, Salvatore Di Giorgio, Paola Arrubarrena, Irem Tellioglu, Chi-Lam Poon, Yuk Wan, Jonathan Goke, Andreas Hildebrand, Christoph Dieterich, Mark Helm, Manja Marz, Susanne Gerber",2024,Nature Portfolio,NA,NA,"The field of epitranscriptomics is undergoing a technology-driven revolution. During past decades, RNA modifications like N6-methyladenosine (m6A), pseudouridine (Ψ), and 5-methylcytosine (m5C) became acknowledged for playing critical roles in gene expression regulation, RNA stability, and translation efficiency. Among modification-aware sequencing approaches, direct RNA sequencing by Oxford Nanopore Technologies (ONT) enabled the detection of modifications in native RNA, by capturing and storing properties of noncanonical RNA nucleosides in raw data. Consequently, the field's cutting edge has a heavy component in computer science, opening new avenues of cooperation across the community, as exchanging data is as impactful as exchanging samples. Therefore, we seize the occasion to bring scientists together within the RMaP challenge to advance solutions for RNA modification detection and discuss current ideas, problems and approaches. Here, we show several computational methods to detect the most researched mRNA modifications (m6A, Ψ, and m5C). Results demonstrate that a low prediction error and a high prediction accuracy can be achieved on these modifications across different approaches and algorithms. The RMaP challenge marks a substantial step towards improving algorithms' comparability, reliability, and consistency in RNA modification prediction. 
It points out the deficits in this young field that need to be addressed in further challenges.",https://doi.org/10.21203/rs.3.rs-5241143/v1,Oxford Nanopore,"EvaluatingExistingMethods, BaseModificationDetection","guppy, dorado, tombo, bayespore, CHEUI, PseudoDec, scikit-learn, m6anet",synthetic ONT datasets,"Methods using bayespore and CHEUI performed best overall. However, the datasets were produced using old direct-RNA chemistry and are now obsolete"
28 changes: 28 additions & 0 deletions docs/data/benchmarks.json
@@ -317,5 +317,33 @@
"ToolsCompared": "Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, RVHaplo",
"BenchmarkData": "Simulated nanopore data, HIV-1 genome mixtures",
"Recommendations": "The assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection influences the size of contigs, with minimum read length of 2kb required for quality assembly. A 4kb read length improves quality further. Each assembler had their own advantages, they do not disclose which is best overall but suggest Strainline, MetaFlye and HaploDMF deliver successful assemblies on their simulated data."
},
{
"Title": "Performance of somatic structural variant calling in lung cancer using Oxford Nanopore sequencing technology",
"Authors": "Lingchen Liu, Jia Zhang, Scott Wood, Felicity Newell, Conrad Leonard, Lambros T. Koufariotis, Katia Nones, Andrew J. Dalley, Haarika Chittoory, Farzad Bashirzadeh, Jung Hwa Son, Daniel Steinfort, Jonathan P. Williamson, Michael Bint, Carl Pahoff, Phan T. Nguyen, Scott Twaddell, David Arnold, Christopher Grainge, Peter T. Simpson, David Fielding, Nicola Waddell & John V. Pearson",
"Year": "2024",
"Journal": "BMC Genomics",
"Issue": "25",
"Volume": "898",
"Abstract": "Lung cancer is a heterogeneous disease and the primary cause of cancer-related mortality worldwide. Somatic mutations, including large structural variants, are important biomarkers in lung cancer for selecting targeted therapy. Genomic studies in lung cancer have been conducted using short-read sequencing. Emerging long-read sequencing technologies are a promising alternative to study somatic structural variants, however there is no current consensus on how to process data and call somatic events. In this study, we performed whole genome sequencing of lung cancer and matched non-tumour samples using long and short read sequencing to comprehensively benchmark three sequence aligners and seven structural variant callers comprised of generic callers (SVIM, Sniffles2, DELLY in generic mode and cuteSV) and somatic callers (Severus, SAVANA, nanomonsv and DELLY in somatic modes).",
"doi": "https://doi.org/10.1186/s12864-024-10792-3",
"Technology": "Oxford Nanopore",
"Categories": "EvaluatingExistingMethods, SNPAndVariantAnalysis",
"ToolsCompared": "minimap2, Winnowmap, NGMLR, SVIM, Sniffles2, DELLY, cuteSV, Severus, SAVANA, nanomonsv",
"BenchmarkData": "Seven SCLC cell lines sequenced on Oxford Nanopore's PromethION",
"Recommendations": "Aligner choice had minimal impact on SV calling, but minimap2 was best due to speed. Somatic callers detect more high-confidence SV events compared to generic approaches"
},
{
"Title": "Predicting RNA modifications by nanopore sequencing: The RMaP challenge",
"Authors": "Nicolo Alagna, Jannes Spangenberg, Stefan Mundnich, Anne Busch, Stefan Pastore, Anna Wierczeiko, Winfried Goettsch, Vincent Dietrich, Leszek Pryszcz, Sonia Cruciani, Eva Maria Novoa, Kanarp Joshi, Ranjan Perera, Salvatore Di Giorgio, Paola Arrubarrena, Irem Tellioglu, Chi-Lam Poon, Yuk Wan, Jonathan Goke, Andreas Hildebrand, Christoph Dieterich, Mark Helm, Manja Marz, Susanne Gerber",
"Year": "2024",
"Journal": "Nature Portfolio",
"Abstract": "The field of epitranscriptomics is undergoing a technology-driven revolution. During past decades, RNA modifications like N6-methyladenosine (m6A), pseudouridine (Ψ), and 5-methylcytosine (m5C) became acknowledged for playing critical roles in gene expression regulation, RNA stability, and translation efficiency. Among modification-aware sequencing approaches, direct RNA sequencing by Oxford Nanopore Technologies (ONT) enabled the detection of modifications in native RNA, by capturing and storing properties of noncanonical RNA nucleosides in raw data. Consequently, the field's cutting edge has a heavy component in computer science, opening new avenues of cooperation across the community, as exchanging data is as impactful as exchanging samples. Therefore, we seize the occasion to bring scientists together within the RMaP challenge to advance solutions for RNA modification detection and discuss current ideas, problems and approaches. Here, we show several computational methods to detect the most researched mRNA modifications (m6A, Ψ, and m5C). Results demonstrate that a low prediction error and a high prediction accuracy can be achieved on these modifications across different approaches and algorithms. The RMaP challenge marks a substantial step towards improving algorithms' comparability, reliability, and consistency in RNA modification prediction. It points out the deficits in this young field that need to be addressed in further challenges.",
"doi": "https://doi.org/10.21203/rs.3.rs-5241143/v1",
"Technology": "Oxford Nanopore",
"Categories": "EvaluatingExistingMethods, BaseModificationDetection",
"ToolsCompared": "guppy, dorado, tombo, bayespore, CHEUI, PseudoDec, scikit-learn, m6anet",
"BenchmarkData": "synthetic ONT datasets",
"Recommendations": "Methods using bayespore and CHEUI performed best overall. However, the datasets were produced using old direct-RNA chemistry and are now obsolete"
}
]
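
The entries above store "ToolsCompared" and "Categories" as comma-separated strings and identify each study by its "doi". A minimal sketch of how such entries might be read and sanity-checked follows. It assumes only the key names visible in this diff; the `SAMPLE` excerpt and the helpers `split_field` and `duplicate_dois` are hypothetical and are not part of the repository.

```python
import json
from collections import Counter

# Hypothetical excerpt that mirrors the keys used in docs/data/benchmarks.json.
SAMPLE = """
[
  {"Title": "Performance of somatic structural variant calling in lung cancer using Oxford Nanopore sequencing technology",
   "doi": "https://doi.org/10.1186/s12864-024-10792-3",
   "Categories": "EvaluatingExistingMethods, SNPAndVariantAnalysis",
   "ToolsCompared": "minimap2, Winnowmap, NGMLR, SVIM, Sniffles2, DELLY, cuteSV, Severus, SAVANA, nanomonsv"},
  {"Title": "Predicting RNA modifications by nanopore sequencing: The RMaP challenge",
   "doi": "https://doi.org/10.21203/rs.3.rs-5241143/v1",
   "Categories": "EvaluatingExistingMethods, BaseModificationDetection",
   "ToolsCompared": "guppy, dorado, tombo, bayespore, CHEUI, PseudoDec, scikit-learn, m6anet"}
]
"""


def split_field(value):
    """Split a comma-separated field into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def duplicate_dois(entries):
    """Return the sorted DOIs that occur in more than one entry."""
    counts = Counter(e["doi"] for e in entries if e.get("doi"))
    return sorted(doi for doi, n in counts.items() if n > 1)


entries = json.loads(SAMPLE)

# Map each study's DOI to the list of tools it compared.
tools_by_doi = {e["doi"]: split_field(e["ToolsCompared"]) for e in entries}
```

A check like `duplicate_dois` could be run before committing, since a study added twice would otherwise only show up as a repeated row.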