sylph - fast and precise species-level metagenomic profiling with ANIs

Introduction

sylph is a program that performs ultrafast (1) ANI querying or (2) metagenomic profiling for metagenomic shotgun samples.

Containment ANI querying: sylph can search a genome, e.g. E. coli, against your sample. If sylph outputs an estimate of 97% ANI, your sample contains an E. coli with 97% ANI to the queried genome.

Metagenomic profiling: sylph can determine the species/taxa in your sample and their abundances, just like Kraken or MetaPhlAn.

Profiling 1 Gbp of mouse gut reads against 85,205 genomes in a few seconds

Why sylph?

Precise species-level profiling: Our tests show that sylph has fewer false positives than Kraken and is about as precise and sensitive as marker gene methods (MetaPhlAn, mOTUs).
Ultrafast, multithreaded, multi-sample: sylph can be > 50x faster than other methods for multi-sample processing. sylph only takes ~15 GB of RAM for profiling against the entire GTDB-R220 database (110k genomes).
Accurate (containment) ANI information: sylph can often give accurate ANI estimates between reference genomes and your metagenome sample down to 0.1x coverage.
Customizable databases and pre-built databases: We offer pre-built databases of prokaryotes, viruses, and eukaryotes. Custom databases (e.g. using your own MAGs) are easy to build. Taxonomic information can be incorporated downstream for traditional profiling reports.
Short or long reads: sylph was primarily benchmarked on short reads, but sylph was also the most accurate method on Oxford Nanopore's independent benchmarks.

How does sylph work?

sylph uses a k-mer containment method. sylph's novelty lies in using a statistical technique to correct ANI for low-coverage genomes, giving accurate results for low-abundance genomes. See here for more information on what sylph can and cannot do.
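To make the idea concrete, the sketch below shows the textbook relationship between k-mer containment and ANI, and why low coverage biases the naive estimate downwards. It is a simplified illustration, not sylph's implementation: the function names, the k-mer size of 31, and the Poisson coverage model are all assumptions made for this example.

```shell
#!/bin/sh
# Illustrative only: naive containment ANI and a simple coverage adjustment.
# Assumptions (not taken from sylph's source): k-mer size k = 31, and reads
# sampling each genomic k-mer independently with Poisson mean COV.

# naive_ani CONTAINMENT K
# If each base matches independently with probability ANI, a k-mer is shared
# with probability ANI^k, so ANI is roughly CONTAINMENT^(1/K).
naive_ani() {
    awk -v c="$1" -v k="$2" 'BEGIN { printf "%.4f\n", exp(log(c) / k) }'
}

# adjusted_ani CONTAINMENT K COV
# At coverage COV only a fraction 1 - exp(-COV) of the genome's k-mers is
# expected to appear in the reads at all, so divide that factor out first.
adjusted_ani() {
    awk -v c="$1" -v k="$2" -v cov="$3" 'BEGIN {
        seen = 1 - exp(-cov)              # expected fraction of k-mers sampled
        corrected = c / seen
        if (corrected > 1) corrected = 1  # containment cannot exceed 1
        printf "%.4f\n", exp(log(corrected) / k)
    }'
}

# Example: a genome at 0.5x coverage with 30% of its k-mers found in the reads.
naive_ani 0.30 31
adjusted_ani 0.30 31 0.5
```

In this toy model the naive estimate comes out lower than the adjusted one, because most missing k-mers are missing due to shallow sequencing rather than true divergence. In practice the coverage itself has to be estimated from the data, which this sketch does not attempt.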

Very quick start

Profile metagenome sample against GTDB-R220 (113,104 bacterial/archaeal species representative genomes)

conda install -c bioconda sylph

# download GTDB-R220 pre-built database (~13 GB)
wget http://faust.compbio.cs.cmu.edu/sylph-stuff/gtdb-r220-c200-dbv1.syldb

# multi-sample paired-end profiling (sylph version >= 0.6)
sylph profile gtdb-r220-c200-dbv1.syldb -1 *_1.fastq.gz -2 *_2.fastq.gz -t (threads) > profiling.tsv

# multi-sample single-end profiling
sylph profile gtdb-r220-c200-dbv1.syldb *.fastq -t (threads) > profiling.tsv
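
The paired-end command above passes two globs, which the shell expands in sorted order. A sample missing one mate would presumably shift the pairing for every file after it. A small pre-flight check, sketched here with a hypothetical helper name (`unpaired_reads`) and assuming the `_1.fastq.gz` / `_2.fastq.gz` naming used above, might look like:

```shell
#!/bin/sh
# unpaired_reads DIR
# Print every *_1.fastq.gz or *_2.fastq.gz file in DIR whose mate is missing.
# Prints nothing when all samples are properly paired.
unpaired_reads() {
    for f in "$1"/*_1.fastq.gz "$1"/*_2.fastq.gz; do
        [ -e "$f" ] || continue           # glob matched nothing
        case "$f" in
            *_1.fastq.gz) mate="${f%_1.fastq.gz}_2.fastq.gz" ;;
            *)            mate="${f%_2.fastq.gz}_1.fastq.gz" ;;
        esac
        [ -e "$mate" ] || printf '%s\n' "$f"
    done
}

# Example: check the current directory before running sylph profile.
unpaired_reads .
```

If the function prints nothing, both globs expand to lists of the same length in the same sample order.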

Install

Option 1: conda install

conda install -c bioconda sylph

Warning

conda install may break if AVX2 instructions are not available on your CPU. See the issue here. The binary and source install still work.
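
One way to check for AVX2 before choosing an install option is to look at the CPU flags. The sketch below is Linux-only and reads `/proc/cpuinfo`; the helper name `has_avx2` and its optional file argument are illustrative choices, not part of sylph.

```shell
#!/bin/sh
# has_avx2 [CPUINFO_FILE]
# Succeeds (exit 0) if the CPU flags mention avx2. Reads /proc/cpuinfo by
# default (Linux only); the optional argument exists mainly for testing.
has_avx2() {
    grep -q avx2 "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx2; then
    echo "AVX2 detected: the conda package should be usable"
else
    echo "AVX2 not detected (or not a Linux system): consider the source build or the static binary"
fi
```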

Option 2: Build from source

Requirements:

The Rust programming language (version > 1.63) and associated tools such as cargo, assumed to be in PATH.
A C compiler (e.g. GCC)
make
cmake
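
A quick way to confirm the requirements above are in PATH before building is sketched below. The helper name `missing_tools` is hypothetical, and `cc` is used as a stand-in for whichever C compiler is installed.

```shell
#!/bin/sh
# missing_tools TOOL...
# Print each named tool that cannot be found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the requirements list; "cc" stands in for the C compiler.
missing_tools cargo cc make cmake
```

Empty output means every tool was found. This does not check the Rust version, which `rustc --version` reports.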

Building takes a few minutes (depending on # of cores).

git clone https://github.com/bluenote-1577/sylph
cd sylph

# If default rust install directory is ~/.cargo
cargo install --path . --root ~/.cargo
sylph profile test_files/*

Option 3: Pre-built x86-64 linux statically compiled executable

If you're on an x86-64 system, you can download the binary and use it without any installation.

wget https://github.com/bluenote-1577/sylph/releases/download/latest/sylph
chmod +x sylph
./sylph -h

Note: the binary is compiled with a different set of libraries (musl instead of glibc), which probably affects performance.

Tutorials, manuals, and pre-built databases

Pre-built databases

The pre-built databases available here can be downloaded and used with sylph for profiling and containment querying.

Cookbook

For common use cases and fast explanations, see the above cookbook.

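Tutorials

Introduction: 5-minute sylph tutorial outlining basic usage
Taxonomic profiling against GTDB database with MetaPhlAn-like output format

Manuals

Output format (TSV) and containment ANI explanation
Incorporating custom taxonomies to get CAMI-like or MetaPhlAn-like outputs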

sylph-utils

For incorporating taxonomy and manipulating output formats, see the sylph-utils repository.

Changelog

Version v0.7.0 - 2024-11-06.

Added the inspect option to inspect .syldb/.sylsp files.

See the CHANGELOG for complete details.

Citing sylph

Jim Shaw and Yun William Yu. Rapid species-level metagenome profiling and containment estimation with sylph (2024). Nature Biotechnology.
