FALCON is an ultra-fast method for inferring the metagenomic composition of sequenced reads. FALCON measures the similarity between any FASTQ (or FASTA) file, regardless of its size, and any multi-FASTA database, such as the entire set of complete genomes from NCBI. FALCON supports single-end reads, paired-end reads, and combinations of both. It has been tested with data from many platforms, such as Illumina MiSeq, HiSeq and NovaSeq, and Ion Torrent.
FALCON efficiently detects the presence of a given species in FASTQ reads and helps authenticate it. The core of the method is relative data compression. FALCON supports a variable number of threads without multiplying the memory for each thread, so it runs efficiently on a common laptop.
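As a rough, hypothetical illustration of the relative-compression idea, the sketch below swaps in gzip (FALCON uses its own compression models, not gzip). It measures how many extra compressed bytes a target costs once the reference is known, C(reference + target) minus C(reference): a target that resembles the reference should cost fewer extra bytes than an unrelated one. The function name extra_bytes is illustrative only.

```shell
#!/usr/bin/env bash
# Crude illustration only (gzip stand-in, not FALCON's models):
# extra compressed bytes needed for a target once the reference is known,
# approximated as C(ref + target) - C(ref).
# gzip only looks back 32 KB, so this is meaningful only for small inputs.
extra_bytes() {
  local ref=$1 target=$2 c_ref c_both
  c_ref=$(gzip -9 -c "$ref" | wc -c)                 # C(ref)
  c_both=$(cat "$ref" "$target" | gzip -9 -c | wc -c) # C(ref + target)
  echo $(( c_both - c_ref ))
}

# Usage:
#   extra_bytes reference.txt target.txt
```

A smaller result means the reference "explains" more of the target, which is the intuition behind ranking database sequences by compression-based similarity.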
The tool can also localize where, within each reference sequence, the similarity occurs. FALCON provides programs to filter the local results (FALCON-filter) and to visualize them (FALCON-filter-visual), as well as programs for inter-similarity analysis of a database (FALCON-inter) and its visualization (FALCON-inter-visual).
Find additional information here.
1.1 Automatic installation with Conda
conda install -c cobilab falcon --yes
1.2 Manual installation
git clone https://github.com/cobilab/falcon.git
cd falcon/src/
cmake .
make
cp FALCON ../../
cp FALCON-filter ../../
cp FALCON-filter-visual ../../
cp FALCON-inter ../../
cp FALCON-inter-visual ../../
cd ../../
CMake is required for the manual installation.
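Before building, it can help to check that the tools used in the manual installation steps are on the PATH. This is a hypothetical helper, not part of FALCON; it only checks the tool names you pass to it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check (not part of FALCON): report every tool
# name given as an argument that cannot be found on the PATH.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 only if every tool was found
}

# Usage:
#   require_tools git cmake make && echo "ready to build"
```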
Search for the 15 most similar viruses in the sample reads provided in the test folder:
cd test
gunzip reads.fq.gz
gunzip VDB.fa.gz
./FALCON -v -F -t 15 -l 47 -x top.txt reads.fq VDB.fa
The resulting similarity ranking, written to top.txt, identifies Zaire Ebolavirus in the sample.
An example of building a reference database from NCBI:
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/assembly_summary.txt
awk -F '\t' '{if($12=="Complete Genome") print $20}' assembly_summary.txt > ASCG.txt
mkdir -p GB_DB_VIRAL
mkdir -p GB_DB_VIRAL_CDS
mkdir -p GB_DB_VIRAL_RNA
cat ASCG.txt | xargs -I{} -P8 wget -P GB_DB_VIRAL {}/*_genomic.fna.gz
mv GB_DB_VIRAL/*_cds_from_genomic.fna.gz GB_DB_VIRAL_CDS/
mv GB_DB_VIRAL/*_rna_from_genomic.fna.gz GB_DB_VIRAL_RNA/
zcat GB_DB_VIRAL/*.fna.gz > VDB.fa
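wget expands the * wildcard used above only for ftp:// URLs. If the ftp_path column (column 20) of your assembly_summary.txt contains https:// URLs, a sketch like the following builds explicit file URLs instead. It assumes NCBI's usual layout, in which each assembly directory contains a file named <directory-name>_genomic.fna.gz; build_genomic_urls and ASCG_URLS.txt are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch (assumes NCBI's directory layout): for each "Complete Genome" row
# (column 12) with a real ftp_path (column 20), print
# <ftp_path>/<last path component>_genomic.fna.gz
build_genomic_urls() {
  awk -F '\t' '$12 == "Complete Genome" && $20 != "na" {
    n = split($20, parts, "/")   # parts[n] is the assembly directory name
    print $20 "/" parts[n] "_genomic.fna.gz"
  }' "$1"
}

# Usage (fetches only *_genomic.fna.gz, so no cds/rna files need moving):
#   build_genomic_urls assembly_summary.txt > ASCG_URLS.txt
#   xargs -n1 -P8 wget -q -P GB_DB_VIRAL < ASCG_URLS.txt
#   zcat GB_DB_VIRAL/*_genomic.fna.gz > VDB.fa
```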
To build reference databases for other domains/kingdoms (bacteria, fungi, protozoa, plants, etc.), use:
https://raw.githubusercontent.com/cobilab/gto/master/scripts/gto_build_dbs.sh
A prebuilt viral reference database is available here. To use it, uncompress it (gunzip VDB.fa.gz) and pass it to FALCON along with the FASTQ reads.
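A quick, hypothetical sanity check before running FALCON: confirm that the uncompressed database starts like multi-FASTA and count how many reference sequences it contains (one per ">" header line). The function name count_references is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check (not part of FALCON): verify that a database file
# starts with a FASTA header and print how many ">" header lines it holds.
count_references() {
  local db=$1
  if [ "$(head -c 1 "$db")" != ">" ]; then
    echo "error: $db does not start with a FASTA header ('>')" >&2
    return 1
  fi
  grep -c '^>' "$db"
}

# Usage:
#   count_references VDB.fa
```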
The FALCON package includes the following binaries:
- FALCON: metagenomic composition analysis;
- FALCON-filter: local interactions (localization);
- FALCON-filter-visual: visualization of global and local similarities;
- FALCON-inter: inter-similarity between database genomes;
- FALCON-inter-visual: visualization of inter-similarities.
To see the available options of FALCON, type:
./FALCON
or
./FALCON -h
These will print the following options:
Non-mandatory arguments:
-h give this help,
-F force mode (overwrites top file),
-V display version number,
-v verbose mode (more information),
-Z database local similarity,
-s show compression levels,
-l <level> compression level [1;47],
-p <sample> subsampling (default: 1),
-t <top> top of similarity (default: 20),
-n <nThreads> number of threads (default: 2),
-x <FILE> similarity top filename,
-y <FILE> local similarities filename,
Mandatory arguments:
[FILE1]:[FILE2]:... metagenomic filename (FASTQ),
Use ":" for splitting files.
[FILE] database filename (Multi-FASTA).
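Several metagenomic files (for example, the two mates of a paired-end run) are passed to FALCON as one argument separated by ":". A small hypothetical helper can build that argument from any list of paths; it assumes no path itself contains ":".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FALCON): join all arguments with ":",
# the separator FALCON uses for multiple metagenomic files.
join_reads() {
  local IFS=':'        # "$*" joins the arguments using the first IFS character
  printf '%s\n' "$*"
}

# Usage, e.g. for paired-end mates:
#   ./FALCON -v -F -t 15 -l 47 -x top.txt "$(join_reads reads_1.fq reads_2.fq)" VDB.fa
```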
For detecting and visualizing local interactions, the FALCON package provides FALCON-filter and FALCON-filter-visual.
To see the available options of FALCON-filter, type:
./FALCON-filter
or
./FALCON-filter -h
These will print the following options:
Non-mandatory arguments:
-h give this help,
-F force mode (overwrites top file),
-V display version number,
-v verbose mode (more information),
-s <size> filter window size,
-w <type> filter window type,
-x <sampling> filter window sampling,
-sl <lower> similarity lower bound,
-su <upper> similarity upper bound,
-dl <lower> size lower bound,
-du <upper> size upper bound,
-t <threshold> threshold [0;2.0],
-o <FILE> output filename,
Mandatory arguments:
[FILE] profile filename (from FALCON).
To see the available options of FALCON-filter-visual, type:
./FALCON-filter-visual
or
./FALCON-filter-visual -h
These will print the following options:
Non-mandatory arguments:
-h give this help,
-F force mode (overwrites top file),
-V display version number,
-v verbose mode (more information),
-w <width> square width (for each value),
-s <ispace> square inter-space (between each value),
-i <indexs> color index start,
-r <indexr> color index rotations,
-u <hue> color hue,
-sl <lower> similarity lower bound,
-su <upper> similarity upper bound,
-dl <lower> size lower bound,
-du <upper> size upper bound,
-bg show only the best of group,
-g <color> color gamma,
-e <size> enlarge painted regions,
-ss do NOT show global scale,
-sn do NOT show names,
-o <FILE> output image filename,
Mandatory arguments:
[FILE] profile filename (from FALCON-filter).
To see the available options of FALCON-inter, type:
./FALCON-inter
or
./FALCON-inter -h
These will print the following options:
Non-mandatory arguments:
-h give this help,
-V display version number,
-v verbose mode (more information),
-s show compression levels,
-l <level> compression level [1;30],
-n <nThreads> number of threads,
-x <FILE> similarity matrix filename,
-o <FILE> labels filename,
Mandatory arguments:
<FILE>:<FILE>:<...> input files (last arguments).
Use ":" for file splitting.
To see the available options of FALCON-inter-visual, type:
./FALCON-inter-visual
or
./FALCON-inter-visual -h
These will print the following options:
Non-mandatory arguments:
-h give this help,
-V display version number,
-v verbose mode (more information),
-w square width (for each value),
-a square inter-space (between each value),
-s index color start,
-r index color rotations,
-u color hue,
-g color gamma,
-l <FILE> labels filename,
-x <FILE> heatmap filename,
Mandatory arguments:
<FILE> input matrix file (from FALCON-inter).
Find additional information here.
To run the complete pipeline (FALCON, FALCON-filter and FALCON-filter-visual) in one step, create the following bash script:
#!/bin/bash
./FALCON -v -n 4 -t 200 -F -Z -l 47 -c 20 -y complexity.com "$1" "$2"
./FALCON-filter -v -F -t 0.5 -o positions.pos complexity.com
./FALCON-filter-visual -v -F -o draw.map positions.pos
Save it as FALCON-meta.sh and make it executable:
chmod +x FALCON-meta.sh
Then run it:
./FALCON-meta.sh reads1.fastq:reads2.fastq VDB.fa
reads1.fastq, reads2.fastq and VDB.fa are only examples. For more, see the examples folder.
If you use this software or method, please cite:
D. Pratas, M. Hosseini, G. Grilo, A. J. Pinho, R. M. Silva, T. Caetano, J. Carneiro, F. Pereira. (2018).
Metagenomic Composition Analysis of an Ancient Sequenced Polar Bear Jawbone from Svalbard.
Genes, 9(9), 445.
BibTeX:
@article{Pratas-2018a,
title={Metagenomic Composition Analysis of an Ancient Sequenced Polar Bear Jawbone from Svalbard},
author={D. Pratas and M. Hosseini and G. Grilo and A. J. Pinho and R. M. Silva and T. Caetano and J. Carneiro and F. Pereira},
journal={Genes},
volume={9},
number={9},
pages={445},
year={2018}
}
For any issue, please let us know through the project's GitHub issues page.
FALCON is released under the GPL v3 license.
For more information, see the LICENSE file or visit
http://www.gnu.org/licenses/gpl-3.0.html