soap_seq
soap_seq uses soap to match short sequences from the stream against a specified genome or a sequence file in FASTA format. soap_seq allows for up to three mismatches in the mapping and reports a maximum of 1000 hits. Matching is done progressively: if a tag matches perfectly to a unique sequence, matching terminates with one hit. If no perfect matches are found, matching with one mismatch is tried, and only the first 1000 hits are reported. Only if there are still zero matches does soap try matching with two mismatches.
This behaviour of soap is not verified!
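The progressive strategy described above can be sketched in a few lines of Python. This is only an illustration of the description, not soap's actual implementation (whose behaviour, as noted, is unverified). The function names `hamming_hits` and `progressive_match` are hypothetical, the scan is a naive substitution-only comparison, and gaps are ignored.

```python
from typing import List, Optional, Tuple

MAX_HITS = 1000  # cap on reported hits, as stated in the description


def hamming_hits(tag: str, reference: str, mismatches: int) -> List[int]:
    """Return start positions where tag aligns with at most `mismatches` substitutions."""
    hits = []
    for start in range(len(reference) - len(tag) + 1):
        window = reference[start:start + len(tag)]
        diff = sum(1 for a, b in zip(tag, window) if a != b)
        if diff <= mismatches:
            hits.append(start)
    return hits


def progressive_match(
    tag: str,
    reference: str,
    max_mismatches: int = 2,
    max_hits: int = MAX_HITS,
) -> Tuple[Optional[int], List[int]]:
    """Try 0 mismatches first, then 1, then 2, stopping at the first level with hits.

    Returns (mismatch level, hit positions), or (None, []) if nothing matched.
    """
    for level in range(max_mismatches + 1):
        hits = hamming_hits(tag, reference, level)
        if hits:
            # Only the first max_hits positions are reported.
            return level, hits[:max_hits]
    return None, []
```

For example, `progressive_match("ACGT", "ACGTACGTTTGACC")` stops at the perfect-match level, while `progressive_match("TTGG", "ACGTACGTTTGACC")` finds nothing at zero mismatches and falls through to the one-mismatch level.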
Soap must be installed on your system in order for soap_seq to work. Read more here:
... | soap_seq [options] -i <FASTA file>
or
... | soap_seq [options] -g <genome>
[-? | --help] # Print full usage description.
[-i <file> | --in_file=<file>] # Path to FASTA file.
[-g <genome> | --genome=<genome>] # Choose genome instead of database.
[-s <uint> | --seed_size=<uint>] # Seed size - Default=10
[-m <uint> | --mismatches=<uint>] # Number of mismatches allowed - Default=2
[-G <uint> | --gap_size=<uint>] # Maximum gap size allowed - Default=0
[-c <uint> | --cpus=<uint>] # Number of CPUs to use - Default=1
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
To match short sequences in a FASTA file against a reference sequence in another FASTA file, do:
read_fasta -i <query FASTA file(s)> | soap_seq -i <reference FASTA file>
To match short sequences against a genome previously formatted with format_genome, do:
read_fasta -i <query FASTA file(s)> | soap_seq -g <genome>
To list available genomes use list_genomes.
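When the pipeline is launched from a script, it can help to assemble the command line in one place. The sketch below builds the command string only and does not run it. The helper name `build_pipeline` and the file names in the usage note are hypothetical. The flags are the ones listed in the options above, and it is an assumption that passing default values explicitly is harmless.

```python
import shlex
from typing import Optional


def build_pipeline(
    query_fasta: str,
    reference_fasta: Optional[str] = None,
    genome: Optional[str] = None,
    mismatches: int = 2,
    seed_size: int = 10,
    cpus: int = 1,
) -> str:
    """Return a 'read_fasta ... | soap_seq ...' command string.

    Exactly one of reference_fasta (-i) or genome (-g) must be given.
    """
    if (reference_fasta is None) == (genome is None):
        raise ValueError("give exactly one of reference_fasta or genome")

    target = ["-i", reference_fasta] if reference_fasta is not None else ["-g", genome]
    reader = ["read_fasta", "-i", query_fasta]
    soap = ["soap_seq", *target,
            "-m", str(mismatches), "-s", str(seed_size), "-c", str(cpus)]

    # shlex.join quotes any argument that needs it (e.g. paths with spaces).
    return " | ".join(shlex.join(cmd) for cmd in (reader, soap))
```

Calling `build_pipeline("reads.fna", reference_fasta="ref.fna", mismatches=1, cpus=4)` yields a string that can be handed to a shell, for instance via `subprocess.run(..., shell=True)`.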
Martin Asser Hansen - Copyright (C) - All rights reserved.
July 2008
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
soap_seq is part of the Biopieces framework.