kmer_freq
Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions
kmer_freq determines the frequencies of k-mers in the sequences in the stream. The resulting records look like this:
REC_TYPE: KMER_FREQ
KMER: GTAG
COUNT: 1
FREQ: 0.0204
---
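For downstream processing outside Biopieces, records in this layout can be read back with a few lines of Python. The following is a minimal sketch, not part of Biopieces: it assumes each record is a run of `KEY: VALUE` lines terminated by a line containing only `---`, as in the example above, and the function name `parse_records` is hypothetical.

```python
def parse_records(text):
    """Parse 'KEY: VALUE' lines into dicts; a line of '---' ends a record.

    Assumption: the layout matches the example record shown on this page.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            # Record terminator: store the record collected so far.
            if current:
                records.append(current)
            current = {}
        elif line:
            # Split on the first ': ' only, so values may contain colons.
            key, _, value = line.partition(": ")
            current[key] = value
    if current:
        records.append(current)
    return records


sample = """REC_TYPE: KMER_FREQ
KMER: GTAG
COUNT: 1
FREQ: 0.0204
---
"""

records = parse_records(sample)
# Keep only the k-mer records; other records in the stream pass through untouched.
kmer_records = [r for r in records if r.get("REC_TYPE") == "KMER_FREQ"]
print(kmer_records[0]["KMER"], int(kmer_records[0]["COUNT"]))
```

Filtering on `REC_TYPE` matters because the stream can also carry other records, such as the input sequence record itself.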
... | kmer_freq [options]
[-? | --help] # Print full usage description.
[-s <uint> | --size=<uint>] # K-mer size - Default=4
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTA entry in the file test.fna:
>test
ATGCACATTGATGCACATTGATGCACATTGATGCACATTGATGCACATTG
To find k-mer frequencies, read in the sequence with read_fasta and pipe it to kmer_freq:
read_fasta -i test.fna | kmer_freq -s 8
SEQ_NAME: test
SEQ: ATGCACATTGATGCACATTGATGCACATTGATGCACATTGATGCACATTG
SEQ_LEN: 50
---
REC_TYPE: KMER_FREQ
KMER: ATGCACAT
COUNT: 5
FREQ: 0.5
---
REC_TYPE: KMER_FREQ
KMER: TGCACATT
COUNT: 5
FREQ: 0.5
---
REC_TYPE: KMER_FREQ
KMER: GCACATTG
COUNT: 5
FREQ: 0.5
---
REC_TYPE: KMER_FREQ
KMER: CACATTGA
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: ACATTGAT
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: CATTGATG
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: ATTGATGC
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: TTGATGCA
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: TGATGCAC
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: GATGCACA
COUNT: 4
FREQ: 0.4
---
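The COUNT values above can be checked independently with a sliding window over the sequence. The sketch below is an illustration in Python, not the kmer_freq implementation; the helper name `count_kmers` is hypothetical, and it assumes overlapping windows on the forward strand only. FREQ is not reproduced, because this page does not state how it is normalized.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer in seq (forward strand only)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# The sequence from test.fna: the 10-base unit ATGCACATTG repeated five times.
seq = "ATGCACATTGATGCACATTGATGCACATTGATGCACATTGATGCACATTG"
counts = count_kmers(seq, 8)

for kmer, count in counts.most_common():
    print(kmer, count)
```

A 50-base sequence has 43 windows of size 8. Because the sequence repeats with a period of 10, only 10 distinct 8-mers occur: the three that start at offsets 0, 1 and 2 of the repeat fit five times, and the remaining seven fit four times, which agrees with the COUNT fields in the records above.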
Martin Asser Hansen - Copyright (C) - All rights reserved.
January 2011
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
kmer_freq is part of the Biopieces framework.