Martin Asser Hansen edited this page Oct 1, 2015 · 6 revisions

Biopiece: kmer_freq

Description

kmer_freq determines the frequencies of k-mers (subsequences of length k) in the sequences in the stream. Each k-mer is reported as a record like this:

REC_TYPE: KMER_FREQ
KMER: GTAG
COUNT: 1
FREQ: 0.0204
---
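Biopieces records, as shown above, are runs of "KEY: VALUE" lines terminated by a "---" line. The Python sketch below reads such a stream and keeps only the KMER_FREQ records. It is a minimal illustration that assumes exactly that layout as seen on this page; the function names parse_records and kmer_freq_records are hypothetical and not part of Biopieces.

```python
from typing import Dict, Iterable, Iterator, List


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record; a record is 'KEY: VALUE' lines ended by '---'."""
    record: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if line == "---":
            if record:
                yield record
            record = {}
        elif line:
            # Split on the first ': ' only, so values may contain colons.
            key, _, value = line.partition(": ")
            record[key] = value
    if record:  # tolerate a final record without a trailing '---'
        yield record


def kmer_freq_records(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only the records emitted by kmer_freq (hypothetical helper)."""
    return [r for r in parse_records(lines) if r.get("REC_TYPE") == "KMER_FREQ"]


SAMPLE = """REC_TYPE: KMER_FREQ
KMER: GTAG
COUNT: 1
FREQ: 0.0204
---
"""

for rec in kmer_freq_records(SAMPLE.splitlines()):
    print(rec["KMER"], int(rec["COUNT"]), float(rec["FREQ"]))  # GTAG 1 0.0204
```

Because other record types (such as the sequence record that read_fasta passes through) share the same layout, filtering on REC_TYPE is enough to separate them.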

Usage

... | kmer_freq [options]

Options

[-?          | --help]               #  Print full usage description.
[-s <uint>   | --size=<uint>]        #  K-mer size                      -  Default=4
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file     -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file     -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.
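To get a feel for what -s controls, the back-of-the-envelope sketch below compares the number of k-mer positions in a sequence with the number of possible DNA k-mers. It assumes overlapping k-mers (a sequence of length L then has L - k + 1 windows, which is consistent with the example output below) and the four-letter alphabet A, C, G, T; the helper name kmer_stats is hypothetical.

```python
from typing import Tuple


def kmer_stats(seq_len: int, k: int) -> Tuple[int, int]:
    """Return (overlapping k-mer positions, size of the DNA k-mer space)."""
    positions = max(seq_len - k + 1, 0)  # sliding window, step 1
    space = 4 ** k  # assumes only A, C, G and T
    return positions, space


for k in (4, 8):
    print(k, kmer_stats(50, k))
# 4 (47, 256)
# 8 (43, 65536)
```

For a 50 bp sequence, the default size of 4 spreads 47 windows over 256 possible k-mers, while -s 8 spreads 43 windows over 65536, so larger k-mer sizes give much sparser counts.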

Examples

Consider the following FASTA entry in the file test.fna:

>test
ATGCACATTGATGCACATTGATGCACATTGATGCACATTGATGCACATTG

To find the 8-mer frequencies, read in the sequence with read_fasta and pipe it to kmer_freq with -s 8:

read_fasta -i test.fna | kmer_freq -s 8
SEQ_NAME: test
SEQ: ATGCACATTGATGCACATTGATGCACATTGATGCACATTGATGCACATTG
SEQ_LEN: 50
---
REC_TYPE: KMER_FREQ
KMER: ATGCACAT
COUNT: 5
FREQ: 0.5
---
REC_TYPE: KMER_FREQ
KMER: TGCACATT
COUNT: 5
FREQ: 0.5
---
REC_TYPE: KMER_FREQ
KMER: GCACATTG
COUNT: 5
FREQ: 0.5
---
REC_TYPE: KMER_FREQ
KMER: CACATTGA
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: ACATTGAT
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: CATTGATG
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: ATTGATGC
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: TTGATGCA
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: TGATGCAC
COUNT: 4
FREQ: 0.4
---
REC_TYPE: KMER_FREQ
KMER: GATGCACA
COUNT: 4
FREQ: 0.4
---
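The COUNT values above can be reproduced by sliding an 8 bp window along the sequence one position at a time: the 50 bp sequence is a 10 bp repeat, so there are 50 - 8 + 1 = 43 windows and 10 distinct 8-mers, of which the three starting at repeat offsets 0 to 2 occur 5 times and the other seven occur 4 times. The Python sketch below is an illustration of that counting, not the Biopieces implementation. It assumes overlapping k-mers, upper-cases the input, and sorts by descending count with ties in first-occurrence order, which matches this example but is not documented behaviour. It reports COUNT only, because the normalisation behind FREQ is not described on this page.

```python
from collections import Counter
from typing import List, Tuple


def count_kmers(seq: str, k: int = 4) -> List[Tuple[str, int]]:
    """Count overlapping k-mers; return (kmer, count) pairs, most frequent first."""
    seq = seq.upper()
    # Counter keeps insertion order, i.e. the order in which k-mers first appear.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # sorted() is stable, so k-mers with equal counts keep first-occurrence order.
    return sorted(counts.items(), key=lambda item: -item[1])


seq = "ATGCACATTG" * 5  # the 50 bp test sequence from test.fna
for kmer, count in count_kmers(seq, k=8):
    print(f"REC_TYPE: KMER_FREQ\nKMER: {kmer}\nCOUNT: {count}\n---")
```

The default k=4 mirrors the documented default of -s.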

See also

read_fasta

split_seq

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

January 2011

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

kmer_freq is part of the Biopieces framework.

http://www.biopieces.org
