Martin Asser Hansen edited this page Oct 1, 2015 · 6 revisions

#summary Mask sequences in the stream based on quality scores.

Biopiece: mask_seq

Description

[mask_seq] masks sequences in the stream using either soft masking (default) or hard masking. Soft masking converts residues whose quality score is below a specified cutoff to lower case, while hard masking replaces such residues with N. The sequences are the values of SEQ keys and the quality scores are the values of SCORES keys. SCORES are encoded as ASCII characters from '@' to 'h', representing scores from 0 to 40.
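The score encoding described above can be illustrated with a short Python sketch. This is not part of Biopieces: the function name `decode_scores` is hypothetical, and the offset of 64 is an assumption inferred from the stated '@'-to-'h' range mapping to 0-40 ('@' is ASCII 64).

```python
def decode_scores(scores, offset=64):
    """Convert an ASCII-encoded SCORES string to a list of integer scores.

    The offset of 64 is an assumption inferred from '@' (ASCII 64) encoding 0.
    """
    return [ord(char) - offset for char in scores]


# '@' -> 0, 'A' -> 1, 'U' -> 21, 'h' -> 40
print(decode_scores("@AUh"))  # prints [0, 1, 21, 40]
```

For decoding scores within the stream itself, see [scores_to_dec].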

Read more here:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/

Usage

... | mask_seq [options]

Options

[-?          | --help]               #  Print full usage description.
[-c <int>    | --cutoff=<int>]       #  Cutoff used for soft masking low scoring sequence  -  Default=20
[-h          | --hardmask]           #  Hard mask instead of soft mask.
[-I <file!>  | --stream_in=<file!>]  #  Read input stream from file                        -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output stream to file                        -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

Consider the following FASTQ entry in the file test.fq:

@HWI-EAS157_20FFGAAXX:2:1:888:434
TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGGGCGAT
+HWI-EAS157_20FFGAAXX:2:1:888:434
@ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX

We can read in this sequence using [read_fastq] and then soft mask it with [mask_seq] like this:

read_fastq -i test.fq | mask_seq 

SCORES: @ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX
SEQ: ttggtcgctcgctccgcgacCTCAGATCAGACGTGGGCGAT
SEQ_LEN: 41
SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
---

Using the -c switch we can change the cutoff:

read_fastq -i test.fq | mask_seq -c 25

SCORES: @ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX
SEQ: ttggtcgctcgctccgcgacctcaGATCAGACGTGGGCGAt
SEQ_LEN: 41
SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
---

Using the -h switch for hard masking:

read_fastq -i test.fq | mask_seq -h

SEQ_NAME: HWI-EAS157_20FFGAAXX:2:1:888:434
SEQ: NNNNNNNNNNNNNNNNNNNNCTCAGATCAGACGTGGGCGAT
SEQ_LEN: 41
SCORES: @ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX
---
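The masking behaviour shown in the examples above can be approximated with a minimal Python sketch. It is not the actual [mask_seq] implementation: the function name `mask_seq`, its parameters, and the offset of 64 are assumptions made for illustration, and the "score below cutoff" rule is taken from the description and checked against the three example outputs.

```python
def mask_seq(seq, scores, cutoff=20, hardmask=False, offset=64):
    """Mask residues whose quality score is below the cutoff.

    Soft masking (default) lowercases the residue; hard masking replaces
    it with 'N'. The offset of 64 is assumed from the '@'-to-'h' encoding.
    """
    if len(seq) != len(scores):
        raise ValueError("SEQ and SCORES must have the same length")

    masked = []
    for residue, char in zip(seq, scores):
        if ord(char) - offset < cutoff:
            masked.append("N" if hardmask else residue.lower())
        else:
            masked.append(residue)
    return "".join(masked)


# The FASTQ entry from test.fq above.
seq = "TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGGGCGAT"
scores = r"@ABCDEFGHIJKLMNOPQRSTUVWhgfedcba`_^]\[ZYX"

print(mask_seq(seq, scores))                 # soft mask, cutoff 20
print(mask_seq(seq, scores, cutoff=25))      # soft mask, cutoff 25
print(mask_seq(seq, scores, hardmask=True))  # hard mask, cutoff 20
```

With a cutoff of 25 the final residue is masked as well, because its score character 'X' decodes to 24.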

See also

[read_fastq]

[scores_to_dec]

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

August 2010

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

[mask_seq] is part of the Biopieces framework.

http://www.biopieces.org
