smudgeplot extract
A script that reads a file with kmer pair sequences and a file with their corresponding coverages, and prints to standard output, in FASTA format, all the kmers that match the user specification. It can be thought of as extracting the kmer pairs that fall in a user-defined rectangle within the smudgeplot.
For example, to extract the core kmer pairs of the AAB smudge in the smudgeplot in the README file, you could run this module with the parameters -minc 500 -maxc 700 -minr 0.3 -maxr 0.367, which would sub-select the kmers falling in the corresponding rectangle of the smudgeplot.
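The rectangle selection described above can be sketched in a few lines of Python. This is only an illustration, not the module's actual code: it assumes the summed coverage is the sum of the two kmer coverages, that the minor coverage ratio is the smaller coverage divided by that sum, and that all four bounds are inclusive. The function name in_rectangle and the coverage values are hypothetical.

```python
def in_rectangle(cov1, cov2, minc, maxc, minr, maxr):
    """Return True if a kmer pair falls inside the user-defined rectangle.

    Assumption: bounds are inclusive; the real tool may treat them differently.
    """
    total = cov1 + cov2                  # summed coverage of the pair
    ratio = min(cov1, cov2) / total      # minor coverage ratio, at most 0.5
    return minc <= total <= maxc and minr <= ratio <= maxr


# Hypothetical coverage pairs, for illustration only.
pairs = [(200, 400), (300, 310), (150, 300), (210, 420)]

# Indices of the pairs inside the AAB rectangle from the example above.
selected = [
    i for i, (c1, c2) in enumerate(pairs)
    if in_rectangle(c1, c2, minc=500, maxc=700, minr=0.3, maxr=0.367)
]
print(selected)  # [0, 3]
```

The second pair is rejected because its ratio is close to 0.5 (an AB-like pair), and the third because its summed coverage is below 500.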
The header of each kmer has the following format: >kmer_<INDEX>_<1/2>_<COV>; <INDEX> is the 0-based order of the kmer in the kmer pair file; <1/2> identifies the kmer within the pair (1 and 2 correspond to the kmer with the smaller and the higher coverage, respectively); and <COV> is the frequency of the kmer in the original read set.
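To make the header layout concrete, here is a minimal sketch of how such headers could be written and parsed back in Python. The helper names (format_header, parse_header, pair_to_fasta) are hypothetical and not part of smudgeplot, and the sketch assumes that the coverage is an integer count.

```python
import re

# Matches headers of the form >kmer_<INDEX>_<1/2>_<COV>
HEADER_RE = re.compile(r"^>kmer_(\d+)_([12])_(\d+)$")


def format_header(index, member, cov):
    """Build a header; member is 1 (smaller coverage) or 2 (higher coverage)."""
    return f">kmer_{index}_{member}_{cov}"


def parse_header(header):
    """Split a header back into (index, member, cov) as integers."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"unexpected header: {header!r}")
    index, member, cov = (int(group) for group in match.groups())
    return index, member, cov


def pair_to_fasta(index, kmer1, cov1, kmer2, cov2):
    """Format one kmer pair as two FASTA records (lower-coverage kmer first)."""
    return (f"{format_header(index, 1, cov1)}\n{kmer1}\n"
            f"{format_header(index, 2, cov2)}\n{kmer2}\n")


# Hypothetical pair with index 0 and coverages 200 and 400.
print(pair_to_fasta(0, "ACGT", 200, "ACGA", 400), end="")
print(parse_header(">kmer_0_2_400"))  # (0, 2, 400)
```

Parsing the header this way is handy after mapping, because the index ties both kmers of a pair back together and the coverage is available without reopening the coverage file.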
See the wiki page about mapping these kmers using bwa.
usage: smudgeplot extract [-h] -cov COVERAGEFILE -seq SEQFILE -minc COUNTMIN
                          -maxc COUNTMAX -minr RATIOMIN -maxr RATIOMAX > extracted_kmer_pairs.fasta

Extract kmer pairs within specified coverage sum and minor coverage ratio
ranges.

optional arguments:
  -h, --help            show this help message and exit
  -cov COVERAGEFILE, --coverageFile COVERAGEFILE
                        coverage file for the kmer pairs
  -seq SEQFILE, --seqFile SEQFILE
                        sequences of the kmer pairs
  -minc COUNTMIN, --countMin COUNTMIN
                        lower bound of the summed coverage
  -maxc COUNTMAX, --countMax COUNTMAX
                        upper bound of the summed coverage
  -minr RATIOMIN, --ratioMin RATIOMIN
                        lower bound of minor allele ratio
  -maxr RATIOMAX, --ratioMax RATIOMAX
                        upper bound of minor allele ratio
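
For readers who want to wrap or reimplement this interface, the options above map naturally onto Python's argparse. The following is a sketch under assumptions, not the tool's source: it assumes that the count bounds are integers and the ratio bounds are floats, and the input file names are hypothetical placeholders.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the help text above."""
    parser = argparse.ArgumentParser(
        prog="smudgeplot extract",
        description="Extract kmer pairs within specified coverage sum "
                    "and minor coverage ratio ranges.")
    parser.add_argument("-cov", "--coverageFile", required=True,
                        help="coverage file for the kmer pairs")
    parser.add_argument("-seq", "--seqFile", required=True,
                        help="sequences of the kmer pairs")
    # Assumption: counts are integers and ratios are floats.
    parser.add_argument("-minc", "--countMin", type=int, required=True,
                        help="lower bound of the summed coverage")
    parser.add_argument("-maxc", "--countMax", type=int, required=True,
                        help="upper bound of the summed coverage")
    parser.add_argument("-minr", "--ratioMin", type=float, required=True,
                        help="lower bound of minor allele ratio")
    parser.add_argument("-maxr", "--ratioMax", type=float, required=True,
                        help="upper bound of minor allele ratio")
    return parser


# The AAB example from above; both file names are hypothetical.
args = build_parser().parse_args([
    "-cov", "kmer_pairs_coverages.tsv",
    "-seq", "kmer_pairs_sequences.tsv",
    "-minc", "500", "-maxc", "700",
    "-minr", "0.3", "-maxr", "0.367",
])
print(args.countMin, args.countMax, args.ratioMin, args.ratioMax)
```

argparse derives the attribute names (args.countMin and so on) from the long option names, so the parsed values can be passed directly to a filter on summed coverage and minor coverage ratio.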