Commit

documentation hotfixes
KamilSJaron committed May 14, 2019
1 parent 1f4206a commit 03244a5
Showing 2 changed files with 4 additions and 13 deletions.
13 changes: 2 additions & 11 deletions README.md
@@ -99,18 +99,9 @@ This script estimates the size, heterozygosity, and repetitive fraction of the g

 ## Frequently Asked Questions

-Are collected on [our wiki](https://github.com/KamilSJaron/smudgeplot/wiki/FAQ). If you won't find an answer for your question there, open an [issue](https://github.com/KamilSJaron/smudgeplot/issues/new/choose) or drop us an email.
+Are collected on [our wiki](https://github.com/KamilSJaron/smudgeplot/wiki/FAQ). Smudgeplot is not much demanding on computational resources, but make sure you check [memory requirements](https://github.com/KamilSJaron/smudgeplot/wiki/smudgeplot-hetkmers#memory-requirements) before you extract kmer pairs (`hetkmers` task). If you won't find an answer for your question in FAQ, open an [issue](https://github.com/KamilSJaron/smudgeplot/issues/new/choose) or drop us an email.

-Check [projects](https://github.com/KamilSJaron/smudgeplot/projects) to see how the development goes
-
-## Computational requirements
-
-The memory required scale linearly with the number of kmers and it is approximately 15x higher than the size of the dump file
-(for a 20Gb dump file you will need approx ~250Gb of RAM). Alternatively, you can estimate the RAM requirement by number of dumped kmers. It's approximately 350x higher than number of kmers in the dump file. If your file has too many kmers you can decrease computational requirement by rerunning the kmer spectra with a smaller k (i.e. kmer size) or by more strict filtering of the dumped kmers (higher L and smaller U).
-
-We have not calculated the complexity of the algorithm yet. Usually for smaller genomes (<250Gb) it's couple of hours, the longest computation took bit more than one day.
-
-The biggest genome we analyzed so far was a triplod genome with a haploid size 3.5Gbp. We have processed 1.5e9 genomic kmers and it have required 520GB of memory and two days of computation on eight cores.
+Check [projects](https://github.com/KamilSJaron/smudgeplot/projects) to see how the development goes.

 ## Contributions

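The "Computational requirements" text that this commit moves out of the README gives two rules of thumb for the `hetkmers` memory footprint. A minimal sketch of that arithmetic follows. It assumes that "350x higher than number of kmers" means roughly 350 bytes per k-mer, which is consistent with the quoted run of 1.5e9 k-mers needing about 520GB. The function names are hypothetical and are not part of smudgeplot.

```python
# Rough RAM estimates for the `hetkmers` task, based only on the
# rule-of-thumb factors quoted in the removed README paragraph.
# Both helpers are illustrative and not part of smudgeplot.

BYTES_PER_KMER = 350    # assumption: "350x" read as bytes per dumped k-mer
DUMP_SIZE_FACTOR = 15   # RAM is "approximately 15x" the dump file size


def ram_gb_from_kmer_count(n_kmers: int) -> float:
    """Estimate RAM in GB from the number of k-mers in the dump file."""
    return n_kmers * BYTES_PER_KMER / 1e9


def ram_gb_from_dump_size(dump_gb: float) -> float:
    """Estimate RAM in GB from the size of the dump file in GB."""
    return dump_gb * DUMP_SIZE_FACTOR


if __name__ == "__main__":
    # The README's largest reported run: 1.5e9 k-mers, about 520GB used.
    print(ram_gb_from_kmer_count(1_500_000_000))
    # The README's 20Gb dump example, for which it quotes approx. 250Gb.
    print(ram_gb_from_dump_size(20))
```

Note that the 15x factor yields 300 for a 20Gb dump, somewhat above the ~250Gb quoted in the same sentence, so both rules should be read as rough upper-range guides and not as exact figures.
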
4 changes: 2 additions & 2 deletions exec/smudgeplot
@@ -54,7 +54,7 @@ tasks: cutoff Calculate meaningful values for lower/upper kmer histogram cuto
 argparser.add_argument('infile', nargs='?', type=argparse.FileType('r'), default=sys.stdin, help='Alphabetically sorted Jellyfish or KMC dump file (stdin).')
 argparser.add_argument('-o', help='The pattern used to name the output (kmerpairs).', default='kmerpairs')
 argparser.add_argument('--middle', dest='middle', action='store_const', const = True, default = False,
-help='Get all kmer pairs one SNP away from each other (default: just the middle one).')
+help='Get all kmer pairs that are exactly the same but in the middle nt. When this flag is used, the input dump must be alphabetically sorted/ (default: different by a SNP at any position).')
 self.arguments = argparser.parse_args(sys.argv[2:])

 def plot(self):
@@ -81,7 +81,7 @@ tasks: cutoff Calculate meaningful values for lower/upper kmer histogram cuto
 '''
 argparser = argparse.ArgumentParser(prog = 'smudgeplot cutoff', description='Calculate meaningful values for lower/upper kmer histogram cutoff.')
 argparser.add_argument('infile', type=argparse.FileType('r'), help='Name of the input kmer histogram file (default \"kmer.hist\")."')
-argparser.add_argument('boundary', help='Which bounary to compute L (lower, default) or U (upper)', default = 'L')
+argparser.add_argument('boundary', help='Which bounary to compute L (lower) or U (upper)')
 self.arguments = argparser.parse_args(sys.argv[2:])

 ###############
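The second hunk drops `default = 'L'` from the positional `boundary` argument. The commit message does not say why, but a likely reason is that `argparse` never applies a default to a positional argument unless it is made optional with `nargs='?'`, so the old help text promised a default that could not take effect. The sketch below illustrates that behaviour with a stand-alone parser. The `build_parser` helper and the `cutoff-demo` program name are hypothetical and are not taken from smudgeplot.

```python
import argparse


def build_parser(optional_boundary: bool) -> argparse.ArgumentParser:
    """Build a toy parser with a positional `boundary` argument.

    optional_boundary=False mirrors the pre-commit declaration: a plain
    positional with default='L'. optional_boundary=True adds nargs='?',
    which is what argparse needs before it will use the default.
    """
    parser = argparse.ArgumentParser(prog="cutoff-demo")
    if optional_boundary:
        parser.add_argument("boundary", nargs="?", default="L",
                            help="L (lower, default) or U (upper)")
    else:
        parser.add_argument("boundary", default="L",
                            help="L (lower) or U (upper)")
    return parser


if __name__ == "__main__":
    # With nargs='?', omitting the argument falls back to the default.
    print(build_parser(True).parse_args([]).boundary)

    # Without nargs='?', the positional is still required: argparse
    # reports a usage error and exits instead of using the default.
    try:
        build_parser(False).parse_args([])
    except SystemExit:
        print("boundary is required despite default='L'")
```

Removing the unused default, as the commit does, keeps the help text honest. Adding `nargs='?'` would be the alternative if a real default were wanted.
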
