2023-05-19: Namfinder is not yet ready for stable use. In some cases the program currently has a complexity that is quadratic in the number of hits, which becomes limiting for genome-sized comparisons. I advise against running this software until this is fixed. This repo went public only because the uLTRA long transcriptomic aligner depends on it.
Namfinder is a sequence (DNA/RNA) mapping tool used to find Non-overlapping Approximate Matches (NAMs). Its output and usage mimic those of nucmer. You can think of NAMs as Maximal Exact Matches (MEMs) that allow some SNVs and small indels. NAMs are constructed from overlapping strobemer seeds.
Namfinder borrows the whole index construction codebase from strobealign (a short-read mapper), but uses it only for finding NAM seeds. Credits to @marcelm, @luispedro and @psj1997 for the optimized indexing implementation. Namfinder is a more optimized version of the previous proof-of-concept tool StrobeMap, which was implemented for the strobemers paper. It was renamed to avoid confusion with strobealign.
- Multithreading support
- Fast indexing (2-5 minutes for a human-sized reference genome)
- Output in MUMmer MEM tsv format
- Installation
- Usage
- Command-line options
- Index file
- Changelog
- Contributing
- Performance
- Credits
- Version info
- License
You need to have CMake, a recent g++
(tested with version 8) and zlib installed.
Then do the following:
git clone https://github.com/ksahlin/namfinder
cd namfinder
cmake -B build -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native"
make -j -C build
The resulting binary is build/namfinder.
The binary is tailored to the CPU the compiler runs on.
If it needs to run on other machines, use this cmake
command instead for compatibility with most x86-64 CPUs in use today:
cmake -B build -DCMAKE_C_FLAGS="-msse4.2" -DCMAKE_CXX_FLAGS="-msse4.2"
Parameter -k is the strobe size and -s is the sub-k-mer size (used for thinning in syncmers).
Set -s to the same value as -k for no thinning.
Parameters -l and -u are the window minimum and window maximum for sampling the downstream strobe.
Only strobemers of order 2 can currently be used.
namfinder -k 10 -s 10 -l 11 -u 35 -C 500 -o nams.tsv ref.fa reads.f[a/q]
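To illustrate why setting -s equal to -k disables thinning, here is a minimal Python sketch of open-syncmer selection. It is not namfinder's code: the function name `open_syncmers` is hypothetical, s-mers are ranked lexicographically instead of by a hash, and the rule "keep a k-mer when its smallest s-mer sits at offset (k - s) // 2" is an assumed simplification of how syncmers are commonly defined.

```python
def open_syncmers(seq: str, k: int, s: int) -> list[int]:
    """Return start positions of k-mers kept as open syncmers.

    Assumptions (illustrative only): s-mers are compared lexicographically
    rather than by hash value, and a k-mer is kept when its smallest s-mer
    starts at the middle offset (k - s) // 2.
    """
    if not 0 < s <= k:
        raise ValueError("require 0 < s <= k")
    target = (k - s) // 2
    selected = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # All s-mers contained in this k-mer, in left-to-right order.
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        # index() returns the leftmost minimum, which breaks ties.
        if smers.index(min(smers)) == target:
            selected.append(i)
    return selected


seq = "ACGTTGCATGCCGATTACGGATCCGTAAGCT"
all_kmers = open_syncmers(seq, k=10, s=10)  # s == k: every k-mer is kept
thinned = open_syncmers(seq, k=10, s=6)     # s < k: some k-mers are dropped
print(len(all_kmers), len(thinned))
```

When s equals k, each k-mer contains exactly one s-mer, at offset 0, so the condition always holds and no k-mer is removed. With a smaller s, only k-mers whose minimal s-mer falls at the chosen offset survive, which reduces the number of seeds.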
- Some of the ideas for the index and NAM construction in namfinder were borrowed from: Sahlin, K. Strobealign: flexible seed size enables ultra-fast and accurate read alignment. Genome Biol 23, 260 (2022). https://doi.org/10.1186/s13059-022-02831-7
- Big improvements were designed by @marcelm and @luispedro, and implemented by @marcelm and @psj1997 (forthcoming paper).
MIT license, see LICENSE.