README - Spell Check Project
./src/test-script.py run-test
./src/test-script.py calc-stats
To view statistics for all previous runs, see ./data/phrases.stats (or words.stats or sentences.stats, as the case may be).
Among the words within an edit distance of two, the top K words are chosen by weighted edit distance.
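The candidate selection step could be sketched roughly as follows. This is only an illustration, not the project's code: the README does not say which weights are used, so vowel_cost is a hypothetical weighting (vowel-for-vowel substitutions cost less), and the names edit_distance and top_k_candidates are made up for this example.

```python
import heapq

def edit_distance(a, b, sub_cost=lambda x, y: 1.0, indel_cost=1.0):
    """Levenshtein distance with pluggable substitution and insert/delete costs."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, 1):
            cost = 0.0 if ca == cb else sub_cost(ca, cb)
            curr.append(min(prev[j] + indel_cost,      # delete ca
                            curr[j - 1] + indel_cost,  # insert cb
                            prev[j - 1] + cost))       # match or substitute
        prev = curr
    return prev[-1]

def vowel_cost(x, y):
    # Hypothetical weighting (an assumption): swapping one vowel for another is cheaper.
    return 0.5 if x in "aeiou" and y in "aeiou" else 1.0

def top_k_candidates(word, dictionary, k, sub_cost=vowel_cost):
    # Keep only words within a plain (unit-cost) edit distance of two.
    nearby = [w for w in dictionary if edit_distance(word, w) <= 2]
    # Rank the survivors by weighted distance and keep the k closest.
    return heapq.nsmallest(k, nearby, key=lambda w: edit_distance(word, w, sub_cost))

print(top_k_candidates("spel", ["spell", "spill", "smell", "spelling"], 2))
```

A linear scan over the dictionary is used here only for brevity; a real implementation would likely index the dictionary to avoid comparing against every word.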
The Cartesian product of the candidate words suggested for each word in the query makes up the set of candidate phrases.
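As a small illustration of the phrase generation step, the product can be built with itertools.product from the standard library. The function name candidate_phrases and the sample word lists are hypothetical, not taken from the project.

```python
from itertools import product

def candidate_phrases(candidates_per_word):
    """Yield every phrase formed by picking one candidate for each query word."""
    for combo in product(*candidates_per_word):
        yield " ".join(combo)

# Two query words with two candidates each give 2 * 2 = 4 candidate phrases.
phrases = list(candidate_phrases([["spell", "spill"], ["check", "cheek"]]))
print(phrases)
```

Note that the number of phrases grows as K to the power of the query length, which is one reason to keep K small.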
Results from the Microsoft N-gram service are used as the prior for each candidate phrase.
Assuming the probability decreases exponentially as the number of edits required increases, the likelihood is calculated as e^(-(weighted edit distance / length)).
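A minimal sketch of how the likelihood might be combined with the prior to rank candidate phrases is given below. Several things here are assumptions rather than facts from this README: the prior is passed in as a plain probability (no call to the N-gram service is made), "length" is taken to be the length of the query in characters, and the final score is assumed to be prior times likelihood. The names likelihood and best_phrase are hypothetical.

```python
import math

def likelihood(weighted_distance, length):
    """e^(-(weighted edit distance / length)); equals 1.0 when no edits are needed."""
    return math.exp(-weighted_distance / length)

def best_phrase(candidates):
    """Pick the phrase with the highest prior * likelihood.

    candidates: list of (phrase, prior, weighted_distance, length) tuples.
    The product rule is an assumption, not stated in the README.
    """
    return max(candidates, key=lambda c: c[1] * likelihood(c[2], c[3]))[0]

# Made-up priors for illustration only; real values would come from the N-gram service.
candidates = [
    ("spell check", 1e-6, 1.0, 10),
    ("spill check", 1e-8, 1.5, 10),
]
print(best_phrase(candidates))
```

If the priors are supplied as log probabilities instead, the same ranking can be obtained by adding -(weighted edit distance / length) to the log prior, taking care that both use the same logarithm base.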