Cadmium provides implementations of three string distance algorithms: Jaro-Winkler distance, Levenshtein distance, and pair distance.
- Add the dependency to your shard.yml:

dependencies:
  cadmium_distance:
    github: cadmiumcr/distance
- Run shards install
require "cadmium_distance"
The Jaro-Winkler algorithm returns a score between 0 and 1 that indicates how closely two strings match (1 is an exact match, 0 means no similarity at all).
jwd = Cadmium::Distance::JaroWinkler.new
jwd.distance("dixon", "dicksonx")
# => 0.8133333333333332
jwd.distance("same", "same")
# => 1
jwd.distance("not", "same")
# => 0.0
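To show roughly what happens inside, here is a minimal, self-contained Crystal sketch of the textbook Jaro-Winkler formulation: characters match only within a window of max(length)/2 - 1 positions, and a shared prefix of up to 4 characters boosts the score with a scale of 0.1. This is an assumption about the standard algorithm, not the shard's code; the helper names jaro and jaro_winkler are hypothetical, and the shard's exact scores may differ in the last digits.

```crystal
# Minimal Jaro-Winkler sketch (textbook formulation, not the shard's code).
# `jaro` and `jaro_winkler` are hypothetical helper names.

# Jaro similarity: 1.0 for identical strings, 0.0 when no characters match.
def jaro(s1 : String, s2 : String) : Float64
  return 1.0 if s1 == s2
  a = s1.chars
  b = s2.chars
  return 0.0 if a.empty? || b.empty?

  # Two equal characters only count as a match within this distance.
  window = Math.max(Math.max(a.size, b.size) // 2 - 1, 0)

  a_matched = Array(Bool).new(a.size, false)
  b_matched = Array(Bool).new(b.size, false)
  matches = 0

  a.each_with_index do |ch, i|
    lo = Math.max(0, i - window)
    hi = Math.min(b.size - 1, i + window)
    (lo..hi).each do |j|
      next if b_matched[j] || b[j] != ch
      a_matched[i] = true
      b_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches == 0

  # Walk the matched characters of both strings in order; every position
  # where they disagree counts as half a transposition.
  half_transpositions = 0
  k = 0
  a.each_with_index do |ch, i|
    next unless a_matched[i]
    while !b_matched[k]
      k += 1
    end
    half_transpositions += 1 if ch != b[k]
    k += 1
  end

  m = matches.to_f
  t = half_transpositions / 2.0
  (m / a.size + m / b.size + (m - t) / m) / 3.0
end

# Jaro-Winkler: raises the Jaro score for a common prefix (capped at 4 chars).
def jaro_winkler(s1 : String, s2 : String, scale : Float64 = 0.1) : Float64
  j = jaro(s1, s2)
  limit = Math.min(4, Math.min(s1.size, s2.size))
  prefix = 0
  while prefix < limit && s1[prefix] == s2[prefix]
    prefix += 1
  end
  j + prefix * scale * (1.0 - j)
end

puts jaro_winkler("dixon", "dicksonx") # roughly 0.813
puts jaro_winkler("same", "same")
puts jaro_winkler("not", "same")
```

For "dixon" and "dicksonx" the sketch finds 4 matching characters, no transpositions, and a 2-character common prefix, so the Jaro score of about 0.767 is lifted to about 0.813, in line with the shard's result.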
The Levenshtein distance algorithm returns the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into the other.
Cadmium::Distance::Levenshtein.distance("doctor", "doktor")
# => 1
Cadmium::Distance::Levenshtein.distance("doctor", "doctor")
# => 0
Cadmium::Distance::Levenshtein.distance("flad", "flaten")
# => 3
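The distance is usually computed with the classic dynamic-programming recurrence: each cell holds the cheapest way to turn a prefix of one string into a prefix of the other. The following is a minimal Crystal sketch of that technique, not the shard's implementation; the helper name levenshtein is hypothetical. It keeps only two rows of the table, so memory grows with the length of the second string only.

```crystal
# Minimal Levenshtein sketch using the dynamic-programming recurrence,
# keeping only two rows of the table. `levenshtein` is a hypothetical name.
def levenshtein(a : String, b : String) : Int32
  s = a.chars
  t = b.chars
  # Before processing s[i], prev[j] is the distance between the first i
  # characters of s and the first j characters of t.
  prev = (0..t.size).to_a
  s.each_with_index do |sc, i|
    curr = Array(Int32).new(t.size + 1, 0)
    curr[0] = i + 1 # delete all i + 1 characters seen so far
    t.each_with_index do |tc, j|
      cost = sc == tc ? 0 : 1
      deletion = prev[j + 1] + 1
      insertion = curr[j] + 1
      substitution = prev[j] + cost # cost 0 when the characters match
      curr[j + 1] = Math.min(Math.min(deletion, insertion), substitution)
    end
    prev = curr
  end
  prev[t.size]
end

puts levenshtein("doctor", "doktor") # => 1
puts levenshtein("flad", "flaten")   # => 3
```

For "flad" and "flaten", the three edits are substituting d with t and then inserting e and n.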
Pair distance uses character n-grams to measure how similar one string is to another. Working with bigrams (pairs of adjacent characters), the algorithm first counts how many bigrams occur in both strings, then computes their similarity as

$$\text{similarity} = \frac{2 \cdot \text{intersections}}{s_1\text{size} + s_2\text{size}}$$

where intersections is the number of shared bigrams and s1size and s2size are the number of bigrams in each string.
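As a worked check of the formula, here are the bigrams of "night" and "nacht", the pair used in the example that follows, with P(s) denoting the bigrams of s:

$$
\begin{aligned}
P(\text{night}) &= \{\text{ni}, \text{ig}, \text{gh}, \text{ht}\} \\
P(\text{nacht}) &= \{\text{na}, \text{ac}, \text{ch}, \text{ht}\} \\
\text{intersections} &= |\{\text{ht}\}| = 1 \\
\text{similarity} &= \frac{2 \cdot 1}{4 + 4} = 0.25
\end{aligned}
$$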
Cadmium::Distance::Pair.distance("night", "nacht")
# => 0.25
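The bigram procedure can be sketched in a few lines of Crystal. This is a minimal illustration under the assumption that each shared bigram is counted at most once per occurrence (a bigram of the second string is consumed when it is matched); it is not the shard's code, and the helper names bigrams and pair_similarity are hypothetical.

```crystal
# Minimal pair (bigram) similarity sketch; `bigrams` and `pair_similarity`
# are hypothetical names, not the shard's API.

# Adjacent character pairs, e.g. "night" -> ["ni", "ig", "gh", "ht"].
def bigrams(s : String) : Array(String)
  (0...s.size - 1).map { |i| s[i, 2] }
end

def pair_similarity(s1 : String, s2 : String) : Float64
  p1 = bigrams(s1)
  p2 = bigrams(s2)
  total = p1.size + p2.size
  return 0.0 if total == 0 # neither string has any bigram

  # Count shared bigrams; each bigram of s2 can be matched only once.
  remaining = p2.dup
  intersections = 0
  p1.each do |pair|
    if idx = remaining.index(pair)
      remaining.delete_at(idx)
      intersections += 1
    end
  end

  2.0 * intersections / total
end

puts pair_similarity("night", "nacht") # => 0.25
```

Consuming matched bigrams keeps repeated pairs (such as the two "an" pairs in "banana") from being counted more often than they actually occur in both strings.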
- Fork it (https://github.com/cadmiumcr/distance/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
- Chris Watson - creator and maintainer