A small Java package for simple string matching. It was born when I needed something similar to The Fuzz, but for Java.
In my testing, the results it produces are extremely similar to results produced by The Fuzz (not surprising since both are based on the same underlying concepts). It also performed extremely well compared to the matcher built into a popular CAT software package.
- 100% Java;
- does not have any dependency outside of the Java Class Library;
- fully compatible with Java 11 and up.
The source code is extensively documented using JavaDoc, but here is a quick recap.
private static int levenshtein(String s1, String s2)
Returns the Levenshtein distance between strings s1
and s2
.
private static int levenshteinToken(String[] tokens1, String[] tokens2)
Similar to levenshtein()
, but based on tokens instead of individual characters. A token is an uninterrupted sequence of word characters (\w
in regex parlance).
public static int ratio(String s1, String s2, boolean ignoreCase)
Returns similarity between s1
and s2
based on their Levenshtein distance. If ignoreCase
is true
, the comparison will be case-insensitive.
public static int ratioToken(String s1, String s2, boolean ignoreCase)
Returns similarity between s1
and s2
based on levenshteinToken()
. Returns -1 if both s1
and s2
do not contain any tokens.
public static int ratioTokenSet(String s1, String s2, boolean ignoreCase)
This method treats s1
and s2
as sets of tokens and returns their similarity based on the difference between those two sets. Returns -1 if both s1
and s2
do not contain any tokens.
Example:
int score = FuzzyStrings.ratioTokenSet("It is a beautiful day", "A beautiful day it is", true);
System.out.println(score); // 100
public static int complexRatio(String s1, String s2, boolean ignoreCase)
Returns similarity between s1
and s2
based on ratio()
, ratioToken()
and ratioTokenSet()
.
public static StringMatch matchOne(String s, Collection<String> candidates,
StringCompareFunction compareFunction, boolean ignoreCase)
Compares all strings in candidates
to s
and returns the best match.
FuzzyStrings::ratio
, FuzzyStrings::ratioToken
, FuzzyStrings::ratioTokenSet
and FuzzyStrings::complexRatio
can be used as compareFunction
.
Example:
String s = "this is a test";
List<String> candidates = new ArrayList<>();
candidates.add("this is a pest");
candidates.add("not a match at all");
candidates.add("this is a bird's nest");
candidates.add("this is a test test");
candidates.add("this is a test!");
candidates.add("this is a Test");
candidates.add("this is a quick test");
StringMatch m = FuzzyStrings.matchOne(s, candidates, FuzzyStrings::complexRatio, true);
System.out.println(m.getScore() + ": " + m.getText()); // 100: this is a Test
public static List<StringMatch> matchAndSort(String s, Collection<String> candidates,
StringCompareFunction compareFunction, boolean ignoreCase)
Compares all strings in candidates
to s
and returns matches sorted by score (in descending order).
FuzzyStrings::ratio
, FuzzyStrings::ratioToken
, FuzzyStrings::ratioTokenSet
and FuzzyStrings::complexRatio
can be used as compareFunction
.
Example:
String s = "this is a test";
List<String> candidates = new ArrayList<>();
candidates.add("this is a pest");
candidates.add("not a match at all");
candidates.add("this is a bird's nest");
candidates.add("this is a test test");
candidates.add("this is a test!");
candidates.add("this is a Test");
candidates.add("this is a quick test");
List<StringMatch> results = FuzzyStrings.matchAndSort(s, candidates, FuzzyStrings::complexRatio, false);
for (StringMatch m : results) {
System.out.println(m.getScore() + ": " + m.getText());
}
Output:
99: this is a test! 91: this is a test test 87: this is a quick test 86: this is a pest 86: this is a Test 69: this is a bird's nest 40: not a match at all
public class StringMatch implements Comparable<StringMatch>
This class implements a string match. It contains both the text that has been matched and its score. The score shows how similar the text is to the string it has been compared to.
To get the text, use the getText()
method:
StringMatch match = FuzzyStrings.matchOne(s, candidates, FuzzyStrings::complexRatio, false);
String matchedString = match.getText();
Similarly, to get the score, use the getScore()
method:
int score = match.getScore();
@FunctionalInterface
public interface StringCompareFunction {
int compare(String s1, String s2, boolean ignoreCase);
}
This project is licensed under the terms of the MIT license. See LICENSE
for details.