yente

A matchmaker for textual data.

Alternate version

A faster implementation written in Rust is also available: yenta. This Haskell version makes it easier for me to experiment with the algorithm. Most people will appreciate the 30x to 60x speed improvement from yenta.

Overview

yente matches names across two data files. It has the following features:

Intelligent: Matching is based on the rareness of words, so there is no need to preprocess names to remove common, non-informative words (e.g. and, the, company).
Robust: yente incorporates features that are commonly needed in name matching. It is insensitive to both word order and case (Shawn Spencer matches SPENCER, SHAWN), and it removes punctuation by default.
Customizable: Users may optionally allow for misspellings, apply phonetic algorithms, truncate each word of a name to a specified number of characters, output any number of potential matches (with or without ties), and combine any of these options.
High-ish performance: yente is a multi-core program, so it can use all of the cores available on a machine. Performance improvements are ongoing.
Unicode aware: By default, yente converts accented Unicode characters to their ASCII equivalents.
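
To make the idea of rarity-based matching concrete, here is a minimal sketch in Haskell. It is not yente's actual implementation: the function names (tokens, rarityWeights, score, bestMatch), the smoothed inverse-frequency weight, and the cosine-style score are all assumptions chosen for illustration. Accent folding, misspellings, and phonetic matching are left out.

```haskell
import           Data.Char       (isAlphaNum, isSpace, toLower)
import           Data.List       (maximumBy)
import qualified Data.Map.Strict as Map
import           Data.Ord        (comparing)
import qualified Data.Set        as Set

-- | Normalize a name: drop punctuation, lower-case, and split into a set of
-- words. Using a set makes the comparison insensitive to word order.
tokens :: String -> Set.Set String
tokens = Set.fromList . words . map toLower . filter keep
  where
    keep c = isAlphaNum c || isSpace c

-- | Build a weight function from a list of names. A word that appears in few
-- names gets a large weight; a word that appears in many names gets a small
-- one. The formula log (1 + n / count) is an illustrative assumption, and
-- words never seen in the list are treated as if they appeared once.
rarityWeights :: [String] -> (String -> Double)
rarityWeights names = \w -> log (1 + n / Map.findWithDefault 1 w counts)
  where
    n      = fromIntegral (length names)
    counts = Map.fromListWith (+)
               [ (t, 1) | name <- names, t <- Set.toList (tokens name) ]

-- | Cosine-style similarity between two names using the rarity weights.
-- Returns 0 when either name has no weighted words.
score :: (String -> Double) -> String -> String -> Double
score weight a b
  | normA == 0 || normB == 0 = 0
  | otherwise                = shared / (normA * normB)
  where
    ta     = tokens a
    tb     = tokens b
    sumSq  = sum . map (\w -> weight w ^ (2 :: Int)) . Set.toList
    shared = sumSq (Set.intersection ta tb)
    normA  = sqrt (sumSq ta)
    normB  = sqrt (sumSq tb)

-- | Pick the highest-scoring candidate for a name, if there are candidates.
bestMatch :: (String -> Double) -> [String] -> String -> Maybe (String, Double)
bestMatch _      []         _    = Nothing
bestMatch weight candidates name =
  Just (maximumBy (comparing snd) [ (c, score weight name c) | c <- candidates ])
```

Under these assumptions, a shared rare word such as a distinctive surname contributes far more to the score than a shared common word such as company, which is why no stop-word list is needed. Loading the block in GHCi and evaluating score with weights built from a list of names shows that Shawn Spencer and SPENCER, SHAWN normalize to the same word set.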

Information

See the wiki for information on installation, usage, and best practices. It also includes some examples for matching problems that commonly arise in research.

Contributing

Submit a pull request and I will respond.

If yente has made your life easier in any way, please send me an email or star this repository. If you would like to see a feature added, let me know through the GitHub forum.
