Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Updated Nov 8, 2024 · Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
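As a rough illustration of linking records that share no common identifier, the sketch below matches names across two small sources by string similarity, using only the Python standard library. The record IDs, the sample names, the `normalize` and `link_records` helpers, and the 0.85 threshold are all assumptions made for this example. The projects listed here rely on more elaborate techniques (probabilistic models, blocking, active learning) that this sketch does not attempt to reproduce.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """For each left record, keep its best right-hand match if the score
    reaches the (hypothetical) threshold. Compares every pair, so this is
    only practical for small inputs."""
    links = []
    for left_id, left_name in left.items():
        best_id, best_score = None, 0.0
        for right_id, right_name in right.items():
            score = similarity(left_name, right_name)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, best_score))
    return links


# Two made-up sources with different keys and a spelling variation.
crm = {"c1": "Jonathan Smith", "c2": "Maria Garcia"}
billing = {"b7": "Jonathon Smith", "b9": "Wei Chen"}

links = link_records(crm, billing)
for left_id, right_id, score in links:
    print(left_id, right_id, f"{score:.2f}")
```

With this sample data, only `c1` and `b7` are linked; `c2` has no counterpart close enough to pass the threshold. Comparing every pair scales quadratically, which is why larger linkage jobs usually restrict comparisons to candidate pairs first.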
The SQL/Ibis powered sklearn of record linkage
Python library for the generation and mutation of realistic personal identification data at scale
🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
Hierarchical record linkage at scale
Fast, accurate, open-source geocoding in Python
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
Unstructured Record Linkage using Siamese Networks and Large Language Models (LLMs) such as LLAMA3 and ChatGPT-4o.
Collection of software packages for performing privacy-preserving record linkage based on Bloom filters
🧱 blocking methods for entity resolution
Python package for deduplication/entity resolution using active learning
🆔 Examples for using the dedupe library
A maximum-strength name parser for record linkage.
PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification.
Backend (Docker & API) for matchID project
A Python script for generating duplicate data to test the performance of record linkage and master data management systems.
A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
CERTA - Computing Entity Resolution explanations with TriAngles
Example scripts for generating data with Gecko
Created by Halbert L. Dunn
Released 1946