Disclaimer

This is a fork of the original image-match repo and is not officially supported. I created it to make small changes that let the library work with more recent versions of its dependencies and of Elasticsearch. I would have loved to integrate those changes upstream, but as far as I know the original repository is no longer maintained.

I am open to contributions and suggestions on how to improve this repo.

Developing

Some distros (such as Arch Linux) require system packages for certain Python dependencies, for example:

  • six
  • scikit-image

Original README


image-match

image-match is a simple (now Python 3!) package for finding approximate image matches from a corpus. It is similar, for instance, to pHash, but includes a database backend that easily scales to billions of images and supports sustained high rates of image insertion: up to 10,000 images/s on our cluster!

PLEASE NOTE: This algorithm is intended to find nearly duplicate images -- think copyright violation detection. It is NOT intended to find images that are conceptually similar. For more explanation, see this issue or this video.

image-match is based on the paper "An image signature for any kind of image" by Wong et al. There is an existing reference implementation which may be more suited to your needs.
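To give a feel for what "nearly duplicate" means in practice: signature-based approaches like this one reduce each image to a vector of small integers and compare two images by a normalized distance between their vectors. The following is a minimal, self-contained sketch of that comparison step only. It is not image-match's API: the function name, the toy eight-element signatures, and the 0.4 cutoff are assumptions made for illustration, and real signatures are much longer.

```python
import math


def normalized_distance(sig_a, sig_b):
    """Return ||a - b|| / (||a|| + ||b||) for two equal-length signatures."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))
    total = (math.sqrt(sum(a * a for a in sig_a))
             + math.sqrt(sum(b * b for b in sig_b)))
    # Two all-zero signatures are treated as identical.
    return 0.0 if total == 0 else diff / total


# Hypothetical toy signatures with values in -2..2 (illustration only).
original = [2, -1, 0, 1, -2, 0, 1, 1]
near_duplicate = [2, -1, 0, 1, -2, 0, 1, 0]   # differs in one position
unrelated = [-2, 1, 1, -1, 2, 0, -1, -2]

# Assumed cutoff for this sketch; tune it on your own data.
MATCH_CUTOFF = 0.4

for name, sig in [("near_duplicate", near_duplicate), ("unrelated", unrelated)]:
    dist = normalized_distance(original, sig)
    print(name, round(dist, 3), "match" if dist < MATCH_CUTOFF else "no match")
```

The distance is 0 for identical signatures and approaches 1 as they diverge, so a slightly altered copy scores low while an unrelated image scores high. Two images of the same subject taken from different angles would generally not score as a match, which is why this approach is unsuited to finding conceptually similar images.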

The folks over at Pavlov have released an excellent containerized version of image-match for easy scaling and deployment.

Quick start

Once you're up and running, read these two (short) sections of the documentation to get a feel for what image-match is capable of: