image-sandbox

The image sandbox contains the infrastructure for implementing a reverse image search using image hashing and fingerprinting. This reverse image search functionality allows for a desired image to be inputted by the user and for visually similar images to be returned by the infrastructure.

There are a myraid of applications that can stem from this infrastructure such as finding duplicate images in the website's media library, matching artwork at the Art Institute of Chicago to other museums, and allowing a search by image feature on the museum's website.

This project currently focuses on applying the reverse image search infrastructure towards creating a prototype search by image feature on the museum dashboard. You can see a demo here:

http://dashboard-data.artic.edu/#!/images

Features

Downloads and converts set of artwork images into different test cases (ex. scaling, resizing, rotating, etc.)
Converts set of artwork images into average and perceptual hashes and stores them in Elasticsearch index as binary hashes
Query Elasticsearch index using input image to find best artwork image match
Update data aggregator to store hashes as binaries to allow for efficient querying

Overview

Matching

Contains scripts to download selected set of 30 test artwork images from the museum's collection and converts set of images into different test cases. Proportionally scaled, disproportionally scaled, resize, rotated, color filtered, crop, and altered image exif data images are created from each image in the test suite.

In-gallery

Contains scripts to download selected set of 20 test artwork images and their matching in gallery images of the artwork.

Testing

Contains scripts to test accuracy of matching algorithm to see if inputted altered image matches with its corresponding artwork.

Hashing

Contains scripts to convert selected set of 30 test artwork images into average and perceptual hashes.

Searching

Contains script to store hashes of set of 30 test artwork images as image hash binaries in Elasticsearch index. Also contains script to search Elasticsearch index for matching artwork images of given input image based on percentage of matching binaries.

Quality

Contains some bash scripts for analyzing how much disk space might be saved by processing our images using danielgtaylor/jpeg-archive. Meant to reduce network transfer filesize.

Requirements

The project is includes the following requirements:

Pillow 8.1.*
ImageHash 4.0
Elasticsearch 6.3.1

For development, we recommend that you use Laravel Homestead. It includes everything you need to run this project. Note that you will need to enable the optional Elasticsearch feature in your Homestead.yaml.

Installing

In order to get project installation set up use the following tutorial to get set up with Homestead and Elasticsearch. https://art-institute-of-chicago.atlassian.net/wiki/spaces/DD/pages/24477699/Setting+up+Homestead#Setting-up-Elasticsearch

Once you have set up Homestead and Elasticsearch run Homestead in vagrant to get local host output.

Make sure to forward your Elasticsearch port from your host to Homestead and to have the correct ports set up. Currently, the code uses port 2200 in search_hashes.py and store_hashes.py scripts. Make sure to alter these ports either in your Homestead.yaml file or the appropriate scripts to match.

ports:
    - send: 2200
      to: 9200
      protocol: tcp

Developing

To start developing further, clone the repository and download the given requirements:

git clone https://github.com/art-institute-of-chicago/image-sandbox.git
pip -r requirements.lock

Acknowledgements

Image matching prototype work completed in 2021-22 by Nadine Siddharta, McMullan Arts Leadership Intern in Experience Design, Web Engineering.

The following sources have been referenced in the development of the store_hashes.py and search_hashes.py scripts.

Licensing

This project is licensed under the GNU Affero General Public License Version 3.

Name		Name	Last commit message	Last commit date
Latest commit History 62 Commits
compare		compare
hashing		hashing
in-gallery		in-gallery
matching		matching
quality		quality
searching		searching
testing		testing
.editorconfig		.editorconfig
.gitignore		.gitignore
.python-version		.python-version
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md
requirements.lock		requirements.lock
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

image-sandbox

Features

Overview

Matching

In-gallery

Testing

Hashing

Searching

Quality

Requirements

Installing

Developing

Acknowledgements

Licensing

About

Releases

Packages

Contributors 2

Languages

License

art-institute-of-chicago/image-sandbox

Folders and files

Latest commit

History

Repository files navigation

image-sandbox

Features

Overview

Matching

In-gallery

Testing

Hashing

Searching

Quality

Requirements

Installing

Developing

Acknowledgements

Licensing

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages