A naive command-line tool to remove duplicated images using OpenCV

ybubnov/imagedup

Imagedup

A tool to manage image duplicates: the program finds duplicate images in a specified folder and removes them on request.

Installation

We recommend using pyenv to manage the required Python version and poetry to manage dependency installation:

% brew install pyenv pyenv-virtualenv

Then configure the environment as follows:

% pyenv install 3.9.1
% pyenv virtualenv 3.9.1 imagedup
% pyenv activate imagedup

Then install the necessary dependencies:

% pip install poetry
% poetry config virtualenvs.create false
% poetry install --with-root

Usage

You can run imagedup directly from the repository root:

% python -m imagedup ./dataset

By default, the tool does not delete files; it only prints the files it would delete to standard output. To actually delete the duplicates, call the tool as follows:

% python -m imagedup.shell ./dataset -q --rm
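The difference between the default listing mode and the --rm mode can be pictured with a minimal Python sketch. This is not imagedup's actual code: the function name `handle_duplicates` and its `remove` argument are hypothetical, and the sketch assumes the duplicates have already been found and are passed in as a list of paths.

```python
from pathlib import Path


def handle_duplicates(paths, remove=False):
    """Print every duplicate path; delete the file only when remove is True.

    Hypothetical helper: `remove` plays the role of the --rm flag.
    Returns the list of paths that were actually deleted.
    """
    removed = []
    for path in paths:
        # Listing mode: always report the candidate on standard output.
        print(path)
        if remove:
            # Destructive mode: delete the file from disk.
            Path(path).unlink()
            removed.append(path)
    return removed
```

Keeping deletion behind an explicit flag means a first run is always safe to inspect before anything is removed.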

Analysis

The following image shows how the --min-score and --min-area parameters relate to the number of images removed from the directory.

By default, this tool guarantees removal of 50% of the images from a directory.

[Figure: Grid Search]
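A sweep over the two parameters could be organized roughly as in the sketch below. Everything in it is an assumption made for illustration: the candidate data, the `passes` rule (a candidate counts as removed when both its score and its area reach the thresholds), and the function names are hypothetical and may not match how imagedup interprets --min-score and --min-area.

```python
from itertools import product


def passes(score, area, min_score, min_area):
    """Hypothetical rule: a candidate is removed when both thresholds are met."""
    return score >= min_score and area >= min_area


def grid_search(candidates, min_scores, min_areas):
    """Count removed candidates for every (min_score, min_area) pair.

    `candidates` is a list of (score, area) tuples; the values are made up
    here and would come from the image comparison step in a real run.
    """
    return {
        (min_score, min_area): sum(
            1 for score, area in candidates
            if passes(score, area, min_score, min_area)
        )
        for min_score, min_area in product(min_scores, min_areas)
    }


if __name__ == "__main__":
    # Illustrative data only, not measured from any dataset.
    candidates = [(0.9, 500), (0.6, 1200), (0.3, 80)]
    results = grid_search(candidates, [0.25, 0.5, 0.75], [100, 1000])
    for (min_score, min_area), removed in sorted(results.items()):
        print(f"min_score={min_score} min_area={min_area} removed={removed}")
```

Tabulating the counts per parameter pair gives the kind of data a two-dimensional plot of thresholds against removed images would be drawn from.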
