A tool to manage image duplicates - program find image duplicates in a specified folder and removes them if necessary.
We recommend to use pyenv
to manage necessary python version and poetry
to manage
dependencies installation:
% brew install pyenv pyenv-virtualenv
Then the process of configuring the environment looks like following:
% pyenv install 3.9.1
% pyenv virtualenv 3.9.1 imagedup
% pyenv activate imagedup
Then install necessary dependencies:
% pip install poetry
% poetry config virtualenvs.create false
% poetry install --with-root
You can run an imagedup
command right from the repository root in the following way:
% python -m imagedup ./dataset
By default the tool does not delete files and simply prints the files to delete into the standard output. If you want to delete duplicates, consider calling the tool like following:
% python -m imagedup.shell ./dataset -q --rm
The following image outlines how exactly the --min-score
and --min-area
parameters
relate to the number of images being removed from the directory.
By default this tool guarantees removal of 50% of the images from a directory.