Docker container for dupeGuru

This project implements a Docker container for dupeGuru.

The GUI of the application is accessed through a modern web browser (no installation or configuration needed on the client side) or via any VNC client.

dupeGuru is a tool to find duplicate files on your computer. It can scan either filenames or contents. The filename scan features a fuzzy matching algorithm that can find duplicate filenames even when they are not exactly the same.

Table of Content

Quick Start
Usage
Docker Compose File
Docker Image Versioning
Docker Image Update
- Synology
- unRAID
User/Group IDs
Accessing the GUI
Security
Reverse Proxy
- Routing Based on Hostname
- Routing Based on URL Path
Shell Access
dupeGuru Deletion Options
Support or Contact

Quick Start

NOTE: The Docker command provided in this quick start is given as an example, and its parameters should be adjusted to your needs.

Launch the dupeGuru docker container with the following command:

docker run -d \
    --name=dupeguru \
    -p 5800:5800 \
    -v /docker/appdata/dupeguru:/config:rw \
    -v /home/user:/storage:rw \
    jlesage/dupeguru

Where:

/docker/appdata/dupeguru: This is where the application stores its configuration, state, logs, and any files that need to persist.
/home/user: This location contains files from your host that need to be accessible to the application.

Browse to http://your-host-ip:5800 to access the dupeGuru GUI. Files from the host appear under the /storage folder in the container.

Usage

docker run [-d] \
    --name=dupeguru \
    [-e <VARIABLE_NAME>=<VALUE>]... \
    [-v <HOST_DIR>:<CONTAINER_DIR>[:PERMISSIONS]]... \
    [-p <HOST_PORT>:<CONTAINER_PORT>]... \
    jlesage/dupeguru

Parameter	Description
-d	Run the container in the background. If not set, the container runs in the foreground.
-e	Pass an environment variable to the container. See the Environment Variables section for more details.
-v	Set a volume mapping (allows sharing a folder/file between the host and the container). See the Data Volumes section for more details.
-p	Set a network port mapping (exposes an internal container port to the host). See the Ports section for more details.

Environment Variables

To customize some properties of the container, the following environment variables can be passed via the -e parameter (one for each variable). The value of this parameter has the format <VARIABLE_NAME>=<VALUE>.

Variable	Description	Default
`USER_ID`	ID of the user the application runs as. See User/Group IDs to better understand when this should be set.	`1000`
`GROUP_ID`	ID of the group the application runs as. See User/Group IDs to better understand when this should be set.	`1000`
`SUP_GROUP_IDS`	Comma-separated list of supplementary group IDs of the application.	(no value)
`UMASK`	Mask that controls how permissions are set for newly created files and folders. The value of the mask is in octal notation. The default umask value is `0022`, meaning that newly created files and folders are readable by everyone, but only writable by the owner. See the online umask calculator at http://wintelguy.com/umask-calc.pl.	`0022`
`LANG`	Set the locale, which defines the application's language, if supported. Format of the locale is `language[_territory][.codeset]`, where language is an ISO 639 language code, territory is an ISO 3166 country code and codeset is a character set, like `UTF-8`. For example, Australian English using the UTF-8 encoding is `en_AU.UTF-8`.	`en_US.UTF-8`
`TZ`	Time zone used by the container. The time zone can also be set by mapping `/etc/localtime` between the host and the container.	`Etc/UTC`
`KEEP_APP_RUNNING`	When set to `1`, the application will be automatically restarted when it crashes or terminates.	`0`
`APP_NICENESS`	Priority at which the application should run. A niceness value of -20 is the highest priority and 19 is the lowest priority. The default niceness value is 0. NOTE: A negative niceness (priority increase) requires additional permissions. In this case, the container should be run with the docker option `--cap-add=SYS_NICE`.	`0`
`INSTALL_PACKAGES`	Space-separated list of packages to install during the startup of the container. List of available packages can be found at https://mirrors.alpinelinux.org. ATTENTION: Container functionality can be affected when installing a package that overrides existing container files (e.g. binaries).	(no value)
`PACKAGES_MIRROR`	Mirror of the repository to use when installing packages. List of mirrors is available at https://mirrors.alpinelinux.org.	(no value)
`CONTAINER_DEBUG`	Set to `1` to enable debug logging.	`0`
`DISPLAY_WIDTH`	Width (in pixels) of the application's window.	`1920`
`DISPLAY_HEIGHT`	Height (in pixels) of the application's window.	`1080`
`DARK_MODE`	When set to `1`, dark mode is enabled for the application.	`0`
`SECURE_CONNECTION`	When set to `1`, an encrypted connection is used to access the application's GUI (either via a web browser or VNC client). See the Security section for more details.	`0`
`SECURE_CONNECTION_VNC_METHOD`	Method used to perform the secure VNC connection. Possible values are `SSL` or `TLS`. See the Security section for more details.	`SSL`
`SECURE_CONNECTION_CERTS_CHECK_INTERVAL`	Interval, in seconds, at which the system verifies if web or VNC certificates have changed. When a change is detected, the affected services are automatically restarted. A value of `0` disables the check.	`60`
`WEB_LISTENING_PORT`	Port used by the web server to serve the UI of the application. This port is used internally by the container and usually does not need to be changed. By default, a container is created with the default bridge network, meaning that, to be accessible, each internal container port must be mapped to an external port (using the `-p` or `--publish` argument). However, if the container is created with another network type, changing the port used by the container might be useful to prevent conflicts with other services/containers. NOTE: a value of `-1` disables listening, meaning that the application's UI won't be accessible over HTTP/HTTPS.	`5800`
`VNC_LISTENING_PORT`	Port used by the VNC server to serve the UI of the application. This port is used internally by the container and usually does not need to be changed. By default, a container is created with the default bridge network, meaning that, to be accessible, each internal container port must be mapped to an external port (using the `-p` or `--publish` argument). However, if the container is created with another network type, changing the port used by the container might be useful to prevent conflicts with other services/containers. NOTE: a value of `-1` disables listening, meaning that the application's UI won't be accessible over VNC.	`5900`
`VNC_PASSWORD`	Password needed to connect to the application's GUI. See the VNC Password section for more details.	(no value)
`ENABLE_CJK_FONT`	When set to `1`, open-source computer font `WenQuanYi Zen Hei` is installed. This font contains a large range of Chinese/Japanese/Korean characters.	`0`
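
As a rough illustration of the -e format, the following sketch assembles a docker run command with a few variables from the table above and prints it for review instead of running it. The values chosen here (Europe/Paris, dark mode enabled) are arbitrary examples, and the volume paths are the ones from the Quick Start.

```shell
# Example values only; adjust them to your setup.
tz_value="Europe/Paris"

# Build the command as positional parameters so it can be inspected first.
set -- docker run -d \
    --name=dupeguru \
    -e USER_ID=1000 \
    -e GROUP_ID=1000 \
    -e TZ="$tz_value" \
    -e DARK_MODE=1 \
    -p 5800:5800 \
    -v /docker/appdata/dupeguru:/config:rw \
    -v /home/user:/storage:rw \
    jlesage/dupeguru

# Dry run: print the command. To actually start the container, run: "$@"
echo "$*"
```

Variables left out of the command keep the defaults listed in the table.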

Deployment Considerations

Many tools used to manage Docker containers extract environment variables defined by the Docker image and use them to create/deploy the container. For example, this is done by:

The Docker application on Synology NAS
The Container Station on QNAP NAS
Portainer
etc.

While this lets users adjust the values of environment variables to fit their needs, it can also be confusing and dangerous to keep all of them.

A good practice is to set/keep only the variables that are needed for the container to behave as desired in a specific setup. If a variable is left at its default value, it can be removed. Keep in mind that all variables are optional, meaning that none of them is required for the container to start.

Removing environment variables that are not needed provides some advantages:

Prevents keeping variables that are no longer used by the container. Over time, with image updates, some variables might be removed.
Allows the Docker image to change/fix a default value. Again, with image updates, the default value of a variable might be changed to fix an issue, or to better support a new feature.
Prevents changes to a variable that might affect the correct function of the container. Some undocumented variables, like PATH or ENV, are required to be exposed, but are not meant to be changed by users. However, container management tools still show these variables to users.
There is a bug with the Container Station on QNAP and the Docker application on Synology, where an environment variable without a value might not be allowed. This behavior is wrong: it's absolutely fine to have a variable without a value. In fact, this container does have variables without a value by default. Thus, removing unneeded variables is a good way to prevent deployment issues on these devices.

Data Volumes

The following table describes data volumes used by the container. The mappings are set via the -v parameter. Each mapping is specified with the following format: <HOST_DIR>:<CONTAINER_DIR>[:PERMISSIONS].

Container path	Permissions	Description
`/config`	rw	This is where the application stores its configuration, state, logs, and any files that need to persist.
`/storage`	rw	This location contains files from your host that need to be accessible to the application.
`/trash`	rw	This is where duplicated files are moved when they are sent to trash.

Ports

Here is the list of ports used by the container.

When using the default bridge network, ports can be mapped to the host via the -p parameter (one per port mapping). Each mapping is defined with the following format: <HOST_PORT>:<CONTAINER_PORT>. The port number used inside the container might not be changeable, but you are free to use any port on the host side.

See the Docker Container Networking documentation for more details.

Port	Protocol	Mapping to host	Description
5800	TCP	Optional	Port to access the application's GUI via the web interface. Mapping to the host is optional if access through the web interface is not wanted. For a container not using the default bridge network, the port can be changed with the `WEB_LISTENING_PORT` environment variable.
5900	TCP	Optional	Port to access the application's GUI via the VNC protocol. Mapping to the host is optional if access through the VNC protocol is not wanted. For a container not using the default bridge network, the port can be changed with the `VNC_LISTENING_PORT` environment variable.
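
The sketch below shows the <HOST_PORT>:<CONTAINER_PORT> format with host ports that differ from the container ports. The host ports 8080 and 5901 are arbitrary examples; the command is only printed, not executed.

```shell
# Example host-side ports; any free port on the host can be used.
host_web_port=8080
host_vnc_port=5901

# The container-side ports (5800 and 5900) stay the same.
set -- docker run -d \
    --name=dupeguru \
    -p "${host_web_port}:5800" \
    -p "${host_vnc_port}:5900" \
    -v /docker/appdata/dupeguru:/config:rw \
    -v /home/user:/storage:rw \
    jlesage/dupeguru

# Dry run: print the command. To actually start the container, run: "$@"
echo "$*"
echo "Web interface would be reachable at http://your-host-ip:${host_web_port}"
```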

Changing Parameters of a Running Container

As can be seen, environment variables, volume and port mappings are all specified while creating the container.

The following steps describe the method used to add, remove or update parameter(s) of an existing container. The general idea is to destroy and re-create the container:

Stop the container (if it is running):

docker stop dupeguru

Remove the container:

docker rm dupeguru

Create/start the container using the docker run command, by adjusting parameters as needed.

NOTE: Since all of the application's data is saved under the /config container folder, destroying and re-creating a container is not a problem: nothing is lost and the application comes back with the same state (as long as the mapping of the /config folder remains the same).

Docker Compose File

Here is an example of a docker-compose.yml file that can be used with Docker Compose.

Make sure to adjust according to your needs. Note that only mandatory network ports are part of the example.

version: '3'
services:
  dupeguru:
    image: jlesage/dupeguru
    ports:
      - "5800:5800"
    volumes:
      - "/docker/appdata/dupeguru:/config:rw"
      - "/home/user:/storage:rw"

Docker Image Versioning

Each release of a Docker image is versioned. Prior to October 2022, semantic versioning was used as the versioning scheme.

Since then, the versioning scheme has changed to calendar versioning. The format used is YY.MM.SEQUENCE, where:

YY is the zero-padded year (relative to year 2000).
MM is the zero-padded month.
SEQUENCE is the incremental release number within the month (first release is 1, second is 2, etc).
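
To make the format concrete, here is a small sketch that splits a version tag into its three fields. The tag 23.04.1 is a made-up example, not a reference to an actual release.

```shell
# Hypothetical version tag in the YY.MM.SEQUENCE format.
version="23.04.1"

# Split the tag on dots into its three fields.
IFS=. read -r yy mm seq <<EOF
$version
EOF

# YY is relative to year 2000. Strip a leading zero so that values such as
# "08" are not rejected as invalid octal numbers by shell arithmetic.
year=$((2000 + ${yy#0}))

echo "year=$year month=$mm release=$seq"
```

For the example tag, this prints year=2023 month=04 release=1, i.e. the first release of April 2023.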

Docker Image Update

Because features are added, issues are fixed, or simply because a new version of the containerized application is integrated, the Docker image is regularly updated. Different methods can be used to update the Docker image.

The system used to run the container may have a built-in way to update containers. If so, this could be your primary way to update Docker images.

Another way is to have the image be automatically updated with Watchtower. Watchtower is a container-based solution for automating Docker image updates. This is a "set and forget" type of solution: once a new image is available, Watchtower will seamlessly perform the necessary steps to update the container.

Finally, the Docker image can be manually updated with these steps:

Fetch the latest image:

docker pull jlesage/dupeguru

Stop the container:

docker stop dupeguru

Remove the container:

docker rm dupeguru

Create and start the container using the docker run command, with the same parameters that were used when it was deployed initially.

Synology

For owners of a Synology NAS, the following steps can be used to update a container image.

Open the Docker application.
Click on Registry in the left pane.
In the search bar, type the name of the container (jlesage/dupeguru).
Select the image, click Download and then choose the latest tag.
Wait for the download to complete. A notification will appear once done.
Click on Container in the left pane.
Select your dupeGuru container.
Stop it by clicking Action->Stop.
Clear the container by clicking Action->Reset (or Action->Clear if you don't have the latest Docker application). This removes the container while keeping its configuration.
Start the container again by clicking Action->Start. NOTE: The container may temporarily disappear from the list while it is re-created.

unRAID

For unRAID, a container image can be updated by following these steps:

Select the Docker tab.
Click the Check for Updates button at the bottom of the page.
Click the update ready link of the container to be updated.

User/Group IDs

When using data volumes (-v flags), permissions issues can occur between the host and the container. For example, the user within the container may not exist on the host. This could prevent the host from properly accessing files and folders on the shared volume.

To avoid any problem, you can specify the user the application should run as.

This is done by passing the user ID and group ID to the container via the USER_ID and GROUP_ID environment variables.

To find the right IDs to use, issue the following command on the host, with the user owning the data volume on the host:

id <username>

Which gives an output like this one:

uid=1000(myuser) gid=1000(myuser) groups=1000(myuser),4(adm),24(cdrom),27(sudo),46(plugdev),113(lpadmin)

The values of uid (user ID) and gid (group ID) are the ones that should be given to the container.
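
As a convenience, the numeric IDs can also be read directly with id -u and id -g. The sketch below builds the corresponding -e flags for the current user; it assumes the current user is the one owning the data volume.

```shell
# Numeric IDs of the current user. Use `id -u <username>` and
# `id -g <username>` instead to look up another account.
user_id=$(id -u)
group_id=$(id -g)

# Flags to add to the docker run command.
id_flags="-e USER_ID=${user_id} -e GROUP_ID=${group_id}"
echo "$id_flags"
```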

Accessing the GUI

Assuming that the container's ports are mapped to the same ports on the host, the graphical interface of the application can be accessed via:

A web browser:

http://<HOST IP ADDR>:5800

Any VNC client:

<HOST IP ADDR>:5900

Security

By default, access to the application's GUI is done over an unencrypted connection (HTTP or VNC).

A secure connection can be enabled via the SECURE_CONNECTION environment variable. See the Environment Variables section for more details on how to set an environment variable.

When enabled, the application's GUI is accessed over an HTTPS connection when using a browser. All HTTP accesses are automatically redirected to HTTPS.

When using a VNC client, the VNC connection is performed over SSL. Note that few VNC clients support this method. SSVNC is one of them.

SSVNC

SSVNC is a VNC viewer that adds encryption security to VNC connections.

While the Linux version of SSVNC works well, the Windows version has some issues. At the time of writing, the latest version 1.0.30 is not functional, as a connection fails with the following error:

ReadExact: Socket error while reading

However, for your convenience, an unofficial and working version is provided here:

https://github.com/jlesage/docker-baseimage-gui/raw/master/tools/ssvnc_windows_only-1.0.30-r1.zip

The only difference with the official package is that the bundled version of stunnel has been upgraded to version 5.49, which fixes the connection problems.

Certificates

Here are the certificate files needed by the container. By default, when they are missing, self-signed certificates are generated and used. All files are PEM-encoded; the certificates are in X.509 format.

Container Path	Purpose	Content
`/config/certs/vnc-server.pem`	VNC connection encryption.	VNC server's private key and certificate, bundled with any root and intermediate certificates.
`/config/certs/web-privkey.pem`	HTTPS connection encryption.	Web server's private key.
`/config/certs/web-fullchain.pem`	HTTPS connection encryption.	Web server's certificate, bundled with any root and intermediate certificates.

NOTE: To prevent any certificate validity warnings/errors from the browser or VNC client, make sure to supply your own valid certificates.

NOTE: Certificate files are monitored and relevant daemons are automatically restarted when changes are detected.
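
The following sketch shows one possible way to lay out your own certificate files to match the table above. The source file names (privkey.pem, fullchain.pem) are assumptions about what your certificate authority tooling produces, and temporary directories with placeholder content stand in for the real locations so the example can be tried safely. With the Quick Start mapping, the real target would be /docker/appdata/dupeguru/certs on the host.

```shell
# Stand-ins for the real folders (assumptions for this example).
src_dir=$(mktemp -d)     # where your issued key and certificate chain live
certs_dir=$(mktemp -d)   # host folder that ends up as /config/certs

# Placeholder content; real files would hold PEM-encoded data.
printf 'KEY\n' > "$src_dir/privkey.pem"
printf 'CHAIN\n' > "$src_dir/fullchain.pem"

# Web server: private key and certificate chain in separate files.
cp "$src_dir/privkey.pem" "$certs_dir/web-privkey.pem"
cp "$src_dir/fullchain.pem" "$certs_dir/web-fullchain.pem"

# VNC server: private key followed by the certificate chain in one file.
cat "$src_dir/privkey.pem" "$src_dir/fullchain.pem" > "$certs_dir/vnc-server.pem"

# Private keys should not be readable by other users.
chmod 600 "$certs_dir/web-privkey.pem" "$certs_dir/vnc-server.pem"

ls "$certs_dir"
```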

VNC Password

To restrict access to your application, a password can be specified. This can be done via two methods:

By using the VNC_PASSWORD environment variable.
By creating a .vncpass_clear file at the root of the /config volume. This file should contain the password in clear-text. During the container startup, content of the file is obfuscated and moved to .vncpass.

The level of security provided by the VNC password depends on two things:

The type of communication channel (encrypted/unencrypted).
How secure the access to the host is.

When using a VNC password, it is highly desirable to enable the secure connection to prevent sending the password in clear text over an unencrypted channel.

ATTENTION: Password is limited to 8 characters. This limitation comes from the Remote Framebuffer Protocol RFC (see section 7.2.2). Any characters beyond the limit are ignored.
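
The sketch below illustrates the second method and the 8-character limit. The password is a made-up example, a temporary directory stands in for the host folder mapped to /config, and whether a trailing newline in the file matters is not stated here, so treat the file-writing step as an assumption.

```shell
# Stand-in for the host folder mapped to /config
# (e.g. /docker/appdata/dupeguru in the Quick Start).
config_dir=$(mktemp -d)

# Hypothetical password, 12 characters long.
password='correcthorse'

# Only the first 8 characters are taken into account.
effective=$(printf '%s' "$password" | cut -c1-8)

# Write the clear-text password file read at container startup.
printf '%s\n' "$password" > "$config_dir/.vncpass_clear"
chmod 600 "$config_dir/.vncpass_clear"

echo "effective password: $effective"
```

Choosing a password of at most 8 characters avoids any surprise about which part of it is actually checked.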

Reverse Proxy

The following sections contain NGINX configurations that need to be added in order to reverse proxy to this container.

A reverse proxy server can route HTTP requests based on the hostname or the URL path.

Routing Based on Hostname

In this scenario, each hostname is routed to a different application/container.

For example, let's say the reverse proxy server is running on the same machine as this container. The server would proxy all HTTP requests sent to dupeguru.domain.tld to the container at 127.0.0.1:5800.

Here are the relevant configuration elements that would be added to the NGINX configuration:

map $http_upgrade $connection_upgrade {
	default upgrade;
	''      close;
}

upstream docker-dupeguru {
	# If the reverse proxy server is not running on the same machine as the
	# Docker container, use the IP of the Docker host here.
	# Make sure to adjust the port according to how port 5800 of the
	# container has been mapped on the host.
	server 127.0.0.1:5800;
}

server {
	[...]

	server_name dupeguru.domain.tld;

	location / {
		proxy_pass http://docker-dupeguru;
	}

	location /websockify {
		proxy_pass http://docker-dupeguru;
		proxy_http_version 1.1;
		proxy_set_header Upgrade $http_upgrade;
		proxy_set_header Connection $connection_upgrade;
		proxy_read_timeout 86400;
	}
}

Routing Based on URL Path

In this scenario, the hostname is the same, but different URL paths are used to route to different applications/containers.

For example, let's say the reverse proxy server is running on the same machine as this container. The server would proxy all HTTP requests for server.domain.tld/dupeguru to the container at 127.0.0.1:5800.

Here are the relevant configuration elements that would be added to the NGINX configuration:

map $http_upgrade $connection_upgrade {
	default upgrade;
	''      close;
}

upstream docker-dupeguru {
	# If the reverse proxy server is not running on the same machine as the
	# Docker container, use the IP of the Docker host here.
	# Make sure to adjust the port according to how port 5800 of the
	# container has been mapped on the host.
	server 127.0.0.1:5800;
}

server {
	[...]

	location = /dupeguru {return 301 $scheme://$http_host/dupeguru/;}
	location /dupeguru/ {
		proxy_pass http://docker-dupeguru/;
		location /dupeguru/websockify {
			proxy_pass http://docker-dupeguru/websockify/;
			proxy_http_version 1.1;
			proxy_set_header Upgrade $http_upgrade;
			proxy_set_header Connection $connection_upgrade;
			proxy_read_timeout 86400;
		}
	}
}

Shell Access

To get shell access to the running container, execute the following command:

docker exec -ti CONTAINER sh

Where CONTAINER is the ID or the name of the container used during its creation.

dupeGuru Deletion Options

When deleting duplicated files, dupeGuru offers two choices:

Send files to trash
Delete files directly

The first option moves files to the /trash directory inside the container. This operation can be slow for large files since it may imply a copy of the data before the actual deletion.

There is also an option to link deleted files. It is not recommended to enable this option, since there is a good chance that created links won't make sense outside the container.

Support or Contact

Having trouble with the container or have questions? Please create a new issue.

For other great Dockerized applications, see https://jlesage.github.io/docker-apps.
