Execute commands for new and changed files in directories. Builds on Facebook's Watchman, a file watching service. The author uses Baamhackl to pass PDF files produced by a network-connected scanner through OCRmyPDF and other tools.
"Baamhackl" is Bavarian for a woodpecker.
A YAML-formatted configuration file is required to define observed directories and handler commands. Example:
handlers:
  - name: scanned
    path: /srv/shared/scanned
    command: ["/bin/bash", "-c", "echo ${BAAMHACKL_INPUT}"]
Every time a file is created in or moved into the `/srv/shared/scanned`
directory, the given handler command is launched. File modifications while the
command is running are considered a handler failure triggering a retry.
A log of per-file actions taken is recorded in the journal directory located at
`_/journal` relative to the observed directory, i.e.
`/srv/shared/scanned/_/journal` in the example above. After the handler command
succeeds, the originally changed file is moved to the `_/success` directory. If
the command keeps failing until all retries are exhausted, the file is moved to
the `_/failure` directory.
Command logs, successful and failed files are cleaned up periodically.
To use Baamhackl a Watchman server must already be running and accessible (e.g. launched via systemd or another service manager). For debugging purposes an instance can be launched in the foreground:
watchman --foreground --log-level=1 --logfile=/dev/stderr
The number of handler commands to run concurrently can be configured with
`baamhackl watch -slots=N`.
The `baamhackl selftest` subcommand executes a small number of tests to verify
whether the system is configured correctly.
The configuration for the `baamhackl watch` subcommand is specified either via
the `-config` flag or the `BAAMHACKL_CONFIG_FILE` environment variable. The
following commands are equivalent:
baamhackl watch -config ./config.yaml
BAAMHACKL_CONFIG_FILE=./config.yaml baamhackl watch
Configuration files use the YAML format.
At the root is a single option, `handlers`, which is a list of handler
configuration objects. Each handler supports the following options:
Option | Default | Description |
---|---|---|
`name` | (none) | Handler name. Used for logging and naming the trigger command in Watchman. |
`path` | (none) | Absolute path to observed directory. |
`command` | (none) | Handler command arguments as a list, e.g. `["/usr/local/bin/handle-change", "arg", "another"]`. Arguments are visible in log files and should not contain confidential information such as passwords or access tokens. Store them in separate files outside `path`. |
`timeout` | `1h` | Timeout for executing the command. |
`recursive` | `false` | Observe directory recursively (excluding the infrastructure directories). |
`include_hidden` | `false` | Whether to invoke command for files starting with a dot (`.`). |
`min_size_bytes`, `max_size_bytes` | `0` | Minimum and maximum file size for running command. Use zero to disable. Files smaller or larger than the configured values are ignored. |
`settle_duration` | `1s` | Amount of time the filesystem should be idle before dispatching commands. |
`retry_count` | `2` | Number of times a failing command should be retried. Set to `0` to make the first failure permanent. |
`retry_delay_initial` | `15m` | Amount of time to wait between retry attempts. A small and random amount of variation is always applied. |
`retry_delay_factor` | `1.5` | Back-off factor to apply between attempts after the first retry. Use `1` to always use the same delay. |
`retry_delay_max` | `1h` | Maximum amount of time to wait between retry attempts. Use `0s` for no limit. |
`journal_dir` | `_/journal` | Path to directory for command logs. |
`journal_retention` | 7 days | Amount of time before logs and processed files are deleted. |
`success_dir` | `_/success` | Path to directory into which successfully handled files are moved. |
`failure_dir` | `_/failure` | Path to directory for files for which the command failed persistently. |
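
The size and hidden-file options can be read as a simple predicate. The following Python sketch is one interpretation of the option descriptions, not Baamhackl's actual code. The function name `should_handle` is hypothetical, and the sketch assumes that the limits are inclusive (only files strictly smaller or larger than the limits are ignored).

```python
import os


def should_handle(name: str, size: int, *, min_size_bytes: int = 0,
                  max_size_bytes: int = 0, include_hidden: bool = False) -> bool:
    """Return True if a file with this name and size passes the filters."""
    # Hidden files (leading dot) are skipped unless include_hidden is set.
    if not include_hidden and os.path.basename(name).startswith("."):
        return False
    # A limit of zero disables the corresponding size check.
    if min_size_bytes and size < min_size_bytes:
        return False
    if max_size_bytes and size > max_size_bytes:
        return False
    return True
```

With the defaults every non-hidden file passes, regardless of its size.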
Handler commands are started when a file change is detected. A command is
considered successful when it exits with a zero status code. In all other
cases the command is re-run until it either succeeds or all `retry_count`
retries have been used up.
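
To get a feeling for how the retry options interact, the nominal waiting times can be computed by hand. The sketch below assumes a plain geometric back-off capped at the maximum and leaves out the random variation mentioned in the option table. The function name `retry_delays` is hypothetical and the exact formula used by Baamhackl may differ.

```python
def retry_delays(retry_count: int = 2, initial: float = 15 * 60,
                 factor: float = 1.5, maximum: float = 60 * 60) -> list[float]:
    """Return the nominal delay in seconds before each retry, without jitter."""
    delays = []
    delay = initial
    for _ in range(retry_count):
        # A maximum of zero is treated as "no limit".
        delays.append(min(delay, maximum) if maximum > 0 else delay)
        # Grow the delay for the next retry by the back-off factor.
        delay *= factor
    return delays
```

Under these assumptions the default settings give waits of 15 minutes and 22.5 minutes before the first and second retry.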
Environment variables available to handler commands:
Name | Description |
---|---|
`BAAMHACKL_PROGRAM` | Absolute path to the Baamhackl program. |
`BAAMHACKL_ORIGINAL` | Path of changed file. Use only for informative purposes as the original may be modified concurrently. A copy of the file is made available via `BAAMHACKL_INPUT`. |
`BAAMHACKL_INPUT` | Path to a copy of the changed file. |
`BAAMHACKL_WORKDIR` | Path to a directory where the handler command can store temporary files. This is also the working directory when the command is started. |
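
A handler can be written in any language that can read environment variables. The following Python sketch shows the general shape: it reads the copy named by `BAAMHACKL_INPUT`, writes a result into `BAAMHACKL_WORKDIR` and reports success or failure through its return value. The function name `handle` and the output file name `output.pdf` are illustrative choices, and the "processing" step is just a file copy.

```python
import os
import shutil
import sys


def handle(env=os.environ) -> int:
    """Copy the input into the working directory; return an exit status."""
    try:
        src = env["BAAMHACKL_INPUT"]
        workdir = env["BAAMHACKL_WORKDIR"]
    except KeyError as err:
        print(f"missing environment variable: {err}", file=sys.stderr)
        return 1  # non-zero status: the handler run counts as failed
    # BAAMHACKL_ORIGINAL is informational only; always work on the copy.
    print(f"processing copy of {env.get('BAAMHACKL_ORIGINAL', '(unknown)')}")
    # Placeholder for real processing: copy the input to an output file.
    shutil.copy(src, os.path.join(workdir, "output.pdf"))
    return 0  # zero status: the handler run counts as successful
```

A real handler script would end with `sys.exit(handle())` so that the return value becomes the process exit status.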
If a command should produce an output in a particular directory it needs to do
so on its own. Baamhackl provides the `baamhackl move-into` subcommand to move
a file into a destination folder without overwriting any existing file. It does
so by finding a new and available name in case of a conflict. Example:
${BAAMHACKL_PROGRAM} move-into /srv/shared/finished ./output.pdf
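
The idea behind moving without overwriting can be illustrated in a few lines of Python. This is not how `move-into` is implemented. The sketch uses a hard link followed by an unlink as a stand-in for an atomic no-replace rename, works only within a single filesystem, and the `name (N).ext` naming scheme and the function name `move_into` are made up for the example.

```python
import os


def move_into(dest_dir: str, src: str, max_tries: int = 1000) -> str:
    """Move src into dest_dir without overwriting; return the final path."""
    stem, ext = os.path.splitext(os.path.basename(src))
    for n in range(max_tries):
        # Hypothetical naming scheme: "output.pdf", "output (1).pdf", ...
        name = f"{stem}{ext}" if n == 0 else f"{stem} ({n}){ext}"
        candidate = os.path.join(dest_dir, name)
        try:
            # link() fails atomically if the candidate already exists.
            os.link(src, candidate)
        except FileExistsError:
            continue  # name is taken, try the next one
        os.unlink(src)  # the new name exists, drop the old one
        return candidate
    raise FileExistsError(f"no free name for {src!r} in {dest_dir!r}")
```

Because the existence check and the creation of the new name happen in one system call, two concurrent callers cannot both claim the same destination name.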
Baamhackl is instrumented for Prometheus monitoring. Specify an address and port to listen on:
baamhackl watch -metrics_address 127.0.0.1:9999
Scrape the metrics:
$ curl -s http://localhost:9999/metrics | grep ^baamhackl_build_info
baamhackl_build_info{[…]} 1
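
The same check can be scripted. The Python sketch below fetches the metrics page and keeps only the sample lines for a given name prefix. The function names are hypothetical, the URL matches the `-metrics_address` example above, and the assumption that metrics other than `baamhackl_build_info` share the `baamhackl_` prefix is not confirmed here.

```python
import urllib.request


def fetch_metrics(url: str = "http://localhost:9999/metrics",
                  timeout: float = 5.0) -> str:
    """Download the metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def select_samples(text: str, prefix: str = "baamhackl_") -> list[str]:
    """Return sample lines whose metric name starts with prefix."""
    # Comment lines start with "#" and therefore never match the prefix.
    return [line for line in text.splitlines() if line.startswith(prefix)]
```

Calling `select_samples(fetch_metrics())` against a running instance returns the matching lines as a list.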
Watchman is a required dependency. By default the `watchman` program is looked
up via `$PATH`. Specify an absolute path using the `-watchman_program` flag,
e.g. `baamhackl watch -watchman_program=/opt/watchman/bin/watchman`.
Pre-built binaries are provided for all releases:
- Binary archives (`.tar.gz`)
- Debian/Ubuntu (`.deb`)
- RHEL/Fedora (`.rpm`)
Docker image via GitHub's container registry:
docker pull ghcr.io/hansmi/baamhackl
Note that the image only contains Baamhackl itself and none of its dependencies. Combine the image with another in a multi-stage build. Example using Debian:
FROM ghcr.io/hansmi/baamhackl:latest AS baamhackl
FROM docker.io/library/debian:stable
RUN \
  apt-get update && \
  apt-get install -y watchman && \
  apt-get clean
COPY --from=baamhackl /baamhackl /usr/bin/baamhackl
RUN baamhackl selftest
With the source being available it's also possible to produce custom builds directly using Go or GoReleaser.
The current implementation of the Baamhackl program relies on a few
Linux-specific system calls such as `renameat2()`.
Support for more operating systems would require the implementation of
alternatives.
In multi-user environments it's strongly recommended to run Baamhackl in a container with limited filesystem visibility. Only the directories used by the configuration and handler commands should be made available.
Operations on filesystems shared by multiple users, either locally or via network protocols such as Network File System (NFS) or Server Message Block (SMB), are prone to race conditions. Locking isn't supported universally and can't be relied upon.
A program like Baamhackl which observes file changes before acting upon them needs to account for concurrent changes. Source files modified while the handler command runs will cause a failure and a subsequent retry. Atomic file operations are used where possible.
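
One common way to notice such concurrent changes is to compare file metadata before and after the handler runs. The sketch below illustrates that general technique in Python and makes no claim about how Baamhackl performs the check. The function names are hypothetical.

```python
import os


def snapshot(path: str) -> tuple:
    """Capture identity and metadata that change when a file is modified."""
    st = os.stat(path, follow_symlinks=False)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def unchanged_since(path: str, before: tuple) -> bool:
    """Return True if path still matches an earlier snapshot."""
    try:
        return snapshot(path) == before
    except FileNotFoundError:
        # The file was removed or renamed in the meantime.
        return False
```

A metadata comparison is cheap but not airtight: a write that keeps the size and restores the modification time would go unnoticed, and the check itself races with later writes.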
It's unrealistic to avoid race conditions entirely under these circumstances. After the handler command is done, the originally changed file needs to be taken out of the input directory so that it is not processed again later. Given that the file has been processed, it could simply be removed. However, between the command finishing, the check for changes and the removal of the file, a user could modify it again, and the subsequent removal would cause data loss. For this reason files are first moved to an archive directory where they remain for some time.
Path traversals are another issue. Modified files could be replaced with a symlink between Watchman reporting a change and Baamhackl actually getting around to processing the file.
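
Handler code that opens files on its own can reduce this risk by refusing to follow symlinks. The Python sketch below shows the idea using the `O_NOFOLLOW` flag. The function name `open_regular_file` is hypothetical, and the flag only protects the final path component, so symlinked parent directories are still followed.

```python
import os
import stat


def open_regular_file(path: str) -> int:
    """Open path read-only, refusing symlinks and non-regular files."""
    # O_NOFOLLOW makes open() fail if the final path component is a symlink;
    # O_NONBLOCK avoids blocking when the path names a FIFO.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK)
    try:
        # Check the type on the open descriptor, not on the path, so the
        # result cannot be invalidated by a later rename.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise OSError(f"not a regular file: {path!r}")
    except BaseException:
        os.close(fd)
        raise
    return fd
```

The caller owns the returned file descriptor and is responsible for closing it.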
Commands can also be given inputs causing them to read arbitrary files and
either log their contents or copy them to a location accessible to an
attacker. The handler command `["bash", "-c", "source $BAAMHACKL_INPUT"]`
implements direct remote code execution.
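
The safer pattern is to treat the input strictly as data and to reject anything that doesn't look like the expected file type before handing it to other tools. As a minimal illustration for the PDF use case, the Python sketch below checks for the `%PDF-` signature at the start of the file. The function name `looks_like_pdf` is hypothetical, and a matching signature says nothing about whether the rest of the file is well-formed or harmless.

```python
def looks_like_pdf(path: str) -> bool:
    """Return True if the file starts with the PDF signature."""
    with open(path, "rb") as f:
        # Only the first five bytes are read; the content is never executed.
        return f.read(5) == b"%PDF-"
```

A handler could exit with a non-zero status when the check fails, so that unexpected inputs end up in the failure directory instead of being processed.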