A collection of utilities for comic book archive management. Currently this is in alpha testing (AKA perfectly stable for me but YMMV).
It includes the following:
- Supports Rar, Rar5, Zip, 7z, CBZ, CBR, CB7
- Verify archives aren't corrupt using 7Zip
- Verify images aren't corrupt using ImageMagick
- Option to upscale comic books with machine learning to higher resolutions using waifu2x-ncnn-vulkan
- Option to remove white space on margins
- Option to split double pages into individual pages
- Download metadata (from sources like AniList)
- Download covers for use in Komga
- Converts all of your archives to the standard CBZ (comic book zip) format
- Rename and group files according to downloaded metadata
- Split nested archives into individual archives
- Intelligently split archives based on volume count, e.g. a single archive named Jojo v1-3 could be split into 3 separate archives
- Convert a folder of images into an archive - intelligently splitting them according to their volume number
- Easy to understand pattern matching to standardize naming across your library
- Conversion of archives into individual folders so they can be recognized as a series in Komga
- Removes distributor bloat, like .url file links to their site
- Supports a Queue folder that can be used to automatically convert archives on a cron job
- Moves failed conversions to a maintenance folder so you can manually fix and rerun any failed jobs
- Support
- Install
- Commands
- Maintain Collection
- Convert to Series
- Suggest Naming
- Download Covers
- Enhancement Options
- Config
For a more accurate comparison, view here.
It keeps the inner margin intact, to indicate which page is the inner book binding. Here's an inner book binding example I tweeted about.
Split archive with downloaded covers:
Archives read as series in Komga now with metadata from AniList:
Archives read as series in Tachiyomi now with Komga:
If you're into this sort of thing, you might be interested in my podcast or the games I stream:
You can get support here: Discord
If you find my tools useful please consider supporting via Patreon.
The app's current state requires both Windows and WSL.
The app currently only supports being run from the source code, though I'm open to pull requests to dockerize it or remove the Windows dependency. All dependencies are WSL-specific, but all paths are input as Windows paths for convenience.
The base functionality requires the following to be installed:
sudo apt-get install p7zip imagemagick unrar
Also make sure the version of Node.js specified in the .nvmrc (found in the project root) is installed; currently this is 16.6.2. I recommend using nvm.
Install yarn:
npm install --global yarn
In the project root folder execute:
yarn
Puppeteer is also an internal requirement for downloading cover images, so your system may require additional dependencies. On Ubuntu 20.04 I had to install these:
sudo apt install -y libx11-xcb1 libxcomposite1 libxcursor1 libxdamage1 libxi-dev libxtst-dev libnss3 libcups2 libxss1 libxrandr2 libasound2 libatk1.0-0 libatk-bridge2.0-0 libpangocairo-1.0-0 libgtk-3-0 libgbm1
You need ImageMagick's convert --version | grep -i heic to show heic in the delegates list. This was a pain to set up, but if you follow these instructions on a modern version of Ubuntu you might get it working too: https://stackoverflow.com/a/66116056
In addition, if you want WebP support, you might have to run
sudo apt install build-essential pkg-config webp libwebp-dev libwebp7
before the commands listed in that Stack Overflow post. convert --version has to list your image formats, including webp, if you want to be able to use WebP files in your comics.
If you're building ImageMagick from source, you also have to install these other libraries to get JPG working 🫠 https://gist.github.com/nickferrando/fb0a44d707c8c3efd92dedd0f79d2911
I highly recommend reading through the Config section, then downloading my config and adjusting it as needed. This is the only part you should have to use your brain on ;).
You can now run any of the commands below from WSL!
~~I recommend running the commands from bash, not zsh, as zsh can crash WSL when run for long periods with a lot of text output.~~ It is 2023 and I no longer believe this to be the case.
- Make sure you have a backup of any archives you put in your queueFolders. I've run this with ~~hundreds~~ thousands of archives now, so it does work well, but there could be bugs. I make no guarantees it will work well for you.
You can use the -v flag with any command to change the log level. More v's is more verbose. I recommend running all commands with -vv to see info logging, so you can see how many succeeded and at what steps.
Errors are always logged.
- -v: Shows WARN level
- -vv: Shows INFO level
- -vvv: Shows DEBUG level
Converts archives from the seriesFolders' queueFolders to CBZs, then converts them to series and updates their metadata.
--configFile
--maintainCollection
--offline
Don't download metadata or use downloaded metadata for file renaming
Download Cover options are also valid.
Enhancement options are also valid.
yarn main --configFile "<configFile>" --maintainCollection
yarn main -vv --configFile "W:\Collection\ComicEater.test.yml" --maintainCollection
- ☑ Get all archives at queueFolders and move them to the maintenanceFolders
- ☑ Convert them to CBZ (See the Convert to CBZ Flow)
- ☑ Use folderPatterns to gather metadata from the folder about the files
- ☑ Use the filePatterns to gather data about the files
- ☑ Search remote sources for any additional metadata
- ☑ Download Covers
- ☑ Get all archives from the path (if maintainCollection, this will be your queueFolders)
- ☑ Get all image folders in path
- ☑ Test that the archives are valid archives with 7z t
- ☑ Get volumeRange from filePatterns to infer if multiple volumes are present
- ☑ Extract archive in current directory
- ☑ Recursively check for nested archives, and apply each of the following steps to each archive
- ☑ Remove archive distributor bloat per user config (links to tracker etc.)
- ☑ Validate that there are images present in extracted archives
- ☑ Validate that images are valid using ImageMagick by doing a transform to a 5x5 image - currently this requires writing them to a /tmp/ directory that is automatically cleaned up after the test is run
- ☑ If multiple volumes are present, check whether the number of image-containing subfolders matches the volume count, and if it does, consider each subfolder a separate volume
- ☑ If --trimWhiteSpace is present, run trim through ImageMagick
- If --upscale is present, run the content through waifu2x
- ☑ If --splitPages is present, cut each page into two
- ☑ Repack images
- ☑ If nested archives exist, flatten all nested archives in place of the original
- ☑ If there were no errors, remove the extracted working directory
This is useful when your archives have already been validated but you want to manually change a series title (maybe it downloaded the wrong one from AniList). It moves CBZs to Series folders and updates their metadata based on local file and folder patterns. Your archives must already be valid CBZs.
--configFile
--convertToSeries
--offline
Download Cover options are also valid.
yarn main -vv --configFile 'W:\Collection\ComicEater.yml' --convertToSeries
- ☑ Get all archives at the queueFolders path and move them to the maintenanceFolders
- ☑ Infer each seriesRoot-level archive's series from its file name if there is no existing metadata
- ☑ Get metadata from remote sources
- ☑ Name the series according to the available metadata
- ☑ Put archives in their seriesRoot series folder according to the config
- ☑ Rename the archive according to the metadata and configuration rules
- ☑ Download images for each volume and place in the series folder
This makes no changes to archives. This is useful when you want to see what ComicEater would rename your archive to. Currently, it won't be able to predict how nested archives or volumes would be extracted.
--configFile
--suggestNaming
--offline
Download Cover options are also valid.
yarn main -vv --configFile 'W:\Collection\ComicEater.yml' --suggestNaming
Downloads covers for each volume and places them in the series folder.
--downloadCover
Expects a path. If none is given, then it will use the series path of each individual series in the job.
--coverQuery "site:bookmeter.com 血界戦線 -Back"
Sometimes it may download the wrong series image even with the validation. For instance, the sequel to the manga 血界戦線 is 血界戦線 Back 2 Back. 血界戦線 is still in the name and considered valid. If you want to ignore the sequel, you can manually run --coverQuery "site:bookmeter.com 血界戦線 -Back". Google will then exclude the sequel's results containing Back.
--noCoverValidate
Sometimes the validation will fail if a manga is named something like BEASTARS, but Google only found results containing ビースターズ. If you know the query will work, then you can use --noCoverValidate to force the first image found in Google's results to be downloaded.
yarn main -vv --configFile 'W:\Collection\ComicEater.yml' --getCovers --downloadCover "W:\Collection\シリーズ"
- ☑ Get all archives at the queueFolders path
- ☑ Get metadata from online sources and local sources
- ☑ Query using coverQuery. This defaults to <volumeNumber> <seriesName> <authors> site:bookmeter.com (you can see this result for yourself on Google Images)
- ☑ If --noCoverValidate is not present, validate that the cover's title on Google Images contains the correct volume number and series name
- ☑ Download the cover to the --downloadCover path with the same name as the volume
--upscale
Runs waifu2x on all images in the archive and repacks them with their upscaled versions. Currently supports -n 2 -s 2, a setting of denoise level 2 and 2x scale factor. See here for more details.
Currently, upscaling relies on having waifu2x-ncnn-vulkan.exe
on your path. You can get the most recent release from here.
I recommend first verifying that a command to waifu2x works, something like this:
waifu2x-ncnn-vulkan.exe -i "W:\\Collection\\SomeFolderWithImages" -o "W:\\Collection\\SomeOutputFolder\\" -n 2 -s 2
NOTE: This program will only run as fast as your hardware allows. It's best if you can confirm it's using your GPU.
If you can get this command working with waifu2x-ncnn-vulkan.exe
on your path, the WSL app can call out to it.
--trimWhiteSpace
Trims white space using GraphicsMagick's trim option. It uses a fuzz factor of 10
so that border colors that are roughly the same color can be properly trimmed. See here for more details.
--splitPages
Cuts pages in half. If the Trim White Space option is included, it will wait until after the trim is done. Currently assumes right-to-left reading order.
Metadata is no longer persisted to the archives. Instead, use something more flexible like SND's KOMF.
Inside the app there are 3 ways of thinking about metadata.
- metadata about the archive itself (History)
- metadata about the content (Series, Volume, etc.)
- metadata about the pipeline's progress (Context: internal runtime info of the pipeline's "saga" work)
Every time you run a command you give the app a .yml config file. I personally use one config for things I run on a nightly automated task (like converting weekly subscription magazines), and a second config file for manual runs.
There's a lot here, so the easiest way to understand it is to read this, then spend less than 10 minutes trying to understand my real config here. If you still have difficulty, you can ask for help on Discord.
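To orient yourself, the folder-related keys referenced throughout this README fit together roughly as sketched below. The key names come from this document, but the exact nesting and the example Windows paths are my assumptions, so treat my real config as the authoritative layout.

seriesFolders:
  - seriesRoot: 'W:\Collection\シリーズ'        # assumption: where finished series folders live
    queueFolders:
      - 'W:\Collection\Queue'                   # assumption: archives dropped here get processed
maintenanceFolder: 'W:\Collection\Maintenance'  # assumption: failed conversions are moved here for manual fixes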
The pattern matching uses the double curly brace syntax {{metaDataVariableName}} to indicate where a piece of metadata is.
The pattern matching also uses glob-like syntax to allow for subfolder matching (I never use more than one folder level deep personally). So something like {{seriesName}}/**/* matches the top-level folder name as the seriesName, and no sub-folders would be used in the metadata.
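As a minimal sketch, a folderPatterns entry using that glob could look like this (the key name comes from this README; placing it at the top level of the config is my assumption):

folderPatterns:
  - '{{seriesName}}/**/*'   # top-level folder name becomes seriesName; deeper folders are ignored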
The folderPatterns and filePatterns use custom pattern matching to know how to infer metadata from your file names and folders. They are ordered lists, and the topmost pattern whose variables can all be matched wins.
So if you have a file named [Bob] MyManga v01, and file patterns of
"VerySpecificPattern[{{authors}}] {{seriesName}} {{volumeNumber}}"
"[{{authors}}] {{seriesName}} v{{volumeNumber}}"
"{{seriesName}}"
it will automatically infer that the author is Bob, the series name is MyManga, and that it contains the first volume. Since the top pattern would not match (VerySpecificPattern wasn't found in the file name [Bob] MyManga v01), it would be ignored. Since the [] of the authors pattern, the space before the seriesName, and the v of the volumeNumber were all present, it matched the second pattern.
If instead the file had been named Bob's standalone Manga, it would match the third pattern, giving it a series name of Bob's standalone Manga. The author would not be inferred, and the volume number would also be unknown.
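In config form, that ordered list would look something like the sketch below (assuming filePatterns is a plain YAML list of pattern strings, most specific first):

filePatterns:
  - 'VerySpecificPattern[{{authors}}] {{seriesName}} {{volumeNumber}}'  # most specific, checked first
  - '[{{authors}}] {{seriesName}} v{{volumeNumber}}'                    # matches "[Bob] MyManga v01"
  - '{{seriesName}}'                                                    # fallback: the whole name is the series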
Based on the metadata picked up from the file & folder patterns, as well as the metadata gained from external sources like AniList, it will use the outputNamingConventions as a prioritized list of ways to name your files. It will not use a pattern unless ALL metadata variables were matched (besides fileName, which can be used as a default).
"{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeRange}}巻"
"{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeNumber}}{{volumeVariant}}巻"
"{{seriesRoot}}{{fileName}}/{{fileName}}"
The first output pattern would match if [Bob] MyManga v1-4 wasn't able to be automatically split, and therefore had a volumeRange.
The second pattern would be matched if a volume had a single letter after it, as is common in manga distribution: [Bob] MyManga v1e, and would result in: /YourSeriesRoot/MyManga/[Bob] MyManga - 第1e巻.
The last case would be a fallback to keeping whatever the file's original name was, in case the other metadata variables couldn't be found.
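For reference, those conventions would sit in the config as a list like the sketch below (the key name comes from this README; the top-level placement is my assumption):

outputNamingConventions:
  - '{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeRange}}巻'                    # unsplit volume ranges
  - '{{seriesRoot}}{{seriesName}}/[{{authors}}] {{seriesName}} - 第{{volumeNumber}}{{volumeVariant}}巻'  # single volumes, with optional variant letter
  - '{{seriesRoot}}{{fileName}}/{{fileName}}'                                                            # fallback to the original file name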
- {{authors}}: The default author will be assumed to be the writer in Komga metadata. Authors will be split by splitAuthorsBy, so Bob・KanjiEater could be split into two separate authors: Bob & KanjiEater.
- {{volumeNumber}}: Runs through various validation checks to assure it's actually a number, and also extracts a volumeVariant, which is at most one letter attached to the volume number. It can also recognize volume ranges, e.g. c2-5. Chapters and volumes are used without distinction currently.
- {{publishYear}}: Any characters
- {{publishMonth}}: Any characters
- {{publishDate}}: Any characters
- {{seriesRoot}}: The folder from the seriesRoot
- {{fileName}}: The original file name
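The splitAuthorsBy setting mentioned above could be configured like this (the key name comes from this README, but whether it takes a single delimiter or a list of delimiters is my assumption):

splitAuthorsBy: '・'   # assumption: a single delimiter string; splits Bob・KanjiEater into Bob and KanjiEater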
- filesToDeleteWithExtensions will remove any files in the archive that have a matching file extension (common use case: someAwfulSite.url)
- junkToFilter will remove these patterns from your file names and folder names
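A rough sketch of both keys (the values are illustrative, and whether extensions are listed with or without the leading dot is my assumption):

filesToDeleteWithExtensions:
  - 'url'           # drops files like someAwfulSite.url from the archive
junkToFilter:
  - '(Digital)'     # hypothetical junk pattern stripped from file and folder names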
You can set default metadata according to the accepted ComicInfo.xml fields for Komga in the defaults:
defaults:
  contentMetaData:
    languageISO: ja
    manga: YesAndRightToLeft
Setting your language to ja will assume you want Japanese text in file names, instead of an English translation.
maintenanceFolder
This is used when something goes wrong. All failed files are moved here.
- File names w/ spaces breaks spawn
- Saga orchestration
- Save detailed file history
- Nested Archives
- Nested Rar test failing
- Configuration from file
- String paths https://stackoverflow.com/questions/29182244/convert-a-string-to-a-template-string
- Better content cleanup
- Suggest naming
- Number padding
- Ignore case of junk to filter
- Prefer series name from series if native
- Deeply nested folders with globbing
- Write ComicInfo.xml
- Remove junk from Image folders, names, content
- Clean CBZs
- Handle Volume ranges
- If TotalVolumes matches folder count, extract to individual
- Nested image folders that are multivolume
- fix halfwidth fullwidth chars for file folder pattern
- Get metadata from archive contents
- Convert Image folders to CBZ
- Fix deletion after extracting folders - doesn't delete the clean dir
- Vendor Series metadata
- Automate maintenance
- Unified Series calls data vendors once per series
- Cleanup regression
- invalid images still being zipped
- Stopped on moving series
- Stat error not killing it on 7z -t
- Start importing clean series
- null in summary/description
- Offline options
- volume range with archives takes second as a batch, but then deletes the first, and leaves the rest as dupes
- Archive with multiple folder volumes failed: Brave Story, not cleaned up but made. Ran on individual volumes, and each was a separate series
- didn't clean up soil 9 & 10
- Shuto Heru v13-14
- keep the range if it wasn't split
- Handle hakuneko folders
- Add magazines
- Download book covers
- Trim white space
- Split Double Images
- Waifu2x
- ~~ to komga to update after modifying content~~
- Move folders to prep before doing anything
- config to rename automatically
- Send API request to Komga
- convert to typescript
- Set cover to second (n) page based on komga tag
- blur nsfw tag
- Add tags: use komf
- Maintain metadata outside of the archive
- dockerify
- Get names from google organic search
- Undo naming / folder move
- Master Config Test. > x results
- Manual Series metadata
- Scraper Series metadata
- Get a new cover image based on existing dimension / reverse image lookup
- Detect missing volumes/issues
- Interactive naming
- Webp
- Avif deconversion
- Record File hash drift events