Releases: Phhofm/models
4xBHI_realplksr
4x BHI RealPLKSR Models
After making the BHI dataset, I trained multiple realplksr dysample models on it to try things out, and I am releasing them here so others can try them as well.
I recommend using the dat2 models over these, since they give better results quality-wise.
Download links:
4xBHI_realplksr_dysample_multi
4xBHI_realplksr_dysample_multiblur
4xBHI_realplksr_dysample_otf_nn
4xBHI_realplksr_dysample_otf
4xBHI_realplksr_dysample_real
Quick Summary:
multi - No degradation handling
multiblur - Same as multi but with slightly sharper output, since blur was used during training. Also no degradation handling: even though I added 50% compression to the training set (25% jpg, 25% webp), this model does not seem able to handle compression.
otf_nn - Trained with the real-esrgan otf pipeline, handles compression, very little noise handling.
otf - Trained with the real-esrgan otf pipeline, handles compression and noise, denoises rather aggressively.
real - Trained with my real degraded LR set, handles webp and jpg compression, realistic noise, some lens blur etc.
Visual Examples:
4xBHI_realplksr_dysample_multi
Scale: 4x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling of good quality, uncompressed images
Pretrained Model: 4xNomos2_realplksr_dysample
Training iterations: 285000
Description: 4x realplksr dysample model to upscale good quality, uncompressed images. Does not handle any degradations, since the LR set was scaled only using down_up, linear, cubic_mitchell, lanczos, gauss and box. Trained on my BHI dataset.
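For reference, a minimal sketch of what such a multiscale LR generation could look like (this is not the actual tooling used; PIL only offers box, bilinear, bicubic and lanczos filters, so cubic_mitchell and gauss are left out, and the folder names are placeholders):

```python
# Sketch only: randomly pick a downscaling filter per image to build a 4x LR set.
import random
from pathlib import Path
from PIL import Image

FILTERS = [Image.Resampling.BOX, Image.Resampling.BILINEAR,
           Image.Resampling.BICUBIC, Image.Resampling.LANCZOS]

def make_lr(hr_dir: str, lr_dir: str, scale: int = 4) -> None:
    Path(lr_dir).mkdir(parents=True, exist_ok=True)
    for hr_path in Path(hr_dir).glob("*.png"):
        img = Image.open(hr_path).convert("RGB")
        w, h = img.size
        f = random.choice(FILTERS)
        # Rough stand-in for "down_up": scale down and back up before the
        # final reduction to vary the effective kernel.
        if random.random() < 0.25:
            img = img.resize((w // 2, h // 2), f).resize((w, h), f)
        img.resize((w // scale, h // scale), f).save(Path(lr_dir) / hr_path.name)

make_lr("hr_tiles", "lr_tiles")
```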
4xBHI_realplksr_dysample_multiblur
Scale: 4x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, can handle some blur, jpeg and webp compression
Pretrained Model: 4xNomos2_realplksr_dysample
Training iterations: 354000
Description: 4x realplksr dysample model to upscale images, trained on the BHI dataset. Similar to the 4xBHI_realplksr_dysample_multi model, the LR set uses the same multiscaling, but additionally compression handling was added for jpeg and webp, plus I made use of average, gaussian and anisotropic blur for sharper results.
4xBHI_realplksr_dysample_otf
Scale: 4x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, can handle some blur, jpeg compression, and noise, trained with the real-esrgan otf pipeline.
Pretrained Model: 4xNomos2_realplksr_dysample
Training iterations: 413000
Description: 4x realplksr dysample model to upscale images, trained on the BHI dataset. This time, instead of manual degradation, I made use of the real-esrgan otf pipeline with adjusted values to handle some blur, noise and jpeg compression.
4xBHI_realplksr_dysample_otf_nn
Scale: 4x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, similar to the 4xBHI_realplksr_dysample_otf model but without noise handling
Pretrained Model: 4xNomos2_realplksr_dysample
Training iterations: 251000
Description: 4x realplksr dysample model to upscale images, trained on the BHI dataset. This is similar to the 4xBHI_realplksr_dysample_otf model but trained without noise handling capability (_nn -> 'no noise'), since I have trained some models like this in the past to retain more detail for cases where noise removal simply is not needed.
4xBHI_realplksr_dysample_real
Scale: 4x
Network type: RealPLKSR
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, handling some blur, some noise, and compression
Pretrained Model: 4xNomos2_realplksr_dysample
Training iterations: 380000
Description: 4x realplksr dysample model to upscale images, trained on the BHI dataset. This one was trained with a manually created LR set, where I used wtp_destroyer to add blur (50% chance), my ludvae200 model to add noise (50% chance), and then wtp for compression, scaling, recompression (50% chance), and scaling.
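To make the order and probabilities above concrete, here is a simplified stand-in for that pipeline. The real blur and noise came from wtp_destroyer and the ludvae200 model; the gaussian blur, gaussian noise and PIL recompression below only mirror the sequence and the 50% chances, not the actual degradations:

```python
# Sketch of the degradation order: blur -> noise -> compression -> scaling ->
# recompression -> scaling. All parameters are illustrative.
import random
from io import BytesIO
import numpy as np
from PIL import Image, ImageFilter

def compress(img: Image.Image) -> Image.Image:
    buf = BytesIO()
    img.save(buf, format=random.choice(["JPEG", "WEBP"]), quality=random.randint(40, 90))
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def build_lr(hr: Image.Image, scale: int = 4) -> Image.Image:
    w, h = hr.size
    x = hr.convert("RGB")
    if random.random() < 0.5:                                    # blur, 50% chance
        x = x.filter(ImageFilter.GaussianBlur(0.8))
    if random.random() < 0.5:                                    # noise, 50% chance
        arr = np.asarray(x, dtype=np.float32) + np.random.normal(0, 4, (h, w, 3))
        x = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    x = compress(x)                                              # compression
    x = x.resize((w // 2, h // 2), Image.Resampling.BICUBIC)     # scaling
    if random.random() < 0.5:                                    # recompression, 50% chance
        x = compress(x)
    return x.resize((w // scale, h // scale), Image.Resampling.BICUBIC)  # final scaling
```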
4xBHI_dat2_real
Scale: 4x
Network type: DAT2
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images. Handles realistic noise, some realistic blur, and webp and jpg (re)compression.
Pretrained Model: 4xNomos2_hq_dat2
Training iterations: 240000
Description: 4x dat2 upscaling model for web and realistic images. It handles realistic noise, some realistic blur, and webp and jpg (re)compression. Trained on my BHI dataset (390'035 training tiles) with degraded LR subset.
Have a look at the applied_degradations files in the attachment to see which degradations were applied, and in what order, to the BHI dataset to create the corresponding LR set for training.
4xBHI_dat2_otf
Scale: 4x
Network type: DAT2
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, handles noise and jpg compression
Pretrained Model: 4xNomos2_hq_dat2
Training iterations: 270000
Description: 4x dat2 upscaling model, trained with the real-esrgan otf pipeline on my bhi dataset. Handles noise and compression.
4xBHI_dat2_multiblurjpg
Scale: 4x
Network type: DAT2
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x upscaling images, handles jpg compression
Pretrained Model: 4xNomos2_hq_dat2
Training iterations: 320000
Description: 4x dat2 upscaling model, trained with down_up, linear, cubic_mitchell, lanczos, gauss and box scaling algorithms, some average, gaussian and anisotropic blurs, and jpg compression. Trained on my BHI sisr dataset.
You can also try out the 4xBHI_dat2_multiblur checkpoint below (trained to 250000 iters), which cannot handle compression but might give just slightly better output on non-degraded input.
4x Sebica pretrains
Just a small release: these are two simple Sebica pretrains, each trained to 100k iterations, meant to stabilize early training for both the sebica and sebica_mini options.
Scale: 4
Architecture: Sebica
Architecture Options: sebica and sebica_mini
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: 4x Sebica options pretrains
Subject: Realistic
Input Type: Images
Release Date: 11.11.2024 (dd.mm.yyyy)
Dataset: DF2K_BHI
Dataset Size: 12'639
OTF (on the fly augmentations): No
Pretrained Model: None
Iterations: 100'000
Batch Size: 8
Patch Size: 48
Description:
Two simple sebica pretrains of 100k iters each, to stabilize early training of sebica models, for both the sebica and sebica_mini options.
For training I used my BHI-filtered version of DF2K, which I released together with my BHI filtering post on Hugging Face.
2xAoMR_mosr
Scale: 2
Architecture: MoSR
Architecture Option: mosr
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: A 2x mosr upscaling model for game textures
Subject: Game Textures
Input Type: Images
Release Date: 21.09.2024 (dd.mm.yyyy)
Dataset: Game Textures from Age of Mythology: Retold
Dataset Size: 13'847
OTF (on the fly augmentations): No
Pretrained Model: 4xNomos2_hq_mosr
Iterations: 510'000
Batch Size: 4
Patch Size: 64
Description:
In short: A 2x game texture mosr upscaling model, trained on and for (but not limited to) Age of Mythology: Retold textures.
Since I have been playing Age of Mythology: Retold (as a casual player), I thought it would be interesting to train a single image super resolution model on (and for) game textures of AoMR, but this model should be usable for other game textures as well.
This is a 2x model: since the biggest texture images are already 4096x4096, I thought going 4x on those would be overkill (there are also already 4x game texture upscaling models, so this model can be used for similar cases where 4x is not needed).
The pth file, a static onnx conversion (needed since dysample is used), and the training config files are provided in the Assets.
Model Showcase:
(Click on image for better view)
Process:
After extracting all the game's .bar files:
I ended up with 9'067 texture files, which I tiled into 13'847 512x512px tiles, so the HR set roughly looks like this (containing basecolor, normals, masks, etc.):
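Tiling the extracted textures into fixed 512x512 crops can be done with a few lines of PIL; the exact tool used for the release may differ, and the folder names here are placeholders:

```python
# Cut every texture into non-overlapping 512x512 tiles (edges that don't fill
# a full tile are simply skipped here).
from pathlib import Path
from PIL import Image

def tile_image(path: Path, out_dir: Path, size: int = 512) -> int:
    img = Image.open(path).convert("RGB")
    w, h = img.size
    count = 0
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            img.crop((left, top, left + size, top + size)).save(
                out_dir / f"{path.stem}_{top}_{left}.png")
            count += 1
    return count

out = Path("hr_tiles")
out.mkdir(exist_ok=True)
total = sum(tile_image(p, out) for p in Path("textures").glob("*.png"))
print(total, "tiles written")
```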
I then created a corresponding LR folder by using gaussian blur,
quantization (floyd_steinberg, jarvis_judice_ninke, stucki, atkinson, burkes, sierra, two_row_sierra, sierra_lite),
compression (jpg),
downscaling (down_up, linear, cubic_mitchell, lanczos, gauss, box),
and later on added bc1 compression using nvcompress (based on Kim's suggestion, which improved the quality of this model).
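The bc1 step is essentially a compress-then-decode round-trip with NVIDIA Texture Tools. A hedged sketch, assuming nvcompress and nvdecompress are on PATH (exact flags and output naming depend on the build you have installed):

```python
# Encode each LR tile to a BC1 .dds and decode it back, so the block
# compression artifacts end up in the LR training data.
import subprocess
from pathlib import Path

def bc1_roundtrip(png: Path) -> None:
    dds = png.with_suffix(".dds")
    subprocess.run(["nvcompress", "-bc1", "-nomips", str(png), str(dds)], check=True)
    # nvdecompress writes the decoded image next to the .dds; how you convert
    # that back to .png depends on your tool version.
    subprocess.run(["nvdecompress", str(dds)], check=True)

for p in Path("lr_tiles").glob("*.png"):
    bc1_roundtrip(p)
```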
I trained a mosr model for 550k iterations, and based on iqa scoring, went with the 510k checkpoint as release candidate:
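Checkpoint selection by iqa scoring can be approximated with pyiqa; the hyperiqa metric and the per-checkpoint output folders below are assumptions for illustration, not the exact setup used for this release:

```python
# Score each checkpoint's upscaled validation outputs with a no-reference IQA
# metric and compare the averages.
from pathlib import Path
import pyiqa
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
metric = pyiqa.create_metric("hyperiqa", device=device)

def mean_score(folder: str) -> float:
    scores = [metric(str(p)).item() for p in Path(folder).glob("*.png")]
    return sum(scores) / len(scores)

for ckpt_dir in ["val_out_470k", "val_out_510k", "val_out_550k"]:  # placeholder folders
    print(ckpt_dir, round(mean_score(ckpt_dir), 4))
```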
Test
PS just as a test, I tried this model out on the game itself.
I extracted all the Greek texture files as tga files and upscaled them with chaiNNer using the onnx conversion, which took 7 hours and 17 minutes to complete:
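The upscaling itself was done in chaiNNer; purely for reference, an equivalent loop over the tga files with onnxruntime might look like this (assuming the usual NCHW float32 input in [0, 1]; filenames are placeholders):

```python
# Upscale every .tga with the static ONNX model and save the results as .png.
from pathlib import Path
import numpy as np
import onnxruntime as ort
from PIL import Image

sess = ort.InferenceSession("2xAoMR_mosr.onnx", providers=["CPUExecutionProvider"])
inp_name = sess.get_inputs()[0].name
out_dir = Path("upscaled")
out_dir.mkdir(exist_ok=True)

for path in Path("textures_tga").glob("*.tga"):
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    x = img.transpose(2, 0, 1)[None]                     # HWC -> NCHW
    y = sess.run(None, {inp_name: x})[0][0]              # back to CHW
    out = (y.transpose(1, 2, 0).clip(0, 1) * 255).astype(np.uint8)
    Image.fromarray(out).save(out_dir / (path.stem + ".png"))
```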
I then converted them back to .ddt files and replaced the in-game texture files using a mod folder (so basically a mod that replaces the game files):
Then I tested it out in the in-game Editor, placing some buildings and units with a fixed camera:
But as can be seen, there were artifacts when replacing all the texture files, so I tried replacing only the baseColor files (instead of the Normals, Masks, etc.):
Which resolved the artifacts, but in comparison with the default textures, it doesn't look better:
So just to make sure something actually happened, I tested out just inverting the colors on the town center basecolor texture:
Which shows me that replacing the textures worked.
So I assume that in this case it either does not make a visible difference because the textures used in Age of Mythology: Retold are already big enough (like I wrote, some of them are 4096x4096px already) for this zoom level / the amount of detail we see in this in-game screenshot, or I did not do it correctly (maybe values would need to be adjusted in some json file or something like that; I have never really modded a game).
4xNomos2_hq_drct-l
Scale: 4
Architecture: DRCT
Architecture Option: drct_l
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Upscaler
Subject: Photography
Input Type: Images
Release Date: 08.09.2024
Dataset: nomosv2
Dataset Size: 6000
OTF (on the fly augmentations): No
Pretrained Model: DRCT-L_X4
Iterations: 200'000
Batch Size: 2
Patch Size: 64
Description:
A drct-l 4x upscaling model, similar to the 4xNomos2_hq_atd, 4xNomos2_hq_dat2 and 4xNomos2_hq_mosr models, trained on and meant for non-degraded input to give good quality output.
Model Showcase:
Lexica2 Dataset
A dataset to train single image super resolution models specifically on/for SD/AI generated images.
When releasing my older datasets I also looked at the quality and size of the Lexica dataset and thought I would rework it into a version 2. So this should be higher quality with fewer images, now consisting of 6'500 512x512 image tiles - retiled, dejpg'd, deblurred, scored, filtered.
Dataset Name: Lexica2
Num images: 6'500 512x512
Purpose: For training upscalers of SD/AI generated images.
Based on Lexica Aperture V3 Images
Example of HR images from the dataset:
hyperiqa score: 0.7842237427784846
The tar files include the hr folder with 6'500 images.
4xNomos2_hq_atd
Scale: 4
Architecture: ATD
Architecture Option: atd
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Upscaler
Subject: Photography
Input Type: Images
Release Date: 05.09.2024
Dataset: nomosv2
Dataset Size: 6000
OTF (on the fly augmentations): No
Pretrained Model: 003_ATD_SRx4_finetune
Iterations: 180'000
Batch Size: 2
Patch Size: 48
Norm: true
Description:
An atd 4x upscaling model, similar to the 4xNomos2_hq_dat2 or 4xNomos2_hq_mosr models, trained on and meant for non-degraded input to give good quality output.
Training checkpoint metric scoring on val images
Showcase of the top 3 checkpoints from this model training, where 180k has been selected as the main release model: https://slow.pics/c/ZEnoG0Ou
I added the other checkpoints (135k and 205k) as additional model files in the assets of this release.
Model Showcase:
4xNomos2_hq_mosr
Scale: 4
Architecture: MoSR
Architecture Option: mosr
Author: Philip Hofmann
License: CC-BY-4.0
Purpose: Upscaler
Subject: Photography
Input Type: Images
Release Date: 25.08.2024
Dataset: nomosv2
Dataset Size: 6000
OTF (on the fly augmentations): No
Pretrained Model: 4xmssim_mosr_pretrain
Iterations: 190'000
Batch Size: 6
Patch Size: 64
Description:
A 4x MoSR upscaling model, meant for non-degraded input, since this model was trained on non-degraded input to give good quality output.
If your input is degraded, use a 1x restoration model first. So for example, if your input is a .jpg file, you could use a 1x dejpg model first.
PS: I also provide an onnx conversion in the Attachments; I verified correct output with chaiNNer:
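If you want to double-check the onnx conversion against the pth outside of chaiNNer, comparing both outputs on the same random input is enough. Loading the pth via spandrel is just one option here, and the filenames are placeholders:

```python
# Compare PyTorch (.pth) and ONNX outputs on a random tensor.
import numpy as np
import onnxruntime as ort
import torch
from spandrel import ModelLoader

model = ModelLoader().load_from_file("4xNomos2_hq_mosr.pth").model.eval()
sess = ort.InferenceSession("4xNomos2_hq_mosr.onnx", providers=["CPUExecutionProvider"])

x = torch.rand(1, 3, 64, 64)
with torch.no_grad():
    ref = model(x).numpy()
out = sess.run(None, {sess.get_inputs()[0].name: x.numpy()})[0]
print("max abs diff:", np.abs(ref - out).max())
```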