Stage 1: Frame Extraction and Similar Image Removal
Extract thousands of frames from each 24-minute episode and remove similar images.
- If you start from this stage, set `--src_dir` to the folder containing either the videos (for the `screenshots` pipeline) or the raw images (for the `booru` pipeline).
- Output folder: `/path/to/dataset_dir/intermediate/{image_type}/raw`
- After this stage, you can go over the images to select those you want to keep.
Requirements: `ffmpeg` with the `mpdecimate` filter; CUDA support is optional but highly recommended.
This part is specific to the `screenshots` pipeline. For the time being, I rely on calling the `ffmpeg` command directly, like this:
file_pattern="${dst_ep_dir}/${prefix}EP$((ep_init+i))_%d.png"
ffmpeg -hwaccel cuda -i "$filename" -filter:v \
"mpdecimate=hi=64*200:lo=64*50:frac=0.33,setpts=N/FRAME_RATE/TB" \
-qscale:v 1 -qmin 1 -c:a copy "$file_pattern"
The `mpdecimate` filter removes consecutive frames that are too similar. Enabling it makes a big difference in both processing speed and the number of extracted frames (roughly 10 times fewer images with the filter). An even more aggressive approach is to keep only the key frames.
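To make the two extraction modes concrete, here is a minimal sketch that only builds the `ffmpeg` argument list and does not run it. The function name `build_ffmpeg_command` and its parameters are hypothetical. The `mpdecimate` branch mirrors the command shown earlier. The key-frame branch is an assumption: it uses ffmpeg's `-skip_frame nokey` decoder option, which is one common way to keep only key frames and not necessarily what the pipeline does internally.

```python
def build_ffmpeg_command(video, file_pattern, extract_key=False, use_cuda=True):
    """Build the ffmpeg argument list for frame extraction (nothing is executed)."""
    cmd = ["ffmpeg"]
    if use_cuda:
        # Optional hardware-accelerated decoding, as in the command above.
        cmd += ["-hwaccel", "cuda"]
    if extract_key:
        # Assumption: ask the decoder to skip everything except key frames.
        # This is an input option, so it must come before "-i".
        cmd += ["-skip_frame", "nokey"]
        video_filter = "setpts=N/FRAME_RATE/TB"
    else:
        # Drop consecutive near-identical frames, then renumber timestamps.
        video_filter = (
            "mpdecimate=hi=64*200:lo=64*50:frac=0.33,setpts=N/FRAME_RATE/TB"
        )
    cmd += [
        "-i", video,
        "-filter:v", video_filter,
        "-qscale:v", "1", "-qmin", "1",
        "-c:a", "copy",
        file_pattern,
    ]
    return cmd
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the extraction. Using a list rather than a shell string avoids quoting problems with file names that contain spaces.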
- `extract_key`: Extract key frames instead of using the `mpdecimate` filter.
  Example usage: `--extract_keys`
- `image_prefix`: This allows you to give a prefix to the extracted images. If not provided, the prefix for each video is inferred from its file name.
  Example usage: `--image_prefix haifuri`
- `ep_init`: The episode number to start from. Since the processing order is obtained by sorting the file names, the episode number given here could differ from the actual one. If not provided, the episode number of each video is inferred from its file name.
  Example usage: `--ep_init 4`
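The interaction between `ep_init` and file-name sorting can be sketched as follows. This is an illustration under assumptions, not the pipeline's code: the helper `plan_output_patterns` is hypothetical, and the output pattern simply follows the `${prefix}EP$((ep_init+i))_%d.png` form from the shell snippet above.

```python
from pathlib import PurePosixPath


def plan_output_patterns(video_names, dst_dir, prefix, ep_init=1):
    """Assign episode numbers by sorted file name and build output patterns."""
    patterns = {}
    # The i-th video in sorted order is treated as episode ep_init + i.
    for i, name in enumerate(sorted(video_names)):
        episode = ep_init + i
        pattern = PurePosixPath(dst_dir) / f"{prefix}EP{episode}_%d.png"
        patterns[name] = str(pattern)
    return patterns


# Hypothetical file names and prefix, for illustration only.
plan = plan_output_patterns(["b.mkv", "a.mkv"], "out", "haifuri_", ep_init=4)
```

Because the numbering depends only on sort order, a missing episode or an unusual naming scheme shifts every later number. Plain string sorting also places `ep10` before `ep2`, so zero-padded file names are safer.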
This step uses `mobilenetv3_large` to detect and remove similar images, which reduces the dataset size by a factor of 2 to 10 depending on the anime.
Since the extracted images can take up a lot of space, I decided to combine the two steps and perform removal for each episode independently, right after its frames are extracted. A final pass over all the extracted images is also performed at the end to remove repeated images, notably in the OP and ED.
- `no_remove_similar`: Pass this argument to skip this step.
  Example usage: `--no_remove_similar`
- `similar_thresh`: The threshold above which two images are judged too similar. Default is 0.95.
  Example usage: `--similar_thresh 0.9`
- `detect_duplicate_model`: You can use any other model from `timm` for duplicate detection.
  Example usage: `--detect_duplicate_model mobilenetv2_100`
- `detect_duplicate_batch_size`: Batch size for computing the image features that are used for duplicate detection.
  Example usage: `--detect_duplicate_batch_size 32`
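The thresholding idea behind `similar_thresh`, together with the per-episode pass followed by a final global pass, can be sketched as below. This is a simplified illustration and not the pipeline's actual algorithm. It assumes cosine similarity between feature vectors and a greedy keep-first strategy, and it uses toy 2-D vectors in place of the embeddings a `timm` model would produce. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def remove_similar(features, thresh=0.95):
    """Return indices to keep: an item is dropped when its similarity
    to any already kept item exceeds thresh."""
    kept = []
    for i, feat in enumerate(features):
        if all(cosine_similarity(feat, features[j]) <= thresh for j in kept):
            kept.append(i)
    return kept


# Toy features; real ones would be model embeddings of the frames.
episode_1 = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
episode_2 = [[0.0, 1.0], [1.0, 1.0]]

# Per-episode pass first, then a final pass over all survivors,
# which catches repeats shared across episodes (e.g. OP and ED frames).
survivors = []
for episode in (episode_1, episode_2):
    survivors.extend(episode[i] for i in remove_similar(episode))
final = [survivors[i] for i in remove_similar(survivors)]
```

In this toy run the second frame of `episode_1` is dropped in the per-episode pass, and the frame shared by both episodes is dropped only in the final pass. Lowering the threshold, as in `--similar_thresh 0.9`, removes more images.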