SeaSalt Downloader is a modular downloader for any site you can write a module for.
- Supports custom websites
- Supports custom filters
- Supports custom saving methods
- Synchronous and asynchronous downloading
- Scrapers
  - Danbooru (`danbooru`)
  - Safebooru (`safebooru`)
- Filters
  - No filter (`no_filter`)
  - Tag filter (`tag_filter`)
- Preprocessors
  - Crop to aspect ratio (`crop_aspect`)
  - Crop to aspect ratio and resize (`crop_aspect_resize`)
  - Resize image by stretching (`resize`)
- Savers
  - Save to folder (`folder`)
Basic usage:

```
python main.py -u <url> \
    --scraper <scraper name> [scraper args] \
    --filter <filter name> [filter args] \
    --saver <saver name> [saver args]
```
For example, to scrape all images of Shirakami Fubuki from Safebooru, filter out any posts that have the `1girl` tag, and save them to a folder called "Fubuki with friends" while discarding metadata:
```
python main.py -u "https://safebooru.org/index.php?page=post&s=list&tags=shirakami_fubuki+" \
    --scraper safebooru \
    --filter tag_filter 1girl \
    --saver folder "Fubuki with friends"
```
SeaSalt also supports a powerful preprocessing feature. For example, you can crop all of your images to a square with `--preproc crop_aspect 1 1`. The first number is the aspect ratio width and the second is the height, so you could crop to 16:9 with `--preproc crop_aspect 16 9`.

There is also an option to resize images after cropping. The following crops and resizes to a 512x512 square: `--preproc crop_aspect_resize 1 1 512`. Here `512` is the target width; the height is calculated from the width and the aspect ratio.
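As a rough sketch of the arithmetic behind `crop_aspect_resize` (the function name below is illustrative and not taken from the SeaSalt source), the output dimensions follow from the target width and the aspect ratio:

```python
# Illustrative only: how a target width plus an aspect ratio determines the output size.
def output_size(aspect_w: int, aspect_h: int, target_w: int) -> tuple[int, int]:
    """Return (width, height) for the resized image."""
    target_h = round(target_w * aspect_h / aspect_w)
    return target_w, target_h

print(output_size(1, 1, 512))   # (512, 512)
print(output_size(16, 9, 512))  # (512, 288)
```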
SeaSalt supports parallel downloading. To use it, pass the `--parallel` argument. By default, the batch size (the number of pages to download before distributing the tasks among the workers) is 10, and the number of threads is equal to the number of CPU cores. You can change these with the `--batch_size` and `--threads` arguments.
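For example, a parallel run of the Safebooru example above might look like this (assuming `--parallel` is a bare flag as described; the batch size and thread count here are arbitrary illustrations, not recommended values):

```
python main.py -u "https://safebooru.org/index.php?page=post&s=list&tags=shirakami_fubuki+" \
    --scraper safebooru \
    --saver folder "Fubuki with friends" \
    --parallel --batch_size 20 --threads 8
```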
Each scraper must be a Python file that contains a class named `Scraper`. The `Scraper` class must implement the following methods:

```python
class Scraper:
    def get_posts(self, url, parallel, args):
        # url is a page containing posts
        # parallel indicates whether the user is downloading in parallel
        # args are the optional arguments provided by the user
        # Must return a list of URLs to posts on the page
        ...

    def get_post(self, url, parallel, args):
        # url is a URL to a post
        # parallel indicates whether the user is downloading in parallel
        # args are the optional arguments provided by the user
        # Return a tuple in which t[0] is a stream to the image
        # and t[1] is the image metadata
        #
        # The metadata must be a dictionary containing the following items:
        #   'image_name': the file name (no extension)
        #   'ext': the file extension
        #   'tags': the tags for the image
        # Other metadata can be added to the dictionary, but it is not
        # guaranteed to be supported by existing modules
        ...

    def next_page(self, url, parallel, args):
        # url is a URL to the current page
        # parallel indicates whether the user is downloading in parallel
        # args are the optional arguments provided by the user
        # Return a URL to the next page
        ...
```

You are free to add variables, imports, and other methods to your `Scraper` class as well.
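As a rough illustration of that interface, a scraper for a hypothetical booru-style site might look like the sketch below. The CSS selectors and URL layout are invented for the example, and `requests` plus `BeautifulSoup` are assumed to be available; only the `Scraper` contract itself comes from the description above.

```python
import io

import requests
from bs4 import BeautifulSoup

class Scraper:
    def get_posts(self, url, parallel, args):
        # Fetch the listing page and collect links to individual posts
        # ("a.post-link" is a hypothetical selector).
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        return [a["href"] for a in soup.select("a.post-link")]

    def get_post(self, url, parallel, args):
        # Fetch the post page, locate the full-size image (hypothetical
        # selectors), and return (stream, metadata) as required.
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        img_url = soup.select_one("img#image")["src"]
        tags = [t.get_text(strip=True) for t in soup.select("li.tag")]
        name, _, ext = img_url.rsplit("/", 1)[-1].rpartition(".")
        stream = io.BytesIO(requests.get(img_url).content)
        return stream, {"image_name": name, "ext": ext, "tags": tags}

    def next_page(self, url, parallel, args):
        # Follow the "next" link on the current listing page, if any.
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        nxt = soup.select_one("a.next-page")
        return nxt["href"] if nxt else None
```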
Each filter must be a Python file that contains a class named `Filt`. The `Filt` class must implement a method named `filt`:

```python
class Filt:
    def filt(self, image, meta, args):
        # image is an image stream
        # meta is the image metadata (as defined in the scraper section)
        # args are the optional arguments provided by the user
        # Return a tuple containing (image, meta),
        # or return (None, None) to filter out the image
        ...
```

You are free to add variables, imports, and other methods to your `Filt` class as well.
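For instance, here is a sketch of a filter that keeps only images carrying every tag the user passed as optional arguments. It is an illustration of the interface, not the source of the bundled `tag_filter` (which discards posts that have the given tag), and it assumes `meta['tags']` is an iterable of tag strings:

```python
class Filt:
    def filt(self, image, meta, args):
        # Keep the image only if it has every required tag;
        # otherwise return (None, None) so it is dropped.
        required = set(args)
        if required.issubset(meta.get("tags", [])):
            return image, meta
        return None, None
```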
Each saver must be a Python file that contains a class named `Saver`. The `Saver` class must implement a method named `save`:

```python
class Saver:
    def save(self, image, meta, args):
        # image is a stream to the image
        # meta is the image metadata (as defined in the scraper section)
        # args are the optional arguments provided by the user
        ...
```

You are free to add variables, imports, and other methods to your `Saver` class as well.
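Here is a minimal sketch of a saver that writes each image into a directory taken from the first optional argument; it only illustrates the contract and is not the source of the bundled `folder` saver:

```python
import os

class Saver:
    def save(self, image, meta, args):
        # Write the stream to <directory>/<image_name>.<ext>.
        # The first optional argument is assumed to be the target directory.
        directory = args[0] if args else "downloads"
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, f"{meta['image_name']}.{meta['ext']}")
        with open(path, "wb") as f:
            f.write(image.read())
```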
Each preprocessor must be a Python file that contains a class named `Processor`. The `Processor` class must implement a method named `process`:

```python
class Processor:
    def process(self, image, meta, args):
        # image is a stream to the image
        # meta is the image metadata (as defined in the scraper section)
        # args are the optional arguments provided by the user
        # Must return a tuple containing a stream to the processed image
        # and the metadata: (stream, meta)
        ...
```

You are free to add variables, imports, and other methods to your `Processor` class as well.
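As an illustration of that contract, here is a sketch of a preprocessor that converts each image to grayscale. Pillow is an assumption; any image library that can read from and write to a stream would work:

```python
import io

from PIL import Image

class Processor:
    def process(self, image, meta, args):
        # Convert the incoming image to grayscale and return a fresh
        # stream together with the unchanged metadata.
        img = Image.open(image).convert("L")
        fmt = meta.get("ext", "png").upper()
        if fmt == "JPG":
            fmt = "JPEG"  # Pillow expects "JPEG" rather than "JPG"
        out = io.BytesIO()
        img.save(out, format=fmt)
        out.seek(0)
        return out, meta
```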
Modules
- Zerochan scraper
- Pixiv scraper
- Sankaku scraper