Several useful scripts for use with the fast.ai lectures and libraries.
`image_download` is the primary function. It provides easy download of images from `bing`, `google`, `baidu`, and/or `flickr` (though the latter requires an `apikey`). It is intended for direct import of images within a python script or Jupyter Notebook.
`make_train_valid` makes a train-valid directory structure, randomly copying files from `labels_dir` to sub-directories. It has largely been replaced by the capabilities within fastai but is still useful.
Installation:

```bash
pip install icrawler
pip install python-magic   # or: pip install python-magic-bin
git clone https://github.com/prairie-guy/ai_utilities.git
```
`image_download` downloads up to `n_images` images (typically limited to 100-300 by the search engine) from a specified search engine, including `bing`, `google`, `baidu`, and `flickr`. The `search_text` can be different from its `label`. Downloaded files are checked to be valid images and duplicates are eliminated. Images are saved to the directory `dataset` by default. (Based upon the excellent work of https://github.com/hellock/icrawler)
```
usage: image_download(search_text:Path, n_images, label:str=None, engine:str='bing', image_dir='dataset', apikey=None)

where 'engine' = ['bing'|'google'|'baidu'|'flickr'],
      'flickr' requires an apikey, and
      'label' can be different from 'search_text'
```
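For instance, a minimal sketch of calling `image_download` with a non-default engine and an explicit label (the search phrase, label, and API key below are illustrative placeholders, not values defined by the library):

```python
from ai_utilities import image_download

# Up to 150 Bing results for the phrase, saved under the label 'border_collie'
image_download('border collie dog', 150, label='border_collie', engine='bing')

# Flickr works the same way but requires an apikey (placeholder value shown here)
image_download('border collie dog', 150, label='border_collie', engine='flickr',
               apikey='YOUR_FLICKR_API_KEY')
```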
Download up to 100 images of each class, check each file to be a valid jpeg image, save to the directory `dataset`, and create `data = ImageDataLoaders.from_folder(...)`. Optionally create an imagenet-type directory structure.
```python
import sys
sys.path.append('your-parent-directory-of-ai_utilities')
from ai_utilities import *
from pathlib import Path
from fastai.vision.all import *

for p in ['dog', 'goat', 'sheep']:
    image_download(p, 100)

path = Path.cwd()/'dataset'
data = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))

# Optionally, create an imagenet-type file directory.
make_train_valid(path)
data = ImageDataLoaders.from_folder(path, train='train', valid='valid', item_tfms=Resize(224))
```
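From there `data` is an ordinary fastai `DataLoaders`, so it can feed a learner directly. A minimal sketch, assuming a recent fastai v2 where the factory is `vision_learner` (older releases call it `cnn_learner`); the backbone and epoch count are arbitrary choices, not part of ai_utilities:

```python
from fastai.vision.all import vision_learner, resnet34, error_rate

# Train a quick baseline classifier on the downloaded images
learn = vision_learner(data, resnet34, metrics=error_rate)
learn.fine_tune(3)
```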
`make_train_valid` takes a directory containing sub-directories, each with a different class of images, and makes an imagenet-type directory structure. It randomly copies files from `labels_dir` to the sub-directories `train`, `valid`, and `test`, creating an imagenet-type directory usable by `ImageDataLoaders.from_folder(dir, ...)`.
```
usage: make_train_valid(labels_dir:Path, train:float=.8, valid:float=.2, test:float=0)

positional arguments:
  labels_dir   Contains at least two directories of labels, each containing
               files of that label

optional arguments:
  train=.8     fraction of files for training, default=.8
  valid=.2     fraction of files for validation, default=.2
  test=0       fraction of files for testing, default=0
```
For example, given a directory:
```
catsdogs/
..cat/[*.jpg]
..dog/[*.jpg]
```
Creates the following directory structure:
```
catsdogs/
..cat/[*.jpg]
..dog/[*.jpg]
..train/
....cat/[*.jpg]
....dog/[*.jpg]
..valid/
....cat/[*.jpg]
....dog/[*.jpg]
```
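A minimal sketch of driving `make_train_valid` with an explicit test split and then loading the result (the `catsdogs` path and the 70/20/10 split are illustrative choices, not library defaults):

```python
from pathlib import Path
from ai_utilities import make_train_valid
from fastai.vision.all import ImageDataLoaders, Resize

path = Path('catsdogs')  # contains the cat/ and dog/ label directories

# Copy files into train/, valid/ and test/ sub-directories using a 70/20/10 split
make_train_valid(path, train=.7, valid=.2, test=.1)

# Point fastai at the imagenet-style train/ and valid/ folders just created
dls = ImageDataLoaders.from_folder(path, train='train', valid='valid', item_tfms=Resize(224))
```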