A CLI tool for generating captions for images using Salesforce BLIP.
Install this tool using pip or pipx:
pipx install blip-caption
The first time you use the tool it will download the model from the Hugging Face model hub. The small model is 945MB and the large model is 1.8GB; both are stored in ~/.cache/huggingface/hub/.
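As an illustration, you can check what has been downloaded so far with a few lines of Python. This is a sketch that assumes the default Hugging Face cache path; the cache can be relocated with the HF_HOME or HF_HUB_CACHE environment variables, in which case this path would be wrong:

```python
import os

# Default Hugging Face hub cache directory (assumes HF_HOME /
# HF_HUB_CACHE have not been set to move it elsewhere).
cache_dir = os.path.expanduser("~/.cache/huggingface/hub")

# List any models downloaded so far, if the cache exists yet.
if os.path.isdir(cache_dir):
    for name in sorted(os.listdir(cache_dir)):
        if name.startswith("models--"):
            print(name)
else:
    print("No models downloaded yet:", cache_dir)
```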
To generate captions for an image using the small model, run:
blip-caption IMG_5825.jpeg
Example output:
a lizard is sitting on a branch in the woods
To use the larger model, add --large:
blip-caption IMG_5825.jpeg --large
Example output:
there is a chamelon sitting on a branch in the woods
Here's the image I used:
If you pass multiple files the path to each file will be output before its caption:
blip-caption /tmp/photos/*.jpeg
/tmp/photos/IMG_2146.jpeg
a man holding a bowl of salad and laughing
/tmp/photos/IMG_0151.jpeg
a cat laying on a red blanket
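In this multi-file mode the path lines and caption lines alternate, so pairing them up from captured output is straightforward. A minimal sketch, using only the sample output above:

```python
# Sample multi-file output: each path is printed on its own line,
# followed by its caption on the next line.
output = """\
/tmp/photos/IMG_2146.jpeg
a man holding a bowl of salad and laughing
/tmp/photos/IMG_0151.jpeg
a cat laying on a red blanket
"""

lines = output.strip().splitlines()
# Pair alternating lines into (path, caption) tuples.
pairs = list(zip(lines[0::2], lines[1::2]))
for path, caption in pairs:
    print(f"{path}: {caption}")
```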
The --json flag changes the output to look like this:
blip-caption /tmp/photos/*.* --json
[{"path": "/tmp/photos/IMG_2146.jpeg", "caption": "a man holding a bowl of salad and laughing"},
{"path": "/tmp/photos/IMG_0151.jpeg", "caption": "a cat laying on a red blanket"},
{"path": "/tmp/photos/IMG_3099.MOV", "error": "cannot identify image file '/tmp/photos/IMG_3099.MOV'"}]
Any errors are returned as a {"path": "...", "error": "error message"} object.
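Because the --json output is machine-readable, downstream code can separate successes from failures by checking which key each object carries. A minimal sketch, using only the sample output shown above:

```python
import json

# Sample --json output: successful captions carry a "caption" key,
# failures carry an "error" key instead.
raw = """[
  {"path": "/tmp/photos/IMG_2146.jpeg", "caption": "a man holding a bowl of salad and laughing"},
  {"path": "/tmp/photos/IMG_0151.jpeg", "caption": "a cat laying on a red blanket"},
  {"path": "/tmp/photos/IMG_3099.MOV", "error": "cannot identify image file '/tmp/photos/IMG_3099.MOV'"}
]"""

results = json.loads(raw)
captions = {r["path"]: r["caption"] for r in results if "caption" in r}
errors = {r["path"]: r["error"] for r in results if "error" in r}
print(len(captions), "captioned,", len(errors), "failed")
```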
To set up this tool locally, first check out the code. Then create a new virtual environment:
cd blip-caption
python3 -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
pytest