Purpose

One or more utils for various VLMs (Visual Language Models)

ilm-7b-q_batch.py

Similar to others, but takes less than 12 GB of vram, and is also the fastest to run

Includes a commented-out prompt to focus on detecting text in the image, which is good to find watermarks, etc. In "find the text" mode, a 4090 processes around 1 image a second.

ilm-7b_batch.py

Use it like cog_vlm.py below. Note that it generates ".ilm7" files instad of ".txt" files Also, it requires 24GB rather than 12 GB of vram

ilm-2b_batch.py

Use it like cog_vlm.py below. Note that it generates ".ilm" files instad of ".txt" files Also, it requires 16GB rather than 12 GB of vram Not sure why anyone would want to use this version instead of the 7gb quantized, but if you would like to try it out, here is how you can do so.

cog_vlm.py

Note that you must have at least 12 GB vram to use

cog_vlm.py purpose

This can be used for multiple related purposes:

Autogenerating .txt descriptions for images
Detecting if there is an artists signature or watermark in an image
Determining if an image has nsfw content

See the script comments for alternative prompts that may help

llava-batch.py

This can be configured to use either the 7b, 13b, or 32b LLAVA model Read the comments at the top of the file for details. This is slow, but potentially generates the most detailed and accurate output, with the 32b model.

Install

python -m venv venv
. venv/bin/activate
pip install -r requirements.txt

Run

python cog_vlm.py

Typical Use

# The program expects to read in filenames, one per line,
# from stdin. This is a quick way to make that work

ls *.jpg  .....  | col |python cog_vlm.py

# Note that the first run will take a VERY LONG TIME, because
# it will download a buncha stuff from hugging face.
# but after that, it should start up after a few seconds.
# There is still a loading penalty, however, so it is
# most efficient to do more than one file at a time

Filtering tip

It can be very useful for filtering images.

If you want to do this in a directory that already has some ".txt" files, then modify the script to output to ".desc" instead of ".txt"

eg: txt_filename = f"{filename}.desc"

Once you have done this, then you can generate a list of image files to potentially remove, with something like:

egrep -l 'watermark|artist.s signature|comic panel|blahblah' *desc |sed s/desc/jpg/

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
README.md		README.md
cog_requirements.txt		cog_requirements.txt
cog_vlm.py		cog_vlm.py
ilm-2b_batch.py		ilm-2b_batch.py
ilm-7b-q_batch.py		ilm-7b-q_batch.py
ilm-7b_batch.py		ilm-7b_batch.py
ilm_requirements.txt		ilm_requirements.txt
llava-batch.py		llava-batch.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Purpose

ilm-7b-q_batch.py

ilm-7b_batch.py

ilm-2b_batch.py

cog_vlm.py

cog_vlm.py purpose

llava-batch.py

Install

Run

Typical Use

Filtering tip

About

Releases

Packages

Languages

ppbrown/vlm-utils

Folders and files

Latest commit

History

Repository files navigation

Purpose

ilm-7b-q_batch.py

ilm-7b_batch.py

ilm-2b_batch.py

cog_vlm.py

cog_vlm.py purpose

llava-batch.py

Install

Run

Typical Use

Filtering tip

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages