
ImageTextCoding

Official repository for the paper "LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Image Compression" (IEEE VCIP 2024)


Check out our presentation poster!

Description

This is the official repository for the paper "LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Image Compression". The full paper is available on arXiv.

Please feel free to contact Murai (octachoron(at)suou.waseda.jp) or Sun Heming, or post an issue, if you have any questions.

Demo Inference on Google Colab

Open In Colab

We provide demo inference code for Google Colaboratory, so you can try inference without any environment setup. Just click the 'Open in Colab' button above.

Requirements

Python 3.10 and some other packages are needed; please refer to the How to Use section below. Our experiments and verification were conducted on Linux (Ubuntu 22.04) in a Docker container with CUDA 12.1 and PyTorch 2.1.
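For reference, the following is a minimal sketch of a matching environment. The Docker image tag is an assumption (any CUDA 12.1 / PyTorch 2.1 image with Python 3.10 should work), not necessarily the exact image used in our experiments.

```bash
# Sketch of a matching environment (the image tag is illustrative):
# PyTorch 2.1 + CUDA 12.1 + Python 3.10.
docker run --gpus all -it --rm pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime bash
# Inside the container, verify the versions:
python --version                                          # Python 3.10.x
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```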

How to Use

  • First, clone this repository.
git clone https://github.com/tokkiwa/TextImageCoding/
cd TextImageCoding
  • Download the DiffBIR weights and our pre-trained weights to the /weights folder and the /lic-weights/cheng folder, respectively (see the sketch below).

The weights for DiffBIR are available at https://github.com/XPixelGroup/DiffBIR. We use the 'v1_general' weights throughout our experiments.

Our pre-trained weights are available at this link. Please note that this is a nightly release; all the weights for the experiments will be released soon.
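For reference, a sketch of the expected folder layout is below; the download URLs are placeholders (see the DiffBIR repository and the link above for the actual files).

```bash
# Expected folder layout (URLs below are placeholders, not real download links).
mkdir -p weights lic-weights/cheng
# DiffBIR 'v1_general' weights -> ./weights
wget -P weights <DiffBIR-v1_general-checkpoint-url>
# Our pre-trained weights -> ./lic-weights/cheng
wget -P lic-weights/cheng <our-pretrained-weight-url>
```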

  • Install requirements (using a virtual environment is recommended).
pip install -r requirements.txt
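For example, a typical setup with Python's built-in venv:

```bash
# Create and activate an isolated environment, then install the dependencies.
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```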

Caption Generation and Compression

Code for caption generation and compression can be found in llavanextCaption_Compression.ipynb.
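To run the notebook locally (assuming Jupyter is installed in your environment):

```bash
jupyter notebook llavanextCaption_Compression.ipynb
```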

Inference

We provide text captions for the Kodak image dataset. Please run

bash run_misc.sh

with the necessary options specified.

For other datasets, please generate and compress the captions by running llavanextCaption_Compression.ipynb, place the output CSV in the df folder, and specify the dataset in run_misc.sh, as sketched below.
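A hypothetical walk-through for a custom dataset (the file name mydataset.csv is illustrative, and the exact variables to edit inside run_misc.sh may differ):

```bash
# 1. Generate and compress captions with the notebook, producing e.g. mydataset.csv.
# 2. Place the CSV where the inference script expects it.
mkdir -p df
cp /path/to/mydataset.csv df/
# 3. Edit run_misc.sh to point at the new dataset and caption CSV, then run:
bash run_misc.sh
```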

Training

Our training code is based on CompressAI. Please run lic/train.sh, specifying the models, datasets, and parameters.
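Since the training code follows CompressAI, the options resemble those of CompressAI's example training script. The invocation below is a hypothetical sketch; the actual entry point and flags wrapped by lic/train.sh may differ.

```bash
# Hypothetical, CompressAI-style training invocation (check lic/train.sh for the real one).
# -m: model, -d: training dataset, --lambda: rate-distortion trade-off.
python lic/train.py -m cheng2020-anchor -d /path/to/dataset \
  --epochs 100 --batch-size 16 -lr 1e-4 --lambda 0.0018 --cuda --save
```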

Acknowledgement

Our code is based on MISC, CompressAI, GPTZip, and DiffBIR. We thank the authors for releasing their excellent work.
