
[IEEE TIP 2024] Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model


lcysyzxdxc/MISC


MISC

The official repo for MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

Dependency

GPT-4 Vision

CLIP_Surgery

Stable Diffusion 2.1

DiffBIR

CompressAI

Instruction

Download weights and put them into the weight folder:

DiffBIR (general_full_v1.ckpt): link

Cheng2020-Tuned (cheng_small.pth.tar): link
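
Before opening the notebook, it can help to confirm that both files are where the code expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_weights` is hypothetical, and it assumes the folder is literally named `weight` and sits in the project root.

```python
from pathlib import Path

# File names are taken from the download instructions above.
# Assumption: the weight folder is named "weight" and lives in the project root.
REQUIRED_WEIGHTS = ("general_full_v1.ckpt", "cheng_small.pth.tar")


def missing_weights(weight_dir="weight"):
    """Return the required weight files that are not present in weight_dir."""
    folder = Path(weight_dir)
    return [name for name in REQUIRED_WEIGHTS if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files:", ", ".join(missing))
    else:
        print("All weight files found.")
```

If the list is empty, both checkpoints are in place; otherwise it names the files that still need to be downloaded.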

If you want to use 'mask', download the CLIP_Surgery model and put its 'clip' folder in the same directory as this project.

Run the ipynb code in one of the following modes to decompress the image:

  1. If you want pixel-instructed decoding, set the mode to 'pixel'. A larger 'block_num_min' keeps more pixels, at a higher bpp (bits per pixel) cost.

  2. If you want net-instructed decoding, set the mode to 'net' to use our fine-tuned Cheng2020 net. You can also use your own network weights trained with CompressAI.

  3. If you want to use other models (like VVC, HiFiC, ...) as the starting point of diffusion, set the mode to 'ref', run your own model, and provide its decompressed image and bpp.
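
The three modes and the bpp value that 'ref' mode asks for can be sketched as below. This is an illustrative sketch only: `bits_per_pixel` and `check_settings` are hypothetical helpers, the argument names are assumptions, and the notebook's own variable names may differ. The bpp formula is the standard one, total compressed bits divided by the number of pixels.

```python
# Mode names are taken from the list above.
VALID_MODES = ("pixel", "net", "ref")


def bits_per_pixel(num_bytes, width, height):
    """Bits per pixel of a compressed file: total bits / pixel count."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return 8 * num_bytes / (width * height)


def check_settings(mode, ref_image_path=None, ref_bpp=None):
    """Validate hypothetical settings before running the notebook."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # 'ref' mode needs the output of an external codec and its bitrate.
    if mode == "ref" and (ref_image_path is None or ref_bpp is None):
        raise ValueError("'ref' mode needs the decompressed image and its bpp")
    return mode


if __name__ == "__main__":
    # A 512x512 image compressed to 1024 bytes by an external codec.
    bpp = bits_per_pixel(1024, 512, 512)
    print(bpp)  # 0.03125
    print(check_settings("ref", ref_image_path="decoded.png", ref_bpp=bpp))
```

Here `decoded.png` is a placeholder for the image produced by your own model.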

Demo

[Feb 29, 2024] A simple Jupyter demo is uploaded. The encoder and decoder model weights will be uploaded soon.

[Apr 24, 2024] The model weights are uploaded. Please follow the instructions when using the ipynb file. We are working on a pipeline for en/decoding a group of images.

Visualization Result

Citation

If you find our work useful, please cite our paper as:

@misc{li2024misc,
      title={MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model}, 
      author={Chunyi Li and Guo Lu and Donghui Feng and Haoning Wu and Zicheng Zhang and Xiaohong Liu and Guangtao Zhai and Weisi Lin and Wenjun Zhang},
      year={2024},
      eprint={2402.16749},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
