
This is the PyTorch implementation of our paper "CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset"


CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset

Mingze Wang¹†, Zhiming Wang¹†, Sheng Xu¹, Yanjing Li¹ and Baochang Zhang¹*

¹Beihang University
†Equal Contribution, *Corresponding Author


Introduction

This repository contains the code implementation of the paper CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset. The code is built on the LLaVA-NeXT project.

The current branch has been tested with PyTorch 2.3.0 and CUDA 12.1, supports Python 3.9+, and is compatible with most CUDA versions.

If you find this project helpful, please give us a star ⭐️; your support is our greatest motivation.

Update Log

🌟 2024.11.18 Released the arXiv version of the paper.

🌟 2024.11.12 Released the weights, training logs, and inference results.

🌟 2024.11.07 Released the source code and part of the CC-Foundation Dataset.


1. Installation

Dependencies

  • Linux
  • Python 3.9+, recommended 3.9
  • PyTorch 2.3 or higher, recommended 2.3
  • CUDA 12.1 or higher, recommended 12.1
  • transformers 4.40.1 or higher, recommended 4.40.1

Install CCExpert

Download or clone the CCExpert repository.

git clone git@github.com:Meize0729/CCExpert.git
cd CCExpert

Environment Installation

We recommend setting up the environment on a Linux machine with CUDA 12.1 and cuDNN 9.4.0. You also need to install openjdk-11-jdk, which is required for computing the evaluation metrics.

Next, install the required environment with the script we provide:

bash env_prepare.sh
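To confirm that the installed environment matches the Dependencies list, a small Python check like the one below may help. It is a sketch, not part of the repository: the version thresholds are copied from the list above, and treating a `java` executable on `PATH` as evidence that openjdk-11-jdk is installed is an assumption.

```python
import importlib.metadata
import re
import shutil
import sys

# Minimum versions copied from the Dependencies list; adjust if the requirements change.
MINIMUM_VERSIONS = {
    "torch": (2, 3),
    "transformers": (4, 40, 1),
}


def parse_version(text):
    """Convert e.g. '2.3.0+cu121' or '2.4.0rc1' into a tuple of ints such as (2, 3, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Tuple comparison: (2, 3, 0) >= (2, 3) is True, (4, 39, 3) >= (4, 40, 1) is False."""
    return parse_version(installed) >= minimum


def check_environment():
    """Return a status string for Python, each pinned package, and the Java runtime."""
    report = {"python": "ok" if sys.version_info >= (3, 9) else "too old"}
    for package, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = importlib.metadata.version(package)
        except importlib.metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        report[package] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    # Assumption: openjdk-11-jdk is considered present if a `java` binary is on PATH.
    report["java"] = "ok" if shutil.which("java") else "missing"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```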

2. Dataset Preparation

In CCExpert, we propose a large dataset named "CC-Foundation Dataset" for change captioning.

First, download the dataset from Baidu NetDisk (access code: ccmz). Unfortunately, I can only open-source a very small part of the data at present; if you have a special need, please get in touch with me. The remaining data will be fully open-sourced after our next piece of work.

When using these datasets, please follow the license of each respective dataset! Then process the downloaded data according to the following steps.

Step 0: Download CC-Foundation from Baidu NetDisk and unzip the compressed package.

Step 1: Run the add_sbsolute_path_to_all_json.py script included in CC-Foundation. It generates a subfolder containing all the JSON annotation files that will be used, and converts the image paths in them from relative paths to absolute paths.

# CC_Foundation_Local_Absolute_Path is the absolute path where you saved the dataset.
export CC_Foundation_Local_Absolute_Path=/your/local/absolute/path
python3 ${CC_Foundation_Local_Absolute_Path}/add_sbsolute_path_to_all_json.py ${CC_Foundation_Local_Absolute_Path}
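For readers who want to see what Step 1 amounts to, here is a minimal Python sketch of a relative-to-absolute path conversion. It is not the contents of add_sbsolute_path_to_all_json.py: the annotation layout (each JSON file holds a list of records whose "image" field is one path or a list of bi-temporal paths), resolving relative paths against the dataset root, and the output folder name json_absolute are all assumptions, and the helper names are hypothetical.

```python
import json
import os
from pathlib import Path


def absolutize(entry, base_dir):
    """Join a relative path (or each path in a list) onto base_dir; absolute paths are kept."""
    if isinstance(entry, list):
        return [absolutize(item, base_dir) for item in entry]
    return entry if os.path.isabs(entry) else os.path.join(base_dir, entry)


def convert_records(records, base_dir, image_key="image"):
    """Return copies of the annotation records with their image paths made absolute."""
    converted = []
    for record in records:
        record = dict(record)  # shallow copy so the caller's data is not modified
        if image_key in record:
            record[image_key] = absolutize(record[image_key], base_dir)
        converted.append(record)
    return converted


def convert_folder(base_dir, out_subdir="json_absolute"):
    """Write converted copies of every *.json under base_dir into base_dir/out_subdir."""
    base = Path(base_dir).resolve()
    out_dir = base / out_subdir
    # Collect the sources first so files written below are not picked up again.
    sources = [p for p in base.rglob("*.json") if out_dir not in p.parents]
    for src in sources:
        records = json.loads(src.read_text())
        dst = out_dir / src.relative_to(base)  # mirror the original folder layout
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(json.dumps(convert_records(records, str(base)), indent=2))
```

Calling convert_folder("/your/local/absolute/path") would leave the original files untouched and write the converted copies into the subfolder.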

Step 2: The scripts/CCExpert_data_scripts/ directory stores several template YAML files, corresponding to the CPT data, the benchmark training set, and the benchmark test set. Use scripts/CCExpert_data_scripts/add_CC_Foundation_local_absolute_path_to_yaml.py to add the absolute path of your local CC-Foundation copy to these templates so that the corresponding JSON annotation files can be found.

python3 ./scripts/CCExpert_data_scripts/add_CC_Foundation_local_absolute_path_to_yaml.py \
    --yaml_file="./scripts/CCExpert_data_scripts/cptdata_RSupsampled_template.yaml" \
    --base_path="${CC_Foundation_Local_Absolute_Path}"
python3 ./scripts/CCExpert_data_scripts/add_CC_Foundation_local_absolute_path_to_yaml.py \
    --yaml_file="./scripts/CCExpert_data_scripts/benchmark_LEVIR-CC_train_template.yaml" \
    --base_path="${CC_Foundation_Local_Absolute_Path}"
python3 ./scripts/CCExpert_data_scripts/add_CC_Foundation_local_absolute_path_to_yaml.py \
    --yaml_file="./scripts/CCExpert_data_scripts/benchmark_LEVIR-CC_test_template.yaml" \
    --base_path="${CC_Foundation_Local_Absolute_Path}"
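The YAML step can be pictured with the sketch below. It assumes the templates follow the LLaVA-NeXT data-config layout (a top-level datasets list whose entries carry a json_path) and that relative json_path values are joined onto the CC-Foundation root; prefix_json_paths and rewrite_yaml are hypothetical names, and the real script may handle paths or output files differently.

```python
import copy
import os


def prefix_json_paths(config, base_path):
    """Return a copy of a data config with every relative json_path joined onto base_path."""
    config = copy.deepcopy(config)  # leave the loaded template untouched
    for dataset in config.get("datasets", []):
        path = dataset.get("json_path")
        if path and not os.path.isabs(path):
            dataset["json_path"] = os.path.join(base_path, path)
    return config


def rewrite_yaml(template_file, base_path, output_file):
    """Load a template YAML, prefix its json_path entries, and save the result."""
    import yaml  # PyYAML; imported here so prefix_json_paths works without it

    with open(template_file) as f:
        config = yaml.safe_load(f)
    with open(output_file, "w") as f:
        yaml.safe_dump(prefix_json_paths(config, base_path), f, sort_keys=False)
```

Working on the parsed dictionary keeps the path logic testable without touching any files.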

3. Model Training

The entire training process involves: Run Baseline ➡️ Run CCExpert.

All the code supports "training while testing": relevant metrics are printed to the log automatically, and the checkpoint with the best Sm score is saved automatically.
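In the remote sensing change captioning literature, Sm* is commonly defined as the mean of BLEU-4, METEOR, ROUGE-L and CIDEr-D. The sketch below assumes CCExpert uses that definition and shows how a best-checkpoint rule based on it could work; sm_score and BestCheckpointTracker are hypothetical names, not functions from this repository.

```python
def sm_score(bleu4, meteor, rouge_l, cider_d):
    """Mean of the four caption metrics (assumed definition of Sm*)."""
    return (bleu4 + meteor + rouge_l + cider_d) / 4


class BestCheckpointTracker:
    """Remember the evaluation step with the highest Sm seen so far."""

    def __init__(self):
        self.best_sm = float("-inf")
        self.best_step = None

    def update(self, step, metrics):
        """Return True when this step sets a new best Sm, i.e. its checkpoint should be saved."""
        sm = sm_score(metrics["BLEU-4"], metrics["METEOR"], metrics["ROUGE-L"], metrics["CIDEr-D"])
        if sm > self.best_sm:
            self.best_sm, self.best_step = sm, step
            return True
        return False
```

With the CCExpert-7b scores from the Model Zoo table, sm_score gives 81.795 (up to floating-point rounding), consistent with the reported 81.80.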

Run Baseline

The training scripts are ready to use. If you have completed the steps above, you only need to open the corresponding script and set NUM_GPUS and WORK_DIR for your setup. We strongly recommend running the baseline first and then running CCExpert.

# Train LLaVA_OneVision_0.5b on the LEVIR-CC benchmark
bash ./scripts/CCExpert_train_scripts/LLaVA_OneVision_0.5b_LEVIR-CC_baseline.sh

# Train LLaVA_OneVision_7b on the LEVIR-CC benchmark
bash ./scripts/CCExpert_train_scripts/LLaVA_OneVision_7b_LEVIR-CC_baseline.sh
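If you prefer not to edit the scripts by hand, a small Python helper can write a local copy with NUM_GPUS and WORK_DIR changed. This is only a sketch: it assumes the scripts assign these variables on their own lines as NAME=value (optionally prefixed with export), which may not match the actual scripts, and set_shell_vars and write_local_copy are hypothetical names.

```python
import re
from pathlib import Path


def set_shell_vars(script_text, **values):
    """Replace line-level `NAME=value` assignments in a shell script with new values."""
    for name, value in values.items():
        # Assumption: each variable is assigned at the start of a line, optionally with `export`.
        pattern = re.compile(rf"^([ \t]*(?:export[ \t]+)?{re.escape(name)}=).*$", re.MULTILINE)
        # A function replacement avoids backslash escapes in the value being interpreted.
        script_text, count = pattern.subn(lambda m: f"{m.group(1)}{value}", script_text)
        if count == 0:
            raise KeyError(f"{name} is not assigned in the script")
    return script_text


def write_local_copy(script_path, out_path, **values):
    """Save an edited copy of a training script, leaving the original untouched."""
    text = Path(script_path).read_text()
    Path(out_path).write_text(set_shell_vars(text, **values))
```

For example, write_local_copy("./scripts/CCExpert_train_scripts/LLaVA_OneVision_0.5b_LEVIR-CC_baseline.sh", "baseline_0.5b_local.sh", NUM_GPUS=4, WORK_DIR="/path/to/work_dir") saves an edited copy that you can launch with bash. Values are inserted verbatim, so quote paths that contain spaces.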

Run CCExpert

We assume you have already run the baseline successfully. This part is a bit more involved: all three training stages are included in a single script. As before, you only need to set NUM_GPUS and WORK_DIR for your setup.

# Train CCExpert_0.5b CPT + SFT
bash ./scripts/CCExpert_train_scripts/CCExpert_0.5b_3stage_cpt_sft.sh

# Train CCExpert_7b CPT + SFT
bash ./scripts/CCExpert_train_scripts/CCExpert_7b_3stage_cpt_sft.sh

4. Model Testing

Once you have the best checkpoint, you can use the following scripts for additional testing and inference.

# Single-card evaluation. The script automatically saves prediction and metric results.
# Set --model_path to your checkpoint's absolute path; --out_path can be the checkpoint path or any directory you like.
python3 ./scripts/CCExpert_infer_eval_scripts/eval_CCExpert.py \
        --model_name "llava_qwen_cc" \
        --model_path "/absolute/path/to/your/checkpoint" \
        --out_path "/absolute/path/to/output"

# Multi-card evaluation. The script automatically saves prediction and metric results.
# Set --world_size to the number of GPUs you use.
python3 ./scripts/CCExpert_infer_eval_scripts/eval_CCExpert.py \
        --model_name "llava_qwen_cc" \
        --model_path "/absolute/path/to/your/checkpoint" \
        --out_path "/absolute/path/to/output" \
        --DDP \
        --world_size 8

# Single-card inference: first edit the paths (and other settings) inside the script, then run it.
python3 ./scripts/CCExpert_infer_eval_scripts/infer_CCExpert.py
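With --DDP, evaluation is spread over --world_size processes. How eval_CCExpert.py partitions the test set is not documented here; the sketch below shows one common scheme (strided sharding by rank, then merging predictions back into dataset order) so that metrics can be computed over the full test set. shard_indices and merge_shards are hypothetical helpers, not the repository's code.

```python
def shard_indices(num_samples, rank, world_size):
    """Indices handled by one rank: a strided slice, so shard sizes differ by at most one."""
    return list(range(rank, num_samples, world_size))


def merge_shards(shard_results, num_samples, world_size):
    """Put per-rank predictions back into the original dataset order."""
    merged = [None] * num_samples
    for rank, predictions in enumerate(shard_results):
        for index, prediction in zip(shard_indices(num_samples, rank, world_size), predictions):
            merged[index] = prediction
    return merged
```

Strided sharding keeps the per-rank workload balanced without needing to know the dataset length in advance on each rank beyond num_samples.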

Model Zoo

Model         | BLEU-4 | METEOR | ROUGE-L | CIDEr-D | Sm*   | Weight     | Log   | Infer Results
CCExpert-0.5b | 65.42  | 41.33  | 75.93   | 141.19  | 80.99 | Baidu Disk | 80.99 | -
CCExpert-7b   | 65.49  | 41.82  | 76.55   | 143.32  | 81.80 | Baidu Disk | 81.80 | url

Common Problems

Acknowledgement

This project is built on LLaVA-NeXT. We thank its developers.

Citation

If you use the data, code, performance benchmarks, or pre-trained weights of this project in your research, please cite CCExpert using the BibTeX entry below.

@misc{wang2024ccexpert,
      title={CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset}, 
      author={Zhiming Wang and Mingze Wang and Sheng Xu and Yanjing Li and Baochang Zhang},
      year={2024},
      eprint={2411.11360},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.11360}, 
}

License

This project is licensed under the Apache 2.0 license.

Contact

If you have any other questions❓, please contact wmz20000729@buaa.edu.cn 👬.

We apologize that the code has not yet been thoroughly optimized; we will continue to refine it.

We will certainly do our utmost to assist you, and your inquiries will also contribute significantly to the optimization of this project.
