As repositories of large-scale Earth observation (EO) data have grown, so have the transfer and storage costs of model training and inference, consuming significant resources. We introduce Neural Embedding Compression (NEC), which transfers compressed embeddings to data consumers instead of raw data. We adapt foundation models (FMs) through learned neural compression to generate multi-task embeddings while navigating the trade-off between compression rate and embedding utility. We update only a small fraction of the FM parameters (10%) for a short training period (1% of the iterations used for pre-training). We evaluate NEC on two EO tasks: scene classification and semantic segmentation. Compared with applying traditional compression to the raw data, NEC achieves similar accuracy with a 75% to 90% reduction in data volume. Even at 99.7% compression, performance drops by only 5% on the scene classification task. Overall, NEC is a data-efficient yet performant approach for multi-task EO modelling.
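The core mechanism, adapting the FM so that its embeddings can be quantized and entropy coded with a distribution learned during training, can be sketched with CompressAI's EntropyBottleneck. This is a minimal illustration under our own assumptions, not the exact code in main_compress.py; fm_encoder, the tensor shapes, and the loss weighting are placeholders.

import torch
import torch.nn as nn
from compressai.entropy_models import EntropyBottleneck

class EmbeddingCompressor(nn.Module):
    # Hypothetical sketch: wrap a pre-trained FM encoder and learn to
    # quantize and entropy-code its patch embeddings.
    def __init__(self, fm_encoder, embed_dim=768):
        super().__init__()
        self.fm_encoder = fm_encoder                      # e.g. a ViT-B backbone
        self.entropy_bottleneck = EntropyBottleneck(embed_dim)

    def forward(self, images):
        z = self.fm_encoder(images)                       # (B, N, C) patch embeddings
        z = z.transpose(1, 2)                             # EntropyBottleneck expects channels first
        z_hat, likelihoods = self.entropy_bottleneck(z)
        # Shannon estimate of the bits needed to store the quantized embeddings
        bits = -torch.log2(likelihoods).sum() / images.shape[0]
        return z_hat.transpose(1, 2), bits

# Training would then minimize a rate-distortion objective such as
#   loss = distortion(z_hat) + weight * bits
# where the weighting corresponds to the --ld flag used below (the exact form is an assumption).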
Environment:
- Python 3.10.4
- PyTorch 1.11.0
- torchvision 0.12.0
- timm 0.9.12
- mmcv-full 1.5.0
- compressai 1.2.4
- torchgeo 0.3.1
- yapf 0.33.0
- Run pip install -e . from the MAEPretrain_SceneClassification directory.
- Prepare MillionAID (from https://github.com/ViTAE-Transformer/Remote-Sensing-RVSA): download the MillionAID dataset. We use the train_labels.txt and valid_labels.txt files from ViTAE-Transformer-Remote-Sensing, which contain labels. However, since the pretraining is unsupervised, the labels are not necessary; you can simply record the image names and revise the corresponding code in MAEPretrain_SceneClassification/util/datasets.py (class MillionAIDDataset). A minimal sketch of such an unlabeled dataset is shown after this list.
- Download the ViT-B pretrained model from https://github.com/ViTAE-Transformer/Remote-Sensing-RVSA.
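If you prefer to skip the label files entirely, the dataset class in MAEPretrain_SceneClassification/util/datasets.py could be replaced with an unlabeled variant along these lines (a hypothetical sketch; the class name, directory layout, and file extensions are assumptions):

import os
from PIL import Image
from torch.utils.data import Dataset

class UnlabeledMillionAIDDataset(Dataset):
    # Hypothetical unlabeled variant: only image file names are recorded,
    # since MAE pretraining does not use the labels.
    def __init__(self, root, transform=None):
        self.transform = transform
        self.samples = sorted(
            os.path.join(dirpath, name)
            for dirpath, _, files in os.walk(root)
            for name in files
            if name.lower().endswith(('.jpg', '.jpeg', '.png', '.tif'))
        )

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img = Image.open(self.samples[idx]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, 0  # dummy label keeps the original (image, label) interface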
Training:
We train on 4 A100 GPUs. On a single node:
torchrun --standalone --nnodes=1 --nproc_per_node=<NUM_TRAINERS> main_compress.py --dataset 'millionAID' --model 'mae_vit_compress_adapter' --epochs 20 --warmup_epochs 0 --data_path <path_to_millionaid> --save_every_n_epochs 2 --num_workers 8 --ld 1e10 --finetune <path_to_pretrained_model> --output_dir <storage path> --log_dir <log storage path> --blr 1.5e-4 --weight_decay 0.05 --input_size 224 --batch_size 256
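For example, on a single node with the 4 GPUs we used (all paths below are placeholders):

torchrun --standalone --nnodes=1 --nproc_per_node=4 main_compress.py --dataset 'millionAID' --model 'mae_vit_compress_adapter' --epochs 20 --warmup_epochs 0 --data_path /data/millionAID --save_every_n_epochs 2 --num_workers 8 --ld 1e10 --finetune checkpoints/vit-b-rvsa.pth --output_dir output/nec --log_dir logs/nec --blr 1.5e-4 --weight_decay 0.05 --input_size 224 --batch_size 256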
Use the find_average_size.py script to compute compression metrics for the model on a given dataset (a sketch of the size estimate appears after the options below):
python find_average_size.py --model 'mae_vit_compress_adapter' --model_path <path_to_model> --input_size <image_size> --dataset <ucm, MillionAid or potsdam> --entropy --data_path <data_path>
- --eval_size can be used to choose a subset of the data to evaluate on.
- --batch_size determines the number of samples that are compressed jointly.
For the Potsdam dataset, the data path is ignored; the dataset specified in ../Semantic Segmentation/configs/vit_compress/potsdam_dataset.py is used instead.
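The reported size can be estimated from the entropy model's likelihoods rather than by writing an actual bitstream; a rough sketch of that estimate (not the exact logic of find_average_size.py; names are assumptions):

import torch

def estimate_bits(likelihoods: torch.Tensor) -> float:
    # Shannon estimate: each quantized symbol costs -log2(p) bits
    # under the distribution learned during training.
    return float((-torch.log2(likelihoods)).sum())

# e.g. with the entropy bottleneck sketched near the top of this README:
# z_hat, likelihoods = model.entropy_bottleneck(z)
# bits_per_sample = estimate_bits(likelihoods) / batch_size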
Fine-tuning on scene classification (UCM):
python main_finetune.py --dataset 'ucm' --data_path <path to dataset> --model 'vit_base_compressed' --epochs 400 --with_decoder --finetune <path to previously trained model weights> --input_size 256 --batch_size 32 --warmup_epochs 5 --blr 1e-3 --weight_decay 0.05 --split 20 --output_dir <storage path> --log_dir <log storage path>
Semantic segmentation:
Since we use MMSegmentation, we only provide the necessary config and backbone files.
- Make sure you have the required environment described above.
- git clone https://github.com/open-mmlab/mmsegmentation.git --branch v0.21.0
- pip install -U openmim
- Install mmcv-full with the CUDA version that matches your torch CUDA version (see the example after this list):
  mim install mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/{cuda_version}/{torch_version}/index.html
- cd mmsegmentation
- pip install .
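For example, with the PyTorch 1.11.0 build from the environment above compiled against CUDA 11.3 (adjust both versions to match your installation):

mim install mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html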
Edit the potsdam_dataset.py configuration file in configs/vit_compress so that data_root points to your data directory.
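In this MMSegmentation 0.x style config that is a plain Python assignment; for example (the path below is a placeholder):

# configs/vit_compress/potsdam_dataset.py
data_root = '/data/potsdam'  # point this at your Potsdam data directory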
Then put the files under Semantic Segmentation in their respective directories in MMSegmentation. For convenience, we preserve the relative paths so you can see where each file goes. For example, add the contents of Semantic Segmentation/mmseg/models/backbones to mmsegmentation/mmseg/models/backbones.
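For example, from the root of this repository with mmsegmentation cloned alongside it (the quotes are needed because of the space in the directory name; the exact set of files to copy may differ):

cp -r "Semantic Segmentation/mmseg/models/backbones/"* mmsegmentation/mmseg/models/backbones/
cp -r "Semantic Segmentation/configs/vit_compress" mmsegmentation/configs/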
Finally, train with:
python tools/train.py configs/vit_compress/compress.py
or use one of the other configs for a different configuration.
Downstream comparison: a) Raw Data Compression (RDC); b) Uniformly Quantized Embeddings (UQE); c) Neural Embedding Compression (NEC, ours). "Learned" refers to entropy coding with the distribution learned during training.
Code for this repository is based on https://github.com/ViTAE-Transformer/Remote-Sensing-RVSA.
Please consider citing our work if it is useful to you:
@ARTICLE{gomes_nec_2024,
author={Gomes, Carlos and Brunschwiler, Thomas},
journal={IEEE IGARSS},
title={Neural Embedding Compression For Efficient Multi-Task Earth Observation Modelling},
year={2024}
}
This repository is based on Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model and the original MAE repository.
Compression models are built using the CompressAI library.