xmba15/ai4eo_map_your_city


Challenge Summary

Participants in this challenge are asked to develop a multi-modal model that estimates the construction year of a given building from three input modalities: street-view imagery, very-high-resolution (VHR) top-view imagery, and medium-resolution Sentinel-2 (S2) imagery. Street-view imagery is missing for half of the test set, so the solution must handle a missing modality.

My solution uses two classification models. Type I is trained and run on all modalities. Type II is trained on all modalities, but at inference it uses top-view imagery only. The Type II model is inspired by the Shared-Specific Feature Modelling approach in reference 3.
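As a rough illustration of how the two models could be combined at test time, the sketch below sends each sample to the Type I model when street-view imagery is present and to the Type II model otherwise. This is an assumption about the routing, not the repository's actual code: `predict_class`, `demo_model_all`, and `demo_model_top_only` are hypothetical names, the models are stand-ins that return fixed class probabilities, and it is assumed that both VHR and S2 count as top-view inputs.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model signatures: each returns one probability per class.
ModelAll = Callable[[object, object, object], Sequence[float]]
ModelTopOnly = Callable[[object, object], Sequence[float]]


def predict_class(
    street_view: Optional[object],
    top_view: object,
    s2: object,
    model_all: ModelAll,
    model_top_only: ModelTopOnly,
) -> int:
    """Return the index of the most probable construction-period class."""
    if street_view is not None:
        # Type I: all three modalities are available.
        probs = model_all(street_view, top_view, s2)
    else:
        # Type II: street view is missing, use top-view inputs only.
        probs = model_top_only(top_view, s2)
    return max(range(len(probs)), key=lambda i: probs[i])


# Stand-in models with fixed outputs, for illustration only.
def demo_model_all(street_view, top_view, s2):
    return [0.1, 0.7, 0.2]


def demo_model_top_only(top_view, s2):
    return [0.6, 0.3, 0.1]
```

In a real pipeline the stand-ins would be replaced by the trained networks, and the `None` check by whatever flag the dataset uses to mark a missing street-view image.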

🎛 Development environment


mamba env create --file environment.yml
mamba activate ai4eo

💎 References


  1. Model Fusion for Building Type Classification from Aerial and Street View Images
  2. Multi-modal fusion of satellite and street-view images for urban village classification based on a dual-branch deep neural network
  3. Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling