Participants in this challenge are asked to develop a multi-modal model that estimates the construction year of a building from three input modalities: street-view imagery, very-high-resolution (VHR) top-view imagery, and medium-resolution Sentinel-2 (S2) imagery. Street-view imagery is missing for half of the test set, so the solution must handle a missing modality.
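Estimating a construction year with a classifier means mapping years to a small set of discrete classes (construction epochs). The sketch below illustrates that mapping. The bin edges in `EPOCH_EDGES` are hypothetical and are not the challenge's official class definitions. `year_to_class` and `class_to_range` are likewise illustrative names.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical epoch boundaries (years); the challenge's actual classes may differ.
EPOCH_EDGES: List[int] = [1930, 1960, 1980, 2000, 2010]


def year_to_class(year: int, edges: List[int] = EPOCH_EDGES) -> int:
    """Map a construction year to a class index in [0, len(edges)].

    A year equal to an edge falls into the later class (bisect_right).
    """
    return bisect.bisect_right(edges, year)


def class_to_range(label: int, edges: List[int] = EPOCH_EDGES) -> Tuple[Optional[int], Optional[int]]:
    """Return the inclusive (first_year, last_year) of a class; None means open-ended."""
    first = edges[label - 1] if label > 0 else None
    last = edges[label] - 1 if label < len(edges) else None
    return first, last


print(year_to_class(1955), class_to_range(1))  # 1 (1930, 1959)
```

With five edges this yields six classes. A model predicting class 1 is then read as "built between 1930 and 1959".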
My solution consists of two classification models. Type I is trained and run at inference on all modalities. Type II is trained on all modalities but run at inference on the top-view modalities only (VHR and S2). Type II is inspired by the Shared-Specific Feature Modelling approach in [3].
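One way to combine the two models at test time is to route each sample by whether its street-view image is available. This assumes the Type I model handles complete samples and the Type II model handles samples without street-view imagery. The sketch below shows only that dispatch logic. `BuildingSample`, `predict` and `predict_batch` are hypothetical names, and the models are stand-in callables that return a class index rather than the actual networks.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class BuildingSample:
    top_view: Any                      # VHR top-view image (e.g. an array)
    sentinel2: Any                     # medium-resolution S2 image
    street_view: Optional[Any] = None  # None when the street-view modality is missing


def predict(sample: BuildingSample,
            type_i: Callable[[Any, Any, Any], int],
            type_ii: Callable[[Any, Any], int]) -> int:
    """Use the all-modality model when street view exists, else the top-view-only model."""
    if sample.street_view is not None:
        return type_i(sample.street_view, sample.top_view, sample.sentinel2)
    return type_ii(sample.top_view, sample.sentinel2)


def predict_batch(samples: List[BuildingSample], type_i, type_ii) -> List[int]:
    return [predict(s, type_i, type_ii) for s in samples]


# Stand-in models: Type I always predicts class 1, Type II always predicts class 2.
type_i_stub = lambda sv, tv, s2: 1
type_ii_stub = lambda tv, s2: 2
batch = [BuildingSample("vhr", "s2", street_view="sv"), BuildingSample("vhr", "s2")]
print(predict_batch(batch, type_i_stub, type_ii_stub))  # [1, 2]
```

Routing on modality availability keeps Type I from ever seeing an input it was not meant to handle at inference. Type II only needs to be reliable on the subset where street view is absent.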
To set up the environment, create and activate it with mamba:
mamba env create --file environment.yml
mamba activate ai4eo