Ranking-aware adapter for text-driven image ordering with CLIP

The official implementation of the paper.

The Ranking-aware Adapter adapts a pre-trained model for text-guided image ranking. Leveraging a specially designed relational attention, it extracts text-conditioned visual distinctions from image pairs as additional supervision to boost ranking performance. The results demonstrate that this lightweight adapter with the ranking-aware module enables a pre-trained CLIP model to support image ranking tasks across domains, including object count sorting, image quality assessment, and facial age estimation.
We plan to release the pre-trained model and related code. Stay tuned!