Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
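ComCLIP is training-free and builds on a frozen CLIP model. The snippet below is only a minimal sketch of the plain CLIP image-text matching baseline it starts from (not the ComCLIP decomposition itself), assuming the Hugging Face transformers library and an illustrative local image path:

```python
# Minimal vanilla CLIP image-text matching sketch (not the ComCLIP method itself).
# Assumes `transformers`, `torch`, and Pillow are installed; paths/model name are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical local image
captions = [
    "a dog chasing a ball on the grass",
    "a ball chasing a dog on the grass",  # compositional distractor
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax gives matching probabilities
probs = outputs.logits_per_image.softmax(dim=-1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```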
Yet another im2txt (Show and Tell: A Neural Image Caption Generator)
Generate captions from images
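A rough sketch of a Show-and-Tell style captioning decoder is shown below. It uses the common "merge" variant (image features and the partial caption are encoded separately, then combined to predict the next word) rather than the exact original architecture; the vocabulary size, caption length, and feature dimension are placeholder assumptions, not any repo's actual settings:

```python
# Merge-style caption decoder skeleton in Keras (illustrative placeholder values).
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add

VOCAB_SIZE, MAX_LEN, FEAT_DIM = 8000, 34, 2048  # assumed dataset-dependent constants

# Image branch: projects a precomputed CNN feature vector (e.g. from InceptionV3)
img_in = Input(shape=(FEAT_DIM,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Text branch: embeds the partial caption and encodes it with an LSTM
txt_in = Input(shape=(MAX_LEN,))
txt_vec = LSTM(256)(Dropout(0.5)(Embedding(VOCAB_SIZE, 256, mask_zero=True)(txt_in)))

# Merge both branches and predict the next caption word
out = Dense(VOCAB_SIZE, activation="softmax")(Dense(256, activation="relu")(add([img_vec, txt_vec])))
model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```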
Comparative analysis of image captioning models using RNN, BiLSTM, and Transformer architectures on the Flickr8k dataset, with InceptionV3 for image feature extraction.
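In such a pipeline, image features are typically extracted once with InceptionV3 and cached before training any of the decoders. The sketch below assumes TensorFlow/Keras and an illustrative file path:

```python
# Sketch of InceptionV3 image feature extraction for a captioning pipeline.
# Assumes TensorFlow/Keras is installed; the image path is illustrative.
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.preprocessing import image as keras_image

# include_top=False with global average pooling yields a 2048-d feature vector
# per image, which can then be fed to an RNN/BiLSTM/Transformer decoder.
encoder = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(path: str) -> np.ndarray:
    img = keras_image.load_img(path, target_size=(299, 299))  # InceptionV3 input size
    x = keras_image.img_to_array(img)
    x = preprocess_input(x[np.newaxis, ...])
    return encoder.predict(x, verbose=0)[0]  # shape: (2048,)

features = extract_features("Flickr8k_Dataset/example.jpg")  # hypothetical path
print(features.shape)
```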
Text-Image-Text is a bidirectional retrieval system: it finds images matching a text description and text descriptions matching an image. It leverages state-of-the-art language and vision models to bridge the gap between textual and visual representations.
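A minimal sketch of such bidirectional retrieval using CLIP embeddings follows; the model name and file paths are illustrative assumptions, not necessarily the repo's exact setup:

```python
# Sketch of bidirectional text<->image retrieval with CLIP embeddings.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open(p) for p in ["img1.jpg", "img2.jpg"]]   # hypothetical gallery
texts = ["a child playing in a fountain", "two dogs running on a beach"]

with torch.no_grad():
    img_emb = model.get_image_features(**processor(images=images, return_tensors="pt"))
    txt_emb = model.get_text_features(**processor(text=texts, return_tensors="pt", padding=True))

# Normalize and compute cosine similarities: rows = texts, columns = images
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
sim = txt_emb @ img_emb.T

print("text -> best image:", sim.argmax(dim=1).tolist())   # retrieve images from text
print("image -> best text:", sim.argmax(dim=0).tolist())   # retrieve texts from images
```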