Vision and Language Group @ MIL

14 repositories

imp
Public
A family of highly capable yet efficient large multimodal models
Python • Apache License 2.0 • Forks: 16 • Stars: 161 • Issues: 2 • Pull requests: 3 • Updated Aug 23, 2024
mlc-imp
Public
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
Python • Apache License 2.0 • Forks: 1.6k • Stars: 6 • Issues: 0 • Pull requests: 0 • Updated May 29, 2024
anetqa
Public template
HTML • Forks: 1 • Stars: 0 • Issues: 0 • Pull requests: 0 • Updated Mar 15, 2024
anetqa-code
Public
Python • Apache License 2.0 • Forks: 2 • Stars: 9 • Issues: 1 • Pull requests: 0 • Updated Mar 7, 2024
rosita
Public
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Topics: vqa, vision-and-language, pre-training, referring-expression-comprehension, image-text-retrieval
Python • Apache License 2.0 • Forks: 13 • Stars: 56 • Issues: 1 • Pull requests: 0 • Updated Jun 13, 2023
prophet
Public
Implementation of the CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Topics: pytorch, visual-question-answering, multimodal-deep-learning, gpt-3, prompt-engineering, okvqa, a-okvqa
Python • Apache License 2.0 • Forks: 27 • Stars: 266 • Issues: 0 • Pull requests: 0 • Updated May 23, 2023
bst
Public
Python • Apache License 2.0 • Forks: 1 • Stars: 5 • Issues: 0 • Pull requests: 0 • Updated May 12, 2023
xmchat
Public
Apache License 2.0 • Forks: 2 • Stars: 30 • Issues: 3 • Pull requests: 0 • Updated Apr 24, 2023
bottom-up-attention.pytorch
Public
A PyTorch reimplementation of bottom-up-attention models
Topics: bottom-up-attention, detectron2, pytorch
Jupyter Notebook • Apache License 2.0 • Forks: 75 • Stars: 292 • Issues: 26 • Pull requests: 0 • Updated Apr 7, 2022
openvqa
Public
A lightweight, scalable, and general framework for visual question answering research
Topics: benchmark, deep-learning, pytorch, vqa, visual-question-answering
Python • Apache License 2.0 • Forks: 64 • Stars: 321 • Issues: 6 • Pull requests: 0 • Updated Sep 3, 2021
mcan-vqa
Public
Deep Modular Co-Attention Networks for Visual Question Answering
Topics: attention, visual-reasoning, visual-question-answering
Python • Apache License 2.0 • Forks: 88 • Stars: 443 • Issues: 1 • Pull requests: 0 • Updated Dec 16, 2020
activitynet-qa
Public
A VideoQA dataset built from ActivityNet videos
Topics: vqa, activitynet, videoqa, dataset
Python • Apache License 2.0 • Forks: 9 • Stars: 67 • Issues: 2 • Pull requests: 0 • Updated Nov 22, 2020
mmnas
Public
Deep Multimodal Neural Architecture Search
Python • Apache License 2.0 • Forks: 8 • Stars: 26 • Issues: 1 • Pull requests: 0 • Updated Nov 15, 2020
mt-captioning
Public
A PyTorch implementation of the paper "Multimodal Transformer with Multiview Visual Representation for Image Captioning"
Topics: pytorch, image-captioning, multimodal-transformer
Python • Apache License 2.0 • Forks: 7 • Stars: 24 • Issues: 1 • Pull requests: 1 • Updated Sep 4, 2020