Lion: Kindling Vision Intelligence within Large Language Models

mynameischaos/Lion

Lion🦁️

Lion: Kindling Vision Intelligence within Large Language Models. Code and details will be released soon.

Framework

  • Visual Encoder: ViT-G/14 from EVA-CLIP
  • Q-Former: ChineseBERT
  • LLM: InternLM 7B
  • Flamingo: Gated Cross-Attention + FFN
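
The Flamingo-style component listed above can be illustrated with a minimal sketch. Lion's code is not yet released, so this is not the authors' implementation: it assumes the general gated formulation, in which text tokens attend to visual tokens and each sub-layer's output is scaled by a tanh gate before being added back to the residual stream. The function names, the single attention head, the omitted learned projections, and the ReLU placeholder for the FFN are all simplifications introduced here for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text, visual):
    """Single-head attention: text tokens are queries, visual tokens are
    keys and values. Learned projections are omitted (assumption)."""
    dim = len(text[0])
    out = []
    for q in text:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in visual]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, visual)) for j in range(dim)])
    return out


def feed_forward(tokens):
    """Placeholder FFN: elementwise ReLU. A real FFN has learned linear layers."""
    return [[max(0.0, x) for x in tok] for tok in tokens]


def gated_block(text, visual, attn_gate=0.0, ffn_gate=0.0):
    """x = x + tanh(attn_gate) * CrossAttn(x, visual)
    x = x + tanh(ffn_gate) * FFN(x)"""
    g_attn = math.tanh(attn_gate)
    attn = cross_attention(text, visual)
    x = [[t + g_attn * a for t, a in zip(tr, ar)] for tr, ar in zip(text, attn)]
    g_ffn = math.tanh(ffn_gate)
    ffn = feed_forward(x)
    return [[t + g_ffn * f for t, f in zip(tr, fr)] for tr, fr in zip(x, ffn)]


text_tokens = [[1.0, 2.0], [0.5, -1.0]]
visual_tokens = [[0.0, 1.0], [1.0, 0.0]]
unchanged = gated_block(text_tokens, visual_tokens)           # gates at zero
mixed = gated_block(text_tokens, visual_tokens, 1.0, 1.0)     # gates opened
```

With both gates at zero the block returns the text tokens unchanged, so a pretrained LLM's behavior is preserved at the start of training and visual information is mixed in only as the gates move away from zero.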

Demo: demo link (asking questions in English is recommended)

MME Benchmark

  • MME is a comprehensive evaluation benchmark for multimodal large language models. It measures perception and cognition abilities through 14 subtasks: existence, count, position, color, poster, celebrity, scene, landmark, artwork, OCR, commonsense reasoning, numerical calculation, text translation, and code reasoning.

  • We achieve state-of-the-art results on the overall, perception, and cognition evaluations.
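
For readers unfamiliar with how MME totals are formed, here is a minimal sketch of the scoring arithmetic. It assumes the standard MME protocol: each image carries two yes/no questions, a subtask score is accuracy (per question) plus accuracy+ (per image, both questions correct), and the first ten subtasks listed above count as perception while the last four count as cognition. The function names and the input layout are hypothetical and are not taken from the Lion or MME code.

```python
PERCEPTION = ["existence", "count", "position", "color", "poster",
              "celebrity", "scene", "landmark", "artwork", "OCR"]
COGNITION = ["commonsense reasoning", "numerical calculation",
             "text translation", "code reasoning"]


def subtask_score(results):
    """results: list of (first_correct, second_correct) pairs, one per image.
    Returns accuracy + accuracy+, each in percent, so the range is 0..200."""
    n_images = len(results)
    n_correct = sum(int(a) + int(b) for a, b in results)
    acc = 100.0 * n_correct / (2 * n_images)
    acc_plus = 100.0 * sum(1 for a, b in results if a and b) / n_images
    return acc + acc_plus


def mme_totals(scores):
    """scores: dict mapping subtask name to its 0..200 score.
    Returns (perception, cognition, overall)."""
    perception = sum(scores[name] for name in PERCEPTION)
    cognition = sum(scores[name] for name in COGNITION)
    return perception, cognition, perception + cognition


# Two images: both questions right on the first, one right on the second.
example = subtask_score([(True, True), (True, False)])

# A hypothetical perfect model scores 200 on every subtask.
perfect = mme_totals({name: 200.0 for name in PERCEPTION + COGNITION})
```

Under these assumptions the maximum is 2000 for perception, 800 for cognition, and 2800 overall.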

Overall Performance

| Rank | Model | Version | Score |
|------|-------|---------|-------|
| 1 | Ours | Ours | 1991.5 |
| 2 | InternLM-XComposer-VL | InternLM-7B | 1919.5 |
| 3 | Qwen-VL-Chat | Qwen-7B | 1848.3 |
| 4 | MMICL | FlanT5xxl | 1810.7 |
| 5 | Skywork-MM | Skywork-MM-13B | 1775.5 |
