Famous Vision Language Models and Their Architectures
Paddle Multimodal Integration and eXploration (PaddleMIX): supports mainstream multimodal tasks, including end-to-end large-scale multimodal pretraining models and a diffusion model toolbox, with high performance and flexibility.
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v, and more.
[grps + TensorRT-LLM] A pure C++ high-performance LLM serving stack built on TensorRT-LLM and Tokenizers.cpp. Compatible with the OpenAI API protocol; supports chat and function-call modes, AI agents, distributed multi-GPU inference, multimodal inputs, and a Gradio chat interface.
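Because the service exposes an OpenAI-compatible interface, a standard OpenAI client can talk to it directly. The sketch below is illustrative only: the base URL, port, model name, and image URL are assumptions, not values documented by the project.

```python
# Minimal sketch: query an OpenAI-compatible multimodal chat endpoint.
# The base_url, api_key, model name, and image URL are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9997/v1",  # assumed local serving address
    api_key="not-needed",                 # many local servers ignore the key
)

response = client.chat.completions.create(
    model="qwen2-vl",  # assumed model name registered with the server
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```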
Mark web pages for use with vision-language models.
Qwen-VL base model for use with Autodistill.
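As a quick illustration of the Autodistill base-model pattern, the sketch below auto-labels a folder of images using a caption ontology. The module and class names (autodistill_qwen_vl, QwenVL) are assumptions for illustration; check the package's README for the actual import path.

```python
# Hypothetical sketch of the Autodistill base-model workflow with Qwen-VL.
# Module/class names here are assumptions, not verified against the package.
from autodistill.detection import CaptionOntology
from autodistill_qwen_vl import QwenVL  # assumed import path

# Map natural-language prompts to the class names wanted in the dataset.
ontology = CaptionOntology({"a shipping container": "container"})

base_model = QwenVL(ontology=ontology)

# Auto-label every .jpg in ./images into an annotated dataset folder.
base_model.label("./images", extension=".jpg")
```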