
add basic llama 3.2 vision support #12163

Merged

Conversation

@MeouSker77 (Contributor) commented Oct 8, 2024

Description

add basic llama 3.2 vision support

1. Why the change?

2. User API changes

requires transformers >= 4.45.0
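A quick way to verify the environment before running the example (a minimal sketch; `packaging` is assumed to be available since it ships as a transformers dependency):

import transformers
from packaging import version

# MllamaForConditionalGeneration was introduced in transformers 4.45.0
assert version.parse(transformers.__version__) >= version.parse("4.45.0"), \
    "llama 3.2 vision requires transformers >= 4.45.0"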

Modified from the official example:

import time

import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

from ipex_llm import optimize_model

model_path = "Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(model_path)
# keep the multi-modal projector in its original precision during optimization
model = optimize_model(model, modules_to_not_convert=["multi_modal_projector"])
model = model.half().eval()
model = model.to('xpu')
# print(model)  # uncomment to inspect the optimized module structure

processor = AutoProcessor.from_pretrained(model_path)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe image in detail"}
        ]
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)

img = "view.jpg"
raw_image = Image.open(img)

inputs = processor(text=text, images=raw_image, return_tensors="pt").to(model.device)

with torch.inference_mode():
    # run generation three times; the first run includes warm-up overhead
    for _ in range(3):
        st = time.time()
        output = model.generate(**inputs, do_sample=False, max_new_tokens=64)
        et = time.time()
        print(et - st)
print(processor.decode(output[0]))
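
Note that `processor.decode(output[0])` prints the prompt together with the completion, since `generate` returns the input tokens followed by the newly generated ones. A minimal sketch for printing only the generated part (assuming the batch of size 1 used above):

generated = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))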

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR validation here by inputting the PR number (e.g., 1234), and paste your action link here once it has finished successfully.
  • Application test
  • Document test
  • ...

@MeouSker77 MeouSker77 merged commit 644af2a into intel-analytics:main Oct 8, 2024
1 check passed
@MeouSker77 MeouSker77 deleted the add-llama3.2-vision-support branch October 8, 2024 02:46
@HumerousGorgon

With this kind of implementation, will this mean that the vLLM version, for example, will be updated to the version with official support for 3.2 vision models?

@MeouSker77 (Contributor, Author)

> With this kind of implementation, will this mean that the vLLM version, for example, will be updated to the version with official support for 3.2 vision models?

I'm not sure about vLLM support; you can open an issue for it.
