
add basic llama 3.2 vision support #12163

Merged

Conversation

@MeouSker77 (Contributor) commented Oct 8, 2024

Description

add basic llama 3.2 vision support

1. Why the change?

2. User API changes

requires transformers >= 4.45.0
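A quick way to verify the environment before running the example (a minimal sketch; `packaging` is assumed to be available since it ships as a transformers dependency):

import transformers
from packaging import version

# MllamaForConditionalGeneration was introduced in transformers 4.45.0
assert version.parse(transformers.__version__) >= version.parse("4.45.0"), \
    "llama 3.2 vision requires transformers >= 4.45.0"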

Modified from the official example:

import time

import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

from ipex_llm import optimize_model

model_path = "Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(model_path)
# keep the multi-modal projector in its original precision during optimization
model = optimize_model(model, modules_to_not_convert=["multi_modal_projector"])
model = model.half().eval()
model = model.to('xpu')
# print(model)  # uncomment to inspect the optimized module structure

processor = AutoProcessor.from_pretrained(model_path)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe image in detail"}
        ]
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)

img = "view.jpg"
raw_image = Image.open(img)

inputs = processor(text=text, images=raw_image, return_tensors="pt").to(model.device)

with torch.inference_mode():
    # run generation three times; the first run includes warm-up overhead
    for _ in range(3):
        st = time.time()
        output = model.generate(**inputs, do_sample=False, max_new_tokens=64)
        et = time.time()
        print(et - st)
print(processor.decode(output[0]))
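
Note that `processor.decode(output[0])` prints the prompt together with the completion, since `generate` returns the input tokens followed by the newly generated ones. A minimal sketch for printing only the generated part (assuming the batch of size 1 used above):

generated = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated, skip_special_tokens=True))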

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR validation here by inputting the PR number (e.g., 1234), and paste your action link here once it has finished successfully.
  • Application test
  • Document test
  • ...

@MeouSker77 MeouSker77 merged commit 644af2a into intel-analytics:main Oct 8, 2024
1 check passed
@MeouSker77 MeouSker77 deleted the add-llama3.2-vision-support branch October 8, 2024 02:46
@HumerousGorgon

With this kind of implementation, will this mean that the vLLM version, for example, will be updated to the version with official support for 3.2 vision models?

@MeouSker77 (Contributor, Author)

> With this kind of implementation, will this mean that the vLLM version, for example, will be updated to the version with official support for 3.2 vision models?

I'm not sure about vLLM support; you can open an issue for it.
