Update Phi-3 vision example and add Phi-3.5 vision example (#1049)
### Description

This PR updates the Phi-3 vision example and adds a similar example for
Phi-3.5 vision.

### Motivation and Context

Now that ONNX Runtime GenAI v0.5.0 is released, the Phi-3 vision example needs
to be updated, and a similar example for Phi-3.5 vision can be created.
kunal-vaishnavi authored Nov 12, 2024
1 parent 83ddc3d commit e8cd6bc
Showing 3 changed files with 137 additions and 39 deletions.
46 changes: 9 additions & 37 deletions examples/python/phi-3-vision.md
@@ -49,13 +49,14 @@ $ cd phi3-vision-128k-instruct/pytorch
$ huggingface-cli download microsoft/Phi-3-vision-128k-instruct --local-dir .
```
-Now, let's download the modified PyTorch modeling files that have been uploaded to the Phi-3 vision ONNX repositories on Hugging Face. Here, let's use `microsoft/Phi-3-vision-128k-instruct-onnx-cpu` as the example ONNX repo.
+### Download the modified PyTorch modeling files
+Now, let's download the modified PyTorch modeling files that have been uploaded to the Phi-3 vision ONNX repository on Hugging Face.

```bash
# Download modified files
$ cd ..
-$ huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx-cpu --include onnx/* --local-dir .
+$ huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx --include onnx/* --local-dir .
```

### Replace original PyTorch repo files with modified files
@@ -65,8 +66,7 @@ $ huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx-cpu --inclu
$ rm pytorch/config.json
$ mv onnx/config.json pytorch/
-# In our `modeling_phi3_v.py`, we replaced `from .image_embedding_phi3_v import Phi3ImageEmbedding`
-# with `from .image_embedding_phi3_v_for_onnx import Phi3ImageEmbedding`
+# In our `modeling_phi3_v.py`, we modified some classes for exporting to ONNX
$ rm pytorch/modeling_phi3_v.py
$ mv onnx/modeling_phi3_v.py pytorch/
@@ -103,47 +103,19 @@ $ python3 builder.py --input ./pytorch --output ./dml --precision fp16 --executi

## 3. Build `genai_config.json` and `processor_config.json`

-Currently, both JSON files needed to run with ONNX Runtime GenAI are created by hand. Because the fields have been hand-crafted, it is recommended that you copy the already-uploaded JSON files and modify the fields as needed for your fine-tuned Phi-3 vision model. [Here](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu/blob/main/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) is an example for `genai_config.json` and [here](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu/blob/main/cpu-int4-rtn-block-32-acc-level-4/processor_config.json) is an example for `processor_config.json`.
-### For DirectML
-Replace
-```json
-"provider_options": []
-```
-in `genai_config.json` With
-```json
-"provider_options": [
-{
-"dml" : {}
-}
-]
-```
-### For CUDA
-Replace
-```json
-"provider_options": []
-```
-in `genai_config.json` With
-```json
-"provider_options": [
-{
-"cuda" : {}
-}
-]
-```
+Currently, both JSON files needed to run with ONNX Runtime GenAI are created by hand. Because the fields have been hand-crafted, it is recommended that you copy the already-uploaded JSON files and modify the fields as needed for your fine-tuned Phi-3 vision model. [Here](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) is an example for `genai_config.json` and [here](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/processor_config.json) is an example for `processor_config.json`.

## 4. Run Phi-3 vision ONNX models

[Here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3v.py) is an example of how you can run your Phi-3 vision model with the ONNX Runtime generate() API.

### CUDA
```bash
-$ python .\phi3v.py -m .\phi3-vision-128k-instruct\cuda
+$ python .\phi3v.py -m .\phi3-vision-128k-instruct\cuda -p cuda
```

### DirectML

```bash
-$ python .\phi3v.py -m .\phi3-vision-128k-instruct\dml
-```
+$ python .\phi3v.py -m .\phi3-vision-128k-instruct\dml -p dml
+```
118 changes: 118 additions & 0 deletions examples/python/phi-3.5-vision.md
@@ -0,0 +1,118 @@
# Build your Phi-3.5 vision ONNX models for ONNX Runtime GenAI

## Steps
0. [Pre-requisites](#0-pre-requisites)
1. [Prepare Local Workspace](#1-prepare-local-workspace)
2. [Build ONNX Components](#2-build-onnx-components)
3. [Build ORT GenAI Configs](#3-build-genai_configjson-and-processor_configjson)
4. [Run Phi-3.5 vision ONNX models](#4-run-phi-35-vision-onnx-models)

## 0. Pre-requisites

Please ensure you have the following Python packages installed to create the ONNX models.

- `huggingface_hub[cli]`
- `numpy`
- `onnx`
- `onnxruntime-genai`
  - For CPU:
    ```bash
    pip install onnxruntime-genai
    ```
  - For CUDA:
    ```bash
    pip install onnxruntime-genai-cuda
    ```
  - For DirectML:
    ```bash
    pip install onnxruntime-genai-directml
    ```
- `pillow`
- `requests`
- `torch`
  - Please install torch by following the [instructions](https://pytorch.org/get-started/locally/). To produce ONNX models that can run on CUDA or DirectML, install torch with CUDA support and make sure the CUDA version you select in the instructions matches the one installed on your machine.
- `torchvision`
- `transformers`
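
As a quick sanity check before building, the sketch below reports which of the packages above cannot be imported. The import names (for example `PIL` for `pillow` and `onnxruntime_genai` for `onnxruntime-genai`) and the helper `missing_modules` are assumptions made for illustration; they are not part of the repository.

```python
import importlib.util

# Import names assumed to correspond to the packages listed above.
REQUIRED_MODULES = [
    "huggingface_hub", "numpy", "onnx", "onnxruntime_genai",
    "PIL", "requests", "torch", "torchvision", "transformers",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be located; it does not verify versions or that torch was installed with CUDA support.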

## 1. Prepare Local Workspace

Phi-3.5 vision is a multimodal model consisting of several models internally. In order to run Phi-3.5 vision with ONNX Runtime GenAI, each internal model needs to be created as a separate ONNX model. To get these ONNX models, some of the original PyTorch modeling files have to be modified.

### Download the original PyTorch modeling files

First, let's download the original PyTorch modeling files.
```bash
# Download PyTorch model and files
$ mkdir -p phi3.5-vision-instruct/pytorch
$ cd phi3.5-vision-instruct/pytorch
$ huggingface-cli download microsoft/Phi-3.5-vision-instruct --local-dir .
```
### Download the modified PyTorch modeling files
Now, let's download the modified PyTorch modeling files that have been uploaded to the Phi-3.5 vision ONNX repository on Hugging Face.

```bash
# Download modified files
$ cd ..
$ huggingface-cli download microsoft/Phi-3.5-vision-instruct-onnx --include "onnx/*" --local-dir .
```
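
If you prefer to script the two downloads, a rough Python equivalent is sketched below using `snapshot_download` from `huggingface_hub`. It assumes you run it from the directory that will contain `phi3.5-vision-instruct`; the names `DOWNLOAD_PLAN` and `run_download_plan` are hypothetical.

```python
# Hypothetical download plan mirroring the two CLI commands above.
DOWNLOAD_PLAN = [
    {
        "repo_id": "microsoft/Phi-3.5-vision-instruct",
        "local_dir": "phi3.5-vision-instruct/pytorch",
        "allow_patterns": None,  # download every file
    },
    {
        "repo_id": "microsoft/Phi-3.5-vision-instruct-onnx",
        "local_dir": "phi3.5-vision-instruct",
        "allow_patterns": ["onnx/*"],  # only the modified files
    },
]


def run_download_plan(plan=DOWNLOAD_PLAN):
    """Download each entry of the plan (needs network access)."""
    # Imported lazily so the plan can be inspected without the package.
    from huggingface_hub import snapshot_download

    for item in plan:
        snapshot_download(**item)


if __name__ == "__main__":
    for item in DOWNLOAD_PLAN:
        print(item["repo_id"], "->", item["local_dir"])
```

Calling `run_download_plan()` performs the actual downloads; the second entry places the modified files under `phi3.5-vision-instruct/onnx`, which is where the next step expects them.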

### Replace original PyTorch repo files with modified files

```bash
# In our `config.json`, we replaced `flash_attention_2` with `eager` in `_attn_implementation`
$ rm pytorch/config.json
$ mv onnx/config.json pytorch/
# In our `modeling_phi3_v.py`, we modified some classes for exporting to ONNX
$ rm pytorch/modeling_phi3_v.py
$ mv onnx/modeling_phi3_v.py pytorch/
# Move the builder script to the root directory
$ mv onnx/builder.py .
# Delete empty `onnx` directory
$ rm -rf onnx/
```

If you have your own fine-tuned version of Phi-3.5 vision, you can now replace the `*.safetensors` files in the `pytorch` folder with your `*.safetensors` files.
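
To confirm that the modified `config.json` is the one now in place, you could check the `_attn_implementation` values it contains. The sketch below searches the whole JSON tree because the exact nesting of that key is an assumption here; `find_key` and `attn_implementations` are illustrative helpers, not repository code.

```python
import json
from pathlib import Path


def find_key(node, key):
    """Yield every value stored under `key` anywhere in a parsed JSON tree."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def attn_implementations(config_path):
    """Return all `_attn_implementation` values found in a config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return list(find_key(config, "_attn_implementation"))


if __name__ == "__main__":
    path = Path("pytorch/config.json")
    if path.exists():
        # After the swap, these should be `eager` rather than `flash_attention_2`.
        print(attn_implementations(path))
```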

## 2. Build ONNX Components

Here are some examples of how you can build the components as INT4 ONNX models.

```bash
# Build INT4 components with FP32 inputs/outputs for CPU
$ python3 builder.py --input ./pytorch --output ./cpu --precision fp32 --execution_provider cpu
```

```bash
# Build INT4 components with FP16 inputs/outputs for CUDA
$ python3 builder.py --input ./pytorch --output ./cuda --precision fp16 --execution_provider cuda
```

```bash
# Build INT4 components with FP16 inputs/outputs for DirectML
$ python3 builder.py --input ./pytorch --output ./dml --precision fp16 --execution_provider dml
```
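
The three commands differ only in the execution provider and the matching precision, so they can be generated from a small table. The following is a minimal sketch; `PRECISION_BY_PROVIDER` and `builder_command` are hypothetical names, while the flags are the ones used in the commands above.

```python
import shlex
import sys

# Precision per execution provider, taken from the three commands above.
PRECISION_BY_PROVIDER = {"cpu": "fp32", "cuda": "fp16", "dml": "fp16"}


def builder_command(provider, input_dir="./pytorch"):
    """Return the builder.py argument list for one execution provider."""
    precision = PRECISION_BY_PROVIDER[provider]
    return [
        sys.executable, "builder.py",
        "--input", input_dir,
        "--output", f"./{provider}",
        "--precision", precision,
        "--execution_provider", provider,
    ]


if __name__ == "__main__":
    for provider in PRECISION_BY_PROVIDER:
        print(shlex.join(builder_command(provider)))
```

To actually run a build from the folder that contains `builder.py`, pass the list to `subprocess.run(builder_command("cpu"), check=True)`.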

## 3. Build `genai_config.json` and `processor_config.json`

Currently, both JSON files needed to run with ONNX Runtime GenAI are created by hand. Because the fields have been hand-crafted, it is recommended that you copy the already-uploaded JSON files and modify the fields as needed for your fine-tuned Phi-3.5 vision model. [Here](https://huggingface.co/microsoft/Phi-3.5-vision-instruct-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) is an example for `genai_config.json` and [here](https://huggingface.co/microsoft/Phi-3.5-vision-instruct-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/processor_config.json) is an example for `processor_config.json`.
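
Once you have downloaded the two example files, copying them next to your built ONNX models can be scripted. The sketch below copies both files into an output folder and parses them, so a malformed hand edit is caught early; `copy_configs` and its directory arguments are hypothetical, and any field changes for your fine-tuned model still have to be made by hand.

```python
import json
import shutil
from pathlib import Path

CONFIG_NAMES = ("genai_config.json", "processor_config.json")


def copy_configs(reference_dir, output_dir):
    """Copy both config files and return them parsed, keyed by file name."""
    reference_dir, output_dir = Path(reference_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    parsed = {}
    for name in CONFIG_NAMES:
        target = output_dir / name
        shutil.copyfile(reference_dir / name, target)
        # json.loads raises ValueError if the file is not valid JSON.
        parsed[name] = json.loads(target.read_text(encoding="utf-8"))
    return parsed
```

For example, `copy_configs("./reference", "./cuda")` would place both files in the `cuda` output folder produced in step 2, assuming `./reference` holds the downloaded examples.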

## 4. Run Phi-3.5 vision ONNX models

[Here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3v.py) is an example of how you can run your Phi-3.5 vision model with the ONNX Runtime generate() API.

### CUDA
```bash
$ python ./phi3v.py -m ./phi3.5-vision-instruct/cuda -p cuda
```

### DirectML

```bash
$ python ./phi3v.py -m ./phi3.5-vision-instruct/dml -p dml
```
12 changes: 10 additions & 2 deletions examples/python/phi3v.py
@@ -14,7 +14,12 @@ def _complete(text, state):

 def run(args: argparse.Namespace):
     print("Loading model...")
-    model = og.Model(args.model_path)
+    config = og.Config(args.model_path)
+    config.clear_providers()
+    if args.provider != "cpu":
+        print(f"Setting model to {args.provider}...")
+        config.append_provider(args.provider)
+    model = og.Model(config)
     processor = model.create_multimodal_processor()
     tokenizer_stream = processor.create_stream()

@@ -73,7 +78,10 @@ def run(args: argparse.Namespace):
 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
     parser.add_argument(
-        "-m", "--model_path", type=str, required=True, help="Path to the model"
+        "-m", "--model_path", type=str, required=True, help="Path to the folder containing the model"
     )
+    parser.add_argument(
+        "-p", "--provider", type=str, required=True, help="Provider to run model"
+    )
     args = parser.parse_args()
     run(args)
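
The provider handling added in this change can be read on its own as the sketch below. The `og.Config`, `clear_providers`, `append_provider` and `og.Model(config)` calls are the ones used in the diff; the `choices` list, the `cpu` default and the helper names are assumptions for illustration, since the script itself takes a required free-form string.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model_path", type=str, required=True,
                        help="Path to the folder containing the model")
    # Assumption: restrict values and default to CPU; the script does neither.
    parser.add_argument("-p", "--provider", type=str, default="cpu",
                        choices=["cpu", "cuda", "dml"],
                        help="Provider to run model")
    return parser


def providers_to_append(provider):
    """Mirror the script's rule: CPU needs no explicit provider entry."""
    return [] if provider == "cpu" else [provider]


def load_model(args):
    # Imported lazily; requires the onnxruntime-genai package.
    import onnxruntime_genai as og

    config = og.Config(args.model_path)
    config.clear_providers()
    for name in providers_to_append(args.provider):
        config.append_provider(name)
    return og.Model(config)
```

Restricting `choices` makes a typo such as `-p gpu` fail at argument parsing instead of at model load time.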
