Update Phi-3 vision example and add Phi-3.5 vision example (#1049)

### Description This PR updates the Phi-3 vision example and adds a similar example for Phi-3.5 vision. ### Motivation and Context Now that ONNX Runtime v0.5.0 is released, the Phi-3 vision example needs to be updated and a similar example for Phi-3.5 vision can be created.
microsoft · Nov 12, 2024 · e8cd6bc · e8cd6bc
1 parent 83ddc3d
commit e8cd6bc
Show file tree

Hide file tree

Showing 3 changed files with 137 additions and 39 deletions.
diff --git a/examples/python/phi-3-vision.md b/examples/python/phi-3-vision.md
@@ -49,13 +49,14 @@ $ cd phi3-vision-128k-instruct/pytorch
 $ huggingface-cli download microsoft/Phi-3-vision-128k-instruct --local-dir .
 ```
 
-Now, let's download the modified PyTorch modeling files that have been uploaded to the Phi-3 vision ONNX repositories on Hugging Face. Here, let's use `microsoft/Phi-3-vision-128k-instruct-onnx-cpu` as the example ONNX repo.
-
 ### Download the modified PyTorch modeling files
+
+Now, let's download the modified PyTorch modeling files that have been uploaded to the Phi-3 vision ONNX repository on Hugging Face.
+
 ```bash
 # Download modified files
 $ cd ..
-$ huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx-cpu --include onnx/* --local-dir .
+$ huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx --include onnx/* --local-dir .
 ```
 
 ### Replace original PyTorch repo files with modified files
@@ -65,8 +66,7 @@ $ huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx-cpu --inclu
 $ rm pytorch/config.json
 $ mv onnx/config.json pytorch/
 
-# In our `modeling_phi3_v.py`, we replaced `from .image_embedding_phi3_v import Phi3ImageEmbedding`
-# with `from .image_embedding_phi3_v_for_onnx import Phi3ImageEmbedding`
+# In our `modeling_phi3_v.py`, we modified some classes for exporting to ONNX
 $ rm pytorch/modeling_phi3_v.py
 $ mv onnx/modeling_phi3_v.py pytorch/
 
@@ -103,47 +103,19 @@ $ python3 builder.py --input ./pytorch --output ./dml --precision fp16 --executi
 
 ## 3. Build `genai_config.json` and `processor_config.json`
 
-Currently, both JSON files needed to run with ONNX Runtime GenAI are created by hand. Because the fields have been hand-crafted, it is recommended that you copy the already-uploaded JSON files and modify the fields as needed for your fine-tuned Phi-3 vision model. [Here](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu/blob/main/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) is an example for `genai_config.json` and [here](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu/blob/main/cpu-int4-rtn-block-32-acc-level-4/processor_config.json) is an example for `processor_config.json`.
-
-### For DirectML
-Replace
-```json
-"provider_options": []
-```
-in `genai_config.json` With
-```json
-"provider_options": [
-    {
-        "dml" : {}
-    }
-]
-```
-
-### For CUDA
-Replace
-```json
-"provider_options": []
-```
-in `genai_config.json` With
-```json
-"provider_options": [
-    {
-        "cuda" : {}
-    }
-]
-```
+Currently, both JSON files needed to run with ONNX Runtime GenAI are created by hand. Because the fields have been hand-crafted, it is recommended that you copy the already-uploaded JSON files and modify the fields as needed for your fine-tuned Phi-3 vision model. [Here](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) is an example for `genai_config.json` and [here](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/processor_config.json) is an example for `processor_config.json`.
 
 ## 4. Run Phi-3 vision ONNX models
 
 [Here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3v.py) is an example of how you can run your Phi-3 vision model with the ONNX Runtime generate() API.
 
 ### CUDA
 ```bash
-$ python .\phi3v.py -m .\phi3-vision-128k-instruct\cuda
+$ python .\phi3v.py -m .\phi3-vision-128k-instruct\cuda -p cuda
 ```
 
 ### DirectML
 
 ```bash
-$ python .\phi3v.py -m .\phi3-vision-128k-instruct\dml
-```
+$ python .\phi3v.py -m .\phi3-vision-128k-instruct\dml -p dml
+```
diff --git a/examples/python/phi-3.5-vision.md b/examples/python/phi-3.5-vision.md
@@ -0,0 +1,118 @@
+# Build your Phi-3.5 vision ONNX models for ONNX Runtime GenAI
+
+## Steps
+0. [Pre-requisites](#pre-requisites)
+1. [Prepare Local Workspace](#prepare-local-workspace)
+2. [Build ONNX Components](#build-onnx-components)
+3. [Build ORT GenAI Configs](#build-genai_configjson-and-processor_configjson)
+4. [Run Phi-3.5 vision ONNX models](#run-phi-3.5-vision-onnx-models)
+
+## 0. Pre-requisites
+
+Please ensure you have the following Python packages installed to create the ONNX models.
+
+- `huggingface_hub[cli]`
+- `numpy`
+- `onnx`
+- `onnxruntime-genai`
+    - For CPU:
+    ```bash
+    pip install onnxruntime-genai
+    ```
+    - For CUDA:
+    ```bash
+    pip install onnxruntime-genai-cuda
+    ```
+    - For DirectML: 
+    ```bash
+    pip install onnxruntime-genai-directml
+    ```
+- `pillow`
+- `requests`
+- `torch`
+    - Please install torch by following the [instructions](https://pytorch.org/get-started/locally/). For getting ONNX models that can run on CUDA or DirectML, please install torch with CUDA and ensure the CUDA version you choose in the instructions is the one you have installed.
+- `torchvision`
+- `transformers`
+
+## 1. Prepare Local Workspace
+
+Phi-3.5 vision is a multimodal model consisting of several models internally. In order to run Phi-3.5 vision with ONNX Runtime GenAI, each internal model needs to be created as a separate ONNX model. To get these ONNX models, some of the original PyTorch modeling files have to be modified.
+
+### Download the original PyTorch modeling files
+
+First, let's download the original PyTorch modeling files.
+
+```bash
+# Download PyTorch model and files
+$ mkdir -p phi3.5-vision-instruct/pytorch
+$ cd phi3.5-vision-instruct/pytorch
+$ huggingface-cli download microsoft/Phi-3.5-vision-instruct --local-dir .
+```
+
+### Download the modified PyTorch modeling files
+
+Now, let's download the modified PyTorch modeling files that have been uploaded to the Phi-3.5 vision ONNX repository on Hugging Face.
+
+```bash
+# Download modified files
+$ cd ..
+$ huggingface-cli download microsoft/Phi-3.5-vision-instruct-onnx --include onnx/* --local-dir .
+```
+
+### Replace original PyTorch repo files with modified files
+
+```bash
+# In our `config.json`, we replaced `flash_attention_2` with `eager` in `_attn_implementation`
+$ rm pytorch/config.json
+$ mv onnx/config.json pytorch/
+
+# In our `modeling_phi3_v.py`, we modified some classes for exporting to ONNX
+$ rm pytorch/modeling_phi3_v.py
+$ mv onnx/modeling_phi3_v.py pytorch/
+
+# Move the builder script to the root directory
+$ mv onnx/builder.py .
+
+# Delete empty `onnx` directory
+$ rm -rf onnx/
+```
+
+If you have your own fine-tuned version of Phi-3.5 vision, you can now replace the `*.safetensors` files in the `pytorch` folder with your `*.safetensors` files.
+
+## 2. Build ONNX Components
+
+Here are some examples of how you can build the components as INT4 ONNX models.
+
+```bash
+# Build INT4 components with FP32 inputs/outputs for CPU
+$ python3 builder.py --input ./pytorch --output ./cpu --precision fp32 --execution_provider cpu
+```
+
+```bash
+# Build INT4 components with FP16 inputs/outputs for CUDA
+$ python3 builder.py --input ./pytorch --output ./cuda --precision fp16 --execution_provider cuda
+```
+
+```bash
+# Build INT4 components with FP16 inputs/outputs for DirectML
+$ python3 builder.py --input ./pytorch --output ./dml --precision fp16 --execution_provider dml
+```
+
+## 3. Build `genai_config.json` and `processor_config.json`
+
+Currently, both JSON files needed to run with ONNX Runtime GenAI are created by hand. Because the fields have been hand-crafted, it is recommended that you copy the already-uploaded JSON files and modify the fields as needed for your fine-tuned Phi-3.5 vision model. [Here](https://huggingface.co/microsoft/Phi-3.5-vision-instruct-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) is an example for `genai_config.json` and [here](https://huggingface.co/microsoft/Phi-3.5-vision-instruct-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/processor_config.json) is an example for `processor_config.json`.
+
+## 4. Run Phi-3.5 vision ONNX models
+
+[Here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3v.py) is an example of how you can run your Phi-3.5 vision model with the ONNX Runtime generate() API.
+
+### CUDA
+```bash
+$ python .\phi3v.py -m .\phi3.5-vision-instruct\cuda -p cuda
+```
+
+### DirectML
+
+```bash
+$ python .\phi3v.py -m .\phi3.5-vision-instruct\dml -p dml
+```
diff --git a/examples/python/phi3v.py b/examples/python/phi3v.py
@@ -14,7 +14,12 @@ def _complete(text, state):
 
 def run(args: argparse.Namespace):
     print("Loading model...")
-    model = og.Model(args.model_path)
+    config = og.Config(args.model_path)
+    config.clear_providers()
+    if args.provider != "cpu":
+        print(f"Setting model to {args.provider}...")
+        config.append_provider(args.provider)
+    model = og.Model(config)
     processor = model.create_multimodal_processor()
     tokenizer_stream = processor.create_stream()
 
@@ -73,7 +78,10 @@ def run(args: argparse.Namespace):
 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
     parser.add_argument(
-        "-m", "--model_path", type=str, required=True, help="Path to the model"
+        "-m", "--model_path", type=str, required=True, help="Path to the folder containing the model"
+    )
+    parser.add_argument(
+        "-p", "--provider", type=str, required=True, help="Provider to run model"
     )
     args = parser.parse_args()
     run(args)