🦙 llama2 optimization #641
Conversation
Force-pushed from b7b76d0 to 2a8981c
Force-pushed from 10c43de to bd99b92
Force-pushed from 5579f67 to 257b694
if config["use_gqa"]:
    # Replace MultiHeadAttention with GroupQueryAttention and remove attention mask nodes
    num_kv_heads = model.model_attributes.get("num_key_value_heads", None)
    if num_kv_heads is None:
Not a blocker: should we make it possible to provide this value as a pass config parameter too? There is an asymmetry here: `num_heads` and `hidden_size` can come from either a pass config parameter or model attributes.
Also, we don't usually provide `model_attributes` manually for non-automatic workflows; most examples put the model-specific values in the pass config.
Yes, we can. But GQA currently only works for llama-70B, since that is the variant where `num_heads != num_kv_heads`.
We can extend the `hf_mapping` to introduce a similar value for `num_kv_heads`.
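A minimal sketch of the suggested fallback, assuming a hypothetical `num_kv_heads` pass-config key (the `num_key_value_heads` attribute name comes from the diff above; this is illustrative, not the pass's actual code):

```python
def resolve_num_kv_heads(config: dict, model_attributes: dict) -> int:
    """Prefer an explicit pass-config value, then fall back to model attributes.

    The "num_kv_heads" config key is hypothetical; the "num_key_value_heads"
    attribute name matches the diff above.
    """
    num_kv_heads = config.get("num_kv_heads")
    if num_kv_heads is None:
        num_kv_heads = model_attributes.get("num_key_value_heads")
    if num_kv_heads is None:
        raise ValueError(
            "num_kv_heads must be provided via pass config or model attributes"
        )
    return num_kv_heads
```

This mirrors how `num_heads` and `hidden_size` are already resolved from either source, removing the asymmetry noted above.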
@@ -202,3 +226,36 @@ def _run_for_config(

        # save the model to the output path and return the model
        return model_proto_to_olive_model(optimizer.model, output_model_path, config)

    @staticmethod
    def _replace_mha_with_gqa(model: "OnnxModel", past_seq_len: str = "past_sequence_length", kv_num_heads: int = 0):
Is there a reason we don't use `onnxruntime.transformers.convert_generation.replace_mha_with_gqa` directly? Do we want this option to still be usable with older versions of onnxruntime on the host?
Old versions of ORT do not contain the corresponding ops anyway, so backward compatibility is not the reason.
The actual reason: the scripts under `convert_generation` feel like temporary solutions, and since this piece of logic is quite simple, I copied it here directly.
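For context, here is a dependency-free sketch of the copied rewrite logic. The real pass operates on `com.microsoft` MultiHeadAttention nodes in an `onnx.GraphProto` and also wires the past/present KV-cache inputs and removes the attention-mask subgraph; the `Node` dataclass below is only a stand-in for illustration:

```python
from dataclasses import dataclass, field, replace

@dataclass
class Node:
    """Toy stand-in for an ONNX graph node (illustration only)."""
    op_type: str
    inputs: list
    outputs: list
    attrs: dict = field(default_factory=dict)

def replace_mha_with_gqa(nodes: list, kv_num_heads: int) -> list:
    """Rewrite MultiHeadAttention nodes as GroupQueryAttention.

    Simplified: the real code must also rewire KV-cache inputs and drop the
    attention-mask nodes, which this sketch omits.
    """
    out = []
    for node in nodes:
        if node.op_type == "MultiHeadAttention":
            attrs = dict(node.attrs)
            attrs["kv_num_heads"] = kv_num_heads  # GQA needs the KV head count
            out.append(replace(node, op_type="GroupQueryAttention", attrs=attrs))
        else:
            out.append(node)  # non-attention nodes pass through unchanged
    return out
```

The rewrite is a straightforward node-by-node substitution, which is why copying it locally (rather than depending on the `convert_generation` helper) is low-cost.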
commit 093cfaf: Merge branch 'main' into jambayk/lora (Jambay Kinley, Mon Oct 30 2023)
commit d074e17: merge main (Jambay Kinley, Mon Oct 30 2023)
commit 649e314: 🦙 llama2 optimization (#641) (trajep, Sat Oct 28 2023)
commit 819c25a: raise ValueError if the model_type is None (#666) (Mike Guo, Thu Oct 26 2023)
commit b897b8c: raise the known failure exceptions when doing perf-tuning (#664) (Mike Guo, Thu Oct 26 2023). In perf-tuning, a known ImportModuleError is not retried on the next step; the exception is raised as in the engine.
commit a54dda8: 🆓 Release GPU memory for torch model evaluation (#662) (trajep, Mon Oct 23 2023). `model.to("cpu")` does not release GPU memory in time, which caused errors when evaluating the onnx model; `torch.cuda.empty_cache()` is called to clean up.
commit ac6d0f7: fix some pylint issues (#661) (Mike Guo, Mon Oct 23 2023)
commit 01296bc: Optional `evaluate_input_model` in no-search mode (#663) (Jambay Kinley, Fri Oct 20 2023). The input model is now evaluated only if an evaluator is provided, matching the output-model behavior.
commit 91f1c45: remove pre-commit CI pipeline (#660) (Mike Guo, Fri Oct 20 2023)
commit ece52f7: Update contributing doc (#642) (Xiaoyu, Thu Oct 19 2023)
commit 4df44a5: add editorconfig rules & enable pylint (#659) (Mike Guo, Fri Oct 20 2023)
commit 2b5aef1: 🚧 Consistent dataloader for benchmark (#657) (trajep, Thu Oct 19 2023). Transforms.RandomCrop/Clip generated different tensors on every dataloader creation, causing inconsistent accuracy measurements; this workaround ensures the torch and onnx models see the same input data.
commit ab70b6c: Fix bugs in how free dimension session options are set (#658) (Gaurav Garg, Thu Oct 19 2023). The "unet_text_embeds_size" free dimension is always overridden to 1280 (the text_encoder_2 hidden size for both SD-XL base and refiner) regardless of image resolution, and the optimum pipeline keyword is "session_options", not "sess_options". Improves perf by ~40% at 512x512 and ~10% at 1024x1024.
commit 42524f5: 👁️🗨️ Disable qualcomm linkcheck (#655) (trajep, Wed Oct 18 2023). Docs build failed on a broken qualcomm link; link check disabled for it.
commit 133d5d5: Text-gen: Optional attn mask. Loading args for hf model config. QLoRA: Update loading args (#654) (Jambay Kinley, Tue Oct 17 2023). QLoRA and `get_hf_model_config` now honor `model_loading_args` (e.g. `trust_remote_code=True` for microsoft/phi-1_5, `token` for meta-llama/Llama-2-7b); text-gen preprocessing makes `attention_mask` optional; fixes a bug where `not pad_token_id` was `True` for `pad_token_id == 0`; renames `dataset_name` to `data_name` in the qlora example.
commit 169ffda: 🍮 Detect Data Component Reserved Arguments with Position info (#636) (trajep, Wed Oct 18 2023). First arguments for pre/post process and dataloader must start with `_`; data config docs updated.
commit 0bc3d5d: Update Vitis AI quantization to support ORT 1.16, TensorData and QuantizationParams (#650) (Xiao Sheng, Tue Oct 17 2023). Issue link: #629
commit efd83e0: Fix `quant_preprocess` import in VitisAI Quantization pass (#648) (Jambay Kinley, Mon Oct 16 2023). Follow-up to #639, which missed this pass.
commit e7eae08: ResNet: Update seed_everything and accuracy goal (#646) (Jambay Kinley, Sun Oct 15 2023). Seeding makes the model consistent across runs; the accuracy-goal tolerance in `resnet_ptq_cpu_aml_dataset.json` is reduced from 0.1 to 0.05 now that the baseline is better.
commit b06a89a: Train original resnet for min 1 epoch (#644) (Jambay Kinley, Sat Oct 14 2023). `num_epochs` defaulted to 0 in `prepare_model_data.py`, so tests ran against a random model (~10% accuracy) with meaningless goals; training for at least 1 epoch makes the test meaningful.
commit 4a00cad: Clean up `no_search` mode to not use `search_strategy` (#643) (Jambay Kinley, Fri Oct 13 2023). Iterates over the pass flows directly instead of the hacky, confusing search-strategy path.
commit 989964d: Implemented Performance Monitoring in Azure DevOps with Olive (#439) (lainey1570, Fri Oct 13 2023). Python scripts and Azure DevOps YAML files for comparing model performance and ensuring no regression over time. Co-authored-by: Jambay Kinley, Xiaoyu.
commit bb66c6c: 🎿 Support ORT 1.16.1 (#639) (trajep, Thu Oct 12 2023). Skips vitis tests pending a VitisAI fix, removes quant pre-process (the bug is fixed in 1.16.1), and lowers metric goals to allow at least one output model.
## Describe your changes

`MatMul4BitsQuantizer` uses `logging.basicConfig` to set the level to INFO and prints very verbose logs at that level; suppress them by manually setting the relevant loggers to `ERROR`. Update the ORT version requirement to `>=1.16.2`, since the quantizer will be added in 1.16.2. Remove the save-to-tmp_dir, load, save steps since they are not needed: we only need to sort the model topologically before saving it to file. Refer to this discussion for more context: #641 (comment). `llama.py`: correct the description of the `--only_config` option, and create a new user_script file for the workflow instead of updating the original one.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Format your code by running `pre-commit run --all-files`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

## (Optional) Issue link
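The log-suppression step described above can be sketched as follows. This is a hedged illustration: the logger name and the `run_quantizer` callable are assumptions standing in for the quantizer's actual loggers and `MatMul4BitsQuantizer.process()`:

```python
import logging

def quantize_quietly(run_quantizer):
    """Raise a quantizer logger's level to ERROR for the duration of a call.

    The logger name "onnxruntime.quantization" is an assumption; the real pass
    would target whichever loggers MatMul4BitsQuantizer configures.
    """
    quant_logger = logging.getLogger("onnxruntime.quantization")
    previous = quant_logger.level
    quant_logger.setLevel(logging.ERROR)  # suppress INFO-level chatter
    try:
        return run_quantizer()
    finally:
        quant_logger.setLevel(previous)  # restore for other callers
```

Restoring the previous level in `finally` keeps the suppression scoped to the quantization call rather than silencing the logger globally.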