🦙 llama2 optimization #641
Conversation
Force-pushed from b7b76d0 to 2a8981c
Force-pushed from 10c43de to bd99b92
Force-pushed from 5579f67 to 257b694
if config["use_gqa"]:
    # Replace MultiHeadAttention with GroupQueryAttention and remove attention mask nodes
    num_kv_heads = model.model_attributes.get("num_key_value_heads", None)
    if num_kv_heads is None:
Not a blocker: should we make it possible to provide this value as a pass config parameter too? There is an asymmetry here: `num_heads` and `hidden_size` can come from either a pass config parameter or model attributes.
Also, we don't usually provide `model_attributes` manually for non-automatic workflows; most examples put the model-specific values in the pass config.
Yes, we can. But GQA currently only works for llama-70B, since that is the variant where `num_heads != num_kv_heads`.
We can extend the `hf_mapping` to introduce a similar value for `num_kv_heads`.
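A minimal sketch of the suggested fallback, assuming a hypothetical `num_kv_heads` pass-config key (the `num_key_value_heads` attribute name comes from the diff above; this is illustrative, not the pass's actual code):

```python
def resolve_num_kv_heads(config: dict, model_attributes: dict) -> int:
    """Prefer an explicit pass-config value, then fall back to model attributes.

    The "num_kv_heads" config key is hypothetical; the "num_key_value_heads"
    attribute name matches the diff above.
    """
    num_kv_heads = config.get("num_kv_heads")
    if num_kv_heads is None:
        num_kv_heads = model_attributes.get("num_key_value_heads")
    if num_kv_heads is None:
        raise ValueError(
            "num_kv_heads must be provided via pass config or model attributes"
        )
    return num_kv_heads
```

This mirrors how `num_heads` and `hidden_size` are already resolved from either source, removing the asymmetry noted above.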
@@ -202,3 +226,36 @@ def _run_for_config(

        # save the model to the output path and return the model
        return model_proto_to_olive_model(optimizer.model, output_model_path, config)

    @staticmethod
    def _replace_mha_with_gqa(model: "OnnxModel", past_seq_len: str = "past_sequence_length", kv_num_heads: int = 0):
Is there a reason we don't use `onnxruntime.transformers.convert_generation.replace_mha_with_gqa` directly? Do we want this option to still be usable with older versions of onnxruntime on the host?
Old versions of ORT do not contain the corresponding ops anyway, so backward compatibility is not the reason.
The actual reason: the scripts under `convert_generation` feel like temporary solutions, and since this piece of logic is quite simple, I copied it here directly.
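For context, here is a dependency-free sketch of the copied rewrite logic. The real pass operates on `com.microsoft` MultiHeadAttention nodes in an `onnx.GraphProto` and also wires the past/present KV-cache inputs and removes the attention-mask subgraph; the `Node` dataclass below is only a stand-in for illustration:

```python
from dataclasses import dataclass, field, replace

@dataclass
class Node:
    """Toy stand-in for an ONNX graph node (illustration only)."""
    op_type: str
    inputs: list
    outputs: list
    attrs: dict = field(default_factory=dict)

def replace_mha_with_gqa(nodes: list, kv_num_heads: int) -> list:
    """Rewrite MultiHeadAttention nodes as GroupQueryAttention.

    Simplified: the real code must also rewire KV-cache inputs and drop the
    attention-mask nodes, which this sketch omits.
    """
    out = []
    for node in nodes:
        if node.op_type == "MultiHeadAttention":
            attrs = dict(node.attrs)
            attrs["kv_num_heads"] = kv_num_heads  # GQA needs the KV head count
            out.append(replace(node, op_type="GroupQueryAttention", attrs=attrs))
        else:
            out.append(node)  # non-attention nodes pass through unchanged
    return out
```

The rewrite is a straightforward node-by-node substitution, which is why copying it locally (rather than depending on the `convert_generation` helper) is low-cost.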
commit 093cfaf: Merge branch 'main' into jambayk/lora (Jambay Kinley, Mon Oct 30 2023)
commit d074e17: merge main (Jambay Kinley, Mon Oct 30 2023)
commit 649e314: 🦙 llama2 optimization (#641) (trajep, Sat Oct 28 2023)
commit 819c25a: raise ValueError if the model_type is None (#666) (Mike Guo, Thu Oct 26 2023)
commit b897b8c: raise the known failure exceptions when doing perf-tuning (#664) (Mike Guo, Thu Oct 26 2023). In perf-tuning, a known ImportModuleError is not retried on the next step; the exception is raised as in the engine.
commit a54dda8: 🆓 Release GPU memory for torch model evaluation (#662) (trajep, Mon Oct 23 2023). `model.to("cpu")` does not release GPU memory in time, which caused errors when evaluating the onnx model; `torch.cuda.empty_cache()` is called to clean up.
commit ac6d0f7: fix some pylint issues (#661) (Mike Guo, Mon Oct 23 2023)
commit 01296bc: Optional `evaluate_input_model` in no-search mode (#663) (Jambay Kinley, Fri Oct 20 2023). The input model is now evaluated only if an evaluator is provided, matching the output-model behavior.
commit 91f1c45: remove pre-commit CI pipeline (#660) (Mike Guo, Fri Oct 20 2023)
commit ece52f7: Update contributing doc (#642) (Xiaoyu, Thu Oct 19 2023)
commit 4df44a5: add editorconfig rules & enable pylint (#659) (Mike Guo, Fri Oct 20 2023)
commit 2b5aef1: 🚧 Consistent dataloader for benchmark (#657) (trajep, Thu Oct 19 2023). Transforms.RandomCrop/Clip generated different tensors on every dataloader creation, causing inconsistent accuracy measurements; this workaround ensures the torch and onnx models see the same input data.
commit ab70b6c: Fix bugs in how free dimension session options are set (#658) (Gaurav Garg, Thu Oct 19 2023). The "unet_text_embeds_size" free dimension is always overridden to 1280 (the text_encoder_2 hidden size for both SD-XL base and refiner) regardless of image resolution, and the optimum pipeline keyword is "session_options", not "sess_options". Improves perf by ~40% at 512x512 and ~10% at 1024x1024.
commit 42524f5: 👁️🗨️ Disable qualcomm linkcheck (#655) (trajep, Wed Oct 18 2023). Docs build failed on a broken qualcomm link; link check disabled for it.
commit 133d5d5: Text-gen: Optional attn mask. Loading args for hf model config. QLoRA: Update loading args (#654) (Jambay Kinley, Tue Oct 17 2023). QLoRA and `get_hf_model_config` now honor `model_loading_args` (e.g. `trust_remote_code=True` for microsoft/phi-1_5, `token` for meta-llama/Llama-2-7b); text-gen preprocessing makes `attention_mask` optional; fixes a bug where `not pad_token_id` was `True` for `pad_token_id == 0`; renames `dataset_name` to `data_name` in the qlora example.
commit 169ffda: 🍮 Detect Data Component Reserved Arguments with Position info (#636) (trajep, Wed Oct 18 2023). First arguments for pre/post process and dataloader must start with `_`; data config docs updated.
commit 0bc3d5d: Update Vitis AI quantization to support ORT 1.16, TensorData and QuantizationParams (#650) (Xiao Sheng, Tue Oct 17 2023). Issue link: #629
commit efd83e0: Fix `quant_preprocess` import in VitisAI Quantization pass (#648) (Jambay Kinley, Mon Oct 16 2023). Follow-up to #639, which missed this pass.
commit e7eae08: ResNet: Update seed_everything and accuracy goal (#646) (Jambay Kinley, Sun Oct 15 2023). Seeding makes the model consistent across runs; the accuracy-goal tolerance in `resnet_ptq_cpu_aml_dataset.json` is reduced from 0.1 to 0.05 now that the baseline is better.
commit b06a89a: Train original resnet for min 1 epoch (#644) (Jambay Kinley, Sat Oct 14 2023). `num_epochs` defaulted to 0 in `prepare_model_data.py`, so tests ran against a random model (~10% accuracy) with meaningless goals; training for at least 1 epoch makes the test meaningful.
commit 4a00cad: Clean up `no_search` mode to not use `search_strategy` (#643) (Jambay Kinley, Fri Oct 13 2023). Iterates over the pass flows directly instead of the hacky, confusing search-strategy path.
commit 989964d: Implemented Performance Monitoring in Azure DevOps with Olive (#439) (lainey1570, Fri Oct 13 2023). Python scripts and Azure DevOps YAML files for comparing model performance and ensuring no regression over time. Co-authored-by: Jambay Kinley, Xiaoyu.
commit bb66c6c: 🎿 Support ORT 1.16.1 (#639) (trajep, Thu Oct 12 2023). Skips vitis tests pending a VitisAI fix, removes quant pre-process (the bug is fixed in 1.16.1), and lowers metric goals to allow at least one output model.
## Describe your changes

`MatMul4BitsQuantizer` uses `logging.basicConfig` to set the level to INFO and prints very verbose logs at that level; suppress them by manually setting the relevant loggers to `ERROR`. Update the ORT version requirement to `>=1.16.2`, since the quantizer will be added in 1.16.2. Remove the save-to-tmp_dir, load, save steps since they are not needed: we only need to sort the model topologically before saving it to file. Refer to this discussion for more context: #641 (comment). `llama.py`: correct the description of the `--only_config` option, and create a new user_script file for the workflow instead of updating the original one.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Format your code by running `pre-commit run --all-files`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

## (Optional) Issue link
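The log-suppression step described above can be sketched as follows. This is a hedged illustration: the logger name and the `run_quantizer` callable are assumptions standing in for the quantizer's actual loggers and `MatMul4BitsQuantizer.process()`:

```python
import logging

def quantize_quietly(run_quantizer):
    """Raise a quantizer logger's level to ERROR for the duration of a call.

    The logger name "onnxruntime.quantization" is an assumption; the real pass
    would target whichever loggers MatMul4BitsQuantizer configures.
    """
    quant_logger = logging.getLogger("onnxruntime.quantization")
    previous = quant_logger.level
    quant_logger.setLevel(logging.ERROR)  # suppress INFO-level chatter
    try:
        return run_quantizer()
    finally:
        quant_logger.setLevel(previous)  # restore for other callers
```

Restoring the previous level in `finally` keeps the suppression scoped to the quantization call rather than silencing the logger globally.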