Olive-ai 0.6.0
Examples
The following examples are added:
- Add LLM sample for DirectML #1082 #1106
  - This adds an LLM sample for DirectML that can convert and quantize a wide range of LLMs from HuggingFace. The Dolly, Phi, and LLaMA 2 folders were removed and replaced with a more generic LLM example that supports many LLMs, including but not limited to Phi-2, Mistral, and LLaMA 2.
- Add Gemma to DML LLM sample #1138
- Llama2 optimization with multi-ep managed env #1087
- Llama2: Multi-lora example notebook, Custom generator #1114
- Search Optimal optimization among multiple EPs #1092
Olive CLI updates
- Previous commands `python -m olive.workflows.run` and `python -m olive.platform_sdk.qualcomm.configure` are deprecated. Use `olive run` or `python -m olive` instead. #1129
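As an illustration, the migration is a drop-in replacement on the command line. The `--config` flag and workflow file name below are assumptions meant to mirror a typical existing invocation; check them against your own setup:

```shell
# Deprecated entry point (still works for now, but scheduled for removal):
#   python -m olive.workflows.run --config my_workflow.json
# Preferred replacement, same config file and behavior (assumed equivalent):
olive_cmd="olive run --config my_workflow.json"
echo "$olive_cmd"
```

The `python -m olive` form accepts the same subcommands, so CI scripts that cannot rely on the `olive` console script being on PATH can switch to it instead.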
Passes (optimization techniques)
- Pytorch
- ONNXRuntime
  - `ExtractAdapters` pass supports int4 quantized models and exposes the external data config options to users. #1083
  - `ModelBuilder`: Converts a Huggingface/AML generative PyTorch model to an ONNX model using ONNX Runtime Generative AI >= 0.2.0. #1089 #1073 #1110 #1112 #1118 #1130 #1131 #1141 #1146 #1147 #1154
  - `OnnxFloatToFloat16`: Use the ORT float16 converter #1132
  - `NVModelOptQuantization`: Quantize ONNX model with Nvidia-ModelOpt. #1135
  - `OnnxIOFloat16ToFloat32`: Converts float16 model inputs/outputs to float32. #1149
- [Vitis AI] Make Vitis AI techniques compatible with ORT 1.18 #1140
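As a sketch, the new passes slot into a workflow config like any other Olive pass. The pass type names come from this release, but the pass aliases and the `precision` option shown are assumptions to be checked against the pass documentation:

```json
{
  "passes": {
    "builder": { "type": "ModelBuilder", "config": { "precision": "int4" } },
    "to_fp16": { "type": "OnnxFloatToFloat16" },
    "extract": { "type": "ExtractAdapters" }
  }
}
```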
Data Config
- Remove name ambiguity in dataset configuration #1111
- Remove HfConfig::dataset references in examples and tests #1113
Engine
- Add aml deployment packaging. #1090
System
- Make the accelerator EP optional in Olive systems for non-ONNX passes. #1072
Data
- Add AML resource support for data configs.
- Add audio classification data preprocess function.
Model
- Provide built-in kv_cache_config for generative models' io_config #1121
- Convert MLflow transformers models to Huggingface format so they can be consumed by passes that require Huggingface format. #1150
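A minimal sketch of how the built-in KV-cache description might be enabled in a generative model's io_config; the `kv_cache` flag and the input names below are assumptions drawn from the change description, not verified field names:

```json
{
  "io_config": {
    "input_names": ["input_ids", "attention_mask"],
    "kv_cache": true
  }
}
```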
Metrics
Dependencies:
- Support onnxruntime 1.17.3