Releases · microsoft/Olive
Olive-ai 0.6.2
Workflow config
- Support YAML files as workflow config files. #1191
- Workflow ID: the workflow ID feature is a prerequisite for running workflows on a remote VM. With this feature #1179 (see the sketch after this list):
  - The cache dir becomes <cache_dir>/<workflow_id>.
  - The Olive config is automatically saved to the cache dir.
  - Users can specify workflow_id in the config file. The default workflow_id is default_workflow.
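A hedged sketch of setting an explicit workflow_id via a dict config passed to olive.workflows.run; only workflow_id and the <cache_dir>/<workflow_id> layout come from these notes, and the other fields are illustrative placeholders:

```python
# Hedged sketch: "workflow_id" and the cache layout are from the release
# notes; the other fields are illustrative placeholders.
from olive.workflows import run as olive_run

config = {
    "workflow_id": "bert_cpu",                # defaults to "default_workflow"
    "input_model": {"type": "PyTorchModel"},  # placeholder model config
    "engine": {"cache_dir": "cache"},         # artifacts land under cache/bert_cpu
}
# olive_run(config)  # would also save the resolved Olive config to the cache dir
```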
Passes (optimization techniques)
- Accept SNPE DLC models for the QNN context binary generator #1188
Data
- Remove params_config and components/component_args. All component-specific parameters are now grouped into four separate objects (see the sketch after this list): #1187
  - load_dataset_config
  - pre_process_data_config
  - post_process_data_config
  - dataloader_config
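A hedged sketch of the regrouped data config; the four *_config names come from the release notes, while the container type and inner parameters are illustrative assumptions:

```python
# Hedged sketch: only the four *_config groups are from the release notes.
# "HuggingfaceContainer" and the inner parameters are illustrative assumptions.
data_config = {
    "name": "example_data",
    "type": "HuggingfaceContainer",  # assumed container type
    "load_dataset_config": {"data_name": "glue", "subset": "mrpc", "split": "validation"},
    "pre_process_data_config": {"max_length": 128},
    "post_process_data_config": {},
    "dataloader_config": {"batch_size": 1},
}
```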
Docs
- Add the Olive workflow schema to the doc website. This schema file can be used in IDEs when writing workflow configs. #1190
Olive-ai 0.6.1
Olive-ai 0.6.0
Examples
The following examples are added:
- Add LLM sample for DirectML #1082 #1106
- This adds an LLM sample for DirectML that can convert and quantize many LLMs from Hugging Face. The Dolly, Phi, and LLaMA 2 folders were removed and replaced with a more generic LLM example that supports a large number of models, including but not limited to Phi-2, Mistral, and LLaMA 2.
- Add Gemma to DML LLM sample #1138
- Llama2 optimization with multi-ep managed env #1087
- Llama2: Multi-lora example notebook, Custom generator #1114
- Search for the optimal optimization among multiple EPs #1092
Olive CLI updates
- The previous commands python -m olive.workflows.run and python -m olive.platform_sdk.qualcomm.configure are deprecated. Use olive run or python -m olive instead. #1129
Passes (optimization techniques)
- PyTorch
- ONNXRuntime
  - ExtractAdapters: the pass now supports int4 quantized models and exposes the external data config options to users. #1083
  - ModelBuilder: converts a Hugging Face/AML generative PyTorch model to an ONNX model using ONNX Runtime Generative AI >= 0.2.0 (see the sketch after this list). #1089 #1073 #1110 #1112 #1118 #1130 #1131 #1141 #1146 #1147 #1154
  - OnnxFloatToFloat16: use the ort float16 converter. #1132
  - NVModelOptQuantization: quantize ONNX models with Nvidia-ModelOpt. #1135
  - OnnxIOFloat16ToFloat32: converts float16 model inputs/outputs to float32. #1149
  - [Vitis AI] Make Vitis AI techniques compatible with ORT 1.18 #1140
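As an illustration, a pass such as ModelBuilder might be enabled in the passes section as below; only the pass name comes from these notes, and the option is an assumption:

```python
# Hedged sketch: "ModelBuilder" is the pass name from the notes above;
# the "precision" option is an illustrative assumption.
passes = {
    "builder": {
        "type": "ModelBuilder",
        "config": {"precision": "int4"},  # assumed option
    }
}
```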
Data Config
- Remove name ambiguity in dataset configuration #1111
- Remove HfConfig::dataset references in examples and tests #1113
Engine
- Add AML deployment packaging. #1090
System
- Make the accelerator EP optional in Olive systems for non-ONNX passes. #1072
Data
- Add AML resource support for data configs.
- Add audio classification data preprocess function.
Model
- Provide a built-in kv_cache_config for generative models' io_config (see the sketch after this list) #1121
- Convert MLFlow transformers models to Hugging Face format so they can be consumed by passes that require that format. #1150
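A hedged sketch of an io_config that opts into the built-in KV-cache handling; only kv_cache_config comes from the notes, and the other keys are illustrative assumptions:

```python
# Hedged sketch: "kv_cache_config" comes from the notes; the surrounding
# io_config keys are illustrative assumptions.
io_config = {
    "input_names": ["input_ids", "attention_mask"],
    "output_names": ["logits"],
    "kv_cache_config": {},  # rely on the built-in defaults described above
}
```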
Metrics
Dependencies:
Support onnxruntime 1.17.3
Issues
Olive-ai 0.5.2
Examples
The following examples are added
Passes (optimization techniques)
- SliceGPT: SliceGPT is a post-training sparsification scheme that makes transformer networks smaller by applying orthogonal transformations to each transformer layer and slicing off the least-significant rows and columns of the weight matrices. This reduces the model size, resulting in speedups and a smaller memory footprint (see the toy sketch after this list).
- ExtractAdapters: extracts the LoRA adapter weights (float or static quantized) and saves them in a separate file.
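The core idea can be illustrated with a toy numpy sketch; this is a conceptual illustration under simplifying assumptions, not Olive's implementation (in SliceGPT, the orthogonal matrix is derived from activation statistics and absorbed into neighboring layers):

```python
# Toy numpy illustration of the SliceGPT idea; not Olive's implementation.
import numpy as np

d, d_small = 768, 640                       # hidden size, sliced size
W = np.random.randn(d, d)                   # stand-in for a layer weight matrix
Q, _ = np.linalg.qr(np.random.randn(d, d))  # an orthogonal matrix (SliceGPT
                                            # derives it from a PCA of activations)

W_rot = Q.T @ W @ Q                         # rotate the weights with the transform
W_sliced = W_rot[:d_small, :d_small]        # slice off least-significant rows/columns
print(W_sliced.shape)                       # (640, 640): smaller weights, less memory
```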
Engine
- Simplify the engine config
Fix
- GenAIModelExporter: on Windows, the cache_dir path of the GenAI model exporter could exceed the 260-character path limit.
Olive-ai 0.5.1
Examples
The following examples are added
Passes (optimization techniques)
- QNNPreprocess: Add the configs that were added in the onnxruntime nightly package.
- GptqQuantizer: PTQ quantization using Hugging Face Optimum, exporting the model with onnxruntime optimized kernels.
- OnnxMatMul4Quantizer: Add MatMul RTN/HQQ/GPTQ quant configs (see the sketch after this list).
- Move all passes that need to create an inference session to run on the target:
- IncQuantization
- OptimumMerging
- OrtTransformersOptimization
- VitisAIQuantization
- OrtPerfTuning
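A hedged sketch of selecting one of the MatMul quantization algorithms; the pass and algorithm names come from the notes, while the "algorithm" config key is an assumption:

```python
# Hedged sketch: pass name and algorithm names (RTN/HQQ/GPTQ) are from the
# notes above; the "algorithm" config key is an illustrative assumption.
passes = {
    "matmul4": {
        "type": "OnnxMatMul4Quantizer",
        "config": {"algorithm": "RTN"},  # assumed key; or "HQQ" / "GPTQ"
    }
}
```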
Engine
- Support to pack AzureML output.
- Remove execution_providers from the engine config. A typical config now looks like:
"systems": {
"local_system": {
"type": "LocalSystem",
"config": {
"accelerators": [
{
"device": "gpu",
"execution_providers": [
"CUDAExecutionProvider"
]
}
]
}
}
},
"engine": {
"host": "local_system",
"target": "local_system",
}
Workflows
- Delay Python pass module loading and provide the --package-config option to let advanced users write their own pass modules and corresponding dependencies.
Fix
- Fix: MLFlow models could not be loaded because from_pretrained_args was missing.
- LoRA: provide save_embedding_layers=False when saving the PEFT model. Otherwise it defaults to "auto", which checks whether the vocab size changed.
- Update the model_rank file for the zipfile packaging type. The model path is now relative to the output zip file.
- Fix shutil.which returning None on Windows when a full Python path is passed.
Olive-ai 0.5.0
Examples
The following examples are added:
- Audio Spectrogram Transformer optimization #762
- Bert SNPE #925
- Llama2 GenAI #940
- Llama2 notebook tutorial #798
- MobileNet optimization with QDQ Quantization on Qualcomm NPU #874
- Phi2 Generation #979
- Phi2 optimization with different precision #938
- Stable Diffusion OpenVINO example #853
Passes (optimization techniques)
New Passes
- PyTorch
- Introduce GenAIModelExporter pass to export a PyTorch model using GenAI exporter.
- Introduce LoftQ pass which performs model fine-tuning using the LoftQ initialization proposed in https://arxiv.org/abs/2310.08659.
- ONNXRuntime
- Introduce DynamicToFixedShape pass to convert dynamic shape to fixed shape for ONNX model.
- Introduce OnnxOpVersionConversion pass to convert an existing ONNX model with another target opset.
- [QNN-EP] Add the prepare_qnn_config:bool option for quantization under QNN-EP, where int16/uint16 are supported for both weights and activations (see the sketch after this list).
- [QNN-EP] Introduce the QNNPreprocess pass to preprocess the model before quantization.
- QNN
- Introduce QNNConversion pass to convert models to QNN C++ model.
- Introduce QNNContextBinaryGenerator pass to generate the context binary from a compiled model library using a specific backend.
- Introduce QNNModelLibGenerator pass to compile the C++ model into a model library for the desired target.
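A hedged sketch of how prepare_qnn_config might be set on a quantization pass config; only the option name comes from the notes, and the "OnnxQuantization" pass name is an assumption:

```python
# Hedged sketch: "prepare_qnn_config" is the option named in the notes;
# attaching it to an "OnnxQuantization" pass is an illustrative assumption.
passes = {
    "quantize": {
        "type": "OnnxQuantization",  # assumed pass name
        "config": {"prepare_qnn_config": True},
    }
}
```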
Updates
- OnnxConversion
  - Support both past_key_values.index.key/value and past_key_value.index.
- OptimumConversion
  - Provide the components parameter if the user wants to export only some models, such as decoder_model and decoder_with_past_model.
  - Use the default exporter args and behavior of the underlying optimum version. For versions 1.14.0+, this means legacy=False and no_post_process=False. Users must provide these via extra_args if legacy behavior is desired.
- OpenVINO
- Upgrade OpenVINO API to 2023.2.0.
- OrtPerfTuning
  - Add tunable_op_enable and tunable_op_tuning_enable for the ROCm EP to speed up performance.
- LoRA/QLoRA
  - Support bfloat16 with ort-training.
  - Support resuming training from a checkpoint via the resume_from_checkpoint option.
  - New overwrite_output_dir option.
- MoEExpertsDistributor
- Add option to configure number of parallel jobs.
Engine
- For Zipfile packaging, add a models rank JSON file. This file ranks all output models from different EPs and includes model_config and metrics.
- Add Auto Optimizer, a tool that automatically searches for the best combination of Olive passes.
System
- Add hf_token support for Olive systems.
- AzureMLSystem
  - The Olive config file is uploaded to AML jobs under the codes folder.
  - Support adding tags to the AML jobs.
  - Support using an existing AML workspace Environment for AzureMLSystem.
- DockerSystem
  - Support running Olive Passes.
- PythonEnvironmentSystem requires Olive to be installed in the environment. It can run passes and evaluate models.
- New IsolatedORTSystem introduced that only supports evaluation of ONNX models. It requires onnxruntime to be installed in the environment and can be used for packages like onnxruntime-qnn, which can only be run in a Windows ARM64 Python environment (see the sketch after this list).
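A hedged sketch of registering such a system, mirroring the systems JSON shown later in these notes; the "IsolatedORT" type string and the option name are assumptions based on the class name:

```python
# Hedged sketch: "IsolatedORTSystem" is named in the notes; the "IsolatedORT"
# type string and "python_environment_path" option are assumptions.
systems = {
    "ort_env": {
        "type": "IsolatedORT",
        "config": {"python_environment_path": "C:/envs/ort-qnn"},  # assumed option
    }
}
```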
Data
- Add AML resource support for data configs.
- Add audio classification data preprocess function.
Model
- Rename model_loading_args to from_pretrained_args in hf_config.
Metrics
- Add throughput metric support (sketched below).
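A hedged sketch of a metrics entry using the new type; "throughput" is from the notes, while the surrounding keys are illustrative assumptions about the metric-config shape:

```python
# Hedged sketch: the "throughput" metric type is from the notes; the other
# keys are illustrative assumptions about the metric-config shape.
metrics = [
    {
        "name": "thrpt",
        "type": "throughput",
        "sub_types": [{"name": "avg"}],  # assumed sub-type
    }
]
```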
Dependencies:
Support onnxruntime 1.17.1.
Olive-ai 0.4.0
Examples
The following examples are added
- Llama2 optimization with ONNX Runtime Tools #641
- Llama2 finetuning with QLoRA and optimization with ONNX Runtime Tools #703
- Llama2 shard to multiple GPUs #694
- DirectML Llama2 #701
- DirectML phi #693
- phi-1.5 finetuning with QLoRA #689
Passes (optimization techniques)
- OrtPerfTuning
  - Raises known failure exceptions to immediately stop tuning.
  - Default values for device and providers_list are based on the accelerator spec.
- OrtTransformersOptimization
  - Checks that model_type is provided in the pass configs or available in the model attributes. None is invalid.
  - fp16-related arguments are better documented.
- Introduce LoRA pass for finetuning pytorch models with Low-Rank Adaptation
- Introduce OnnxMatMul4Quantizer pass to quantize onnx models to 4-bit integers.
- Introduce OnnxBnb4Quantization pass to quantize onnx models to 4-bit data types from bitsandbytes (FP4, NF4).
- ONNX external data configuration supports the size_threshold and convert_attribute parameters.
- LlamaPyTorchTensorParallel pass to split a Llama model into a tensor-parallel distributed PyTorch model.
- OnnxConversion
  - Support DistributedPyTorchModel.
  - New use_device and torch_dtype options to specify the device ("cpu", "cuda") and data type ("float16", "float32") for the model before conversion (see the sketch after this list).
- DeviceSpecificOnnxConversion removed in favor of the OnnxConversion pass with the use_device option.
- LoRA/QLoRA
  - Support training using ONNX Runtime Training.
  - Mixed-precision training when torch_dtype=float16 for numerical stability.
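A hedged sketch of the new conversion options in a pass config; the option names and values come from the notes above, while the surrounding structure is illustrative:

```python
# Hedged sketch: "use_device" and "torch_dtype" are the option names from the
# notes; values mirror the documented choices ("cpu"/"cuda", "float16"/"float32").
passes = {
    "conversion": {
        "type": "OnnxConversion",
        "config": {
            "use_device": "cuda",      # move the model to GPU before conversion
            "torch_dtype": "float16",  # cast the model to fp16 before conversion
        },
    }
}
```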
Engine
- Make the engine/evaluator config optional in the olive run config. With these defaults, users can run optimization without search and evaluation using the simplest pass config.
- evaluate_input_model is optional in the engine config in no-search mode. It is forced to False when no evaluator is provided.
- New ort_py_log_severity_level option to control the logging level for onnxruntime Python logs.
- CLI option --tempdir to use a custom directory as the root directory for tempfile.
- IO-Binding:
  - New method to efficiently bind inputs and outputs to the session using either the CPU or GPU depending on the device.
  - shared_kv_buffer option to enable key-value buffer sharing between inputs (past key values) and outputs (present key values).
Model
- DistributedOnnxModel file structure updated to use resource paths. Can be saved from cache to a destination directory.
- Introduce DistributedPyTorchModel, analogous to DistributedOnnxModel, for PyTorch models.
- trust_remote_code added to HFConfig model_loading_args.
Metrics
- Option to provide kwargs to user_script functions through func_kwargs.
Dependencies:
- Support onnxruntime 1.16.2
Olive-ai 0.3.3
Quick fix for v0.3.2
- Vitis AI quantization support ORT 1.16.1
- Add optional attention mask for text-generation task
Olive-ai 0.3.2
Examples
The following examples are added
- DirectML SDXL refiner #487
- Open Llama arc #582
- Enable Intel® Neural Compressor 4-bits weight-only quantization #614
- Add NCHW GroupNorm fusion to DirectML's SD examples #617
Passes (optimization techniques)
- QLoRA pass for torch model fine-tuning
- Intel® Neural Compressor 4-bits weight-only quantization
- OnnxModelOptimizer
  - Inserts a Cast operation for cases where the ArgMax input isn't supported on the device.
  - Fuses consecutive Reshape operations when the latter results in flattening.
Engine
- Summarize pass run history in a table (install tabulate for a better preview).
- Support tuning and evaluating models across different execution providers, which are managed by Olive-ai.
Model
- Add model_loading_args, load_model and load_model_config to HFConfig.
- Add adapter_path to PyTorchModel
- Introduce model_attributes, which can be used to simplify the user's input for transformer_optimization
- Add AML curated model support
Dataset
- Auto-insertion of the input model's data config (if it's a PyTorch model with hf_config.dataset) into pass configs is removed. Use "input_model_data_config" if you want to use the input model's data config.
- Support a second type of dataset for text-generation tasks called pair.
- Support converting an Olive dataset to a Hugging Face datasets.Dataset.
Known Issues
- #571 Whisper gpu does not consume gpu resources
- #573 Distinguish pass instance with name not cls name
Dependencies:
- Support onnxruntime 1.16.1
- Drop Python 3.7 support. Python >= 3.8 is now required to run Olive-ai optimization.
Olive-ai 0.3.1
Examples
The following examples are added
- Red Pajama Optimization with Optimum
- Stable Diffusion XL Optimization with DirectML
- GPT-J Optimization Using Intel® Neural Compressor
- BERT example using Intel Neural Compressor SmoothQuant
- Whisper example using Intel Neural Compressor
- Open LLaMA workflow example
Passes (optimization techniques)
- Introduce TorchTRTConversion
- Introduce SparseGPT pass for one-shot model pruning on large GPT like models using the algorithm proposed in https://arxiv.org/abs/2301.00774.
Systems
- Add AzureML sku support for AMLSystem
Evaluator
- Add metric_func config to custom metrics. Olive runs inference for the custom eval function, so users don't need to run inference themselves.
- Add RawDataContainer: SNPE evaluation and quantization now accept generic dataloaders such as torch dataloaders.
Metrics
- Add Perplexity metric for text-generation task
Engine
- Provide an interface that lets users set multiple pass flows to run in the same Olive workflow.