Releases: microsoft/onnxruntime-genai
v0.5.2
Release Notes
Patch release 0.5.2 adds:
- Fixed bugs #1074 and #1092 via PRs #1065 and #1070
- Fixed the NuGet sample in the package README to show correct disposal of objects
- Added extra validation via PRs #1050 and #1066
Features in 0.5.0:
- Support for MultiLoRA
- Support for multi-frame input for the Phi-3 Vision and Phi-3.5 Vision models
- Support for the Phi-3 MoE model
- Support for NVIDIA Nemotron model
- Support for the Qwen model
- Addition of the Set Terminate feature, which allows users to cancel mid-generation
- Soft capping support for Group Query Attention
- Extended quantization support to the embedding and LM head layers
- Mac support in published packages
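Soft capping, listed above for Group Query Attention, bounds attention scores smoothly instead of clipping them. The common formulation (used, for example, by Gemma 2) is `cap * tanh(x / cap)`, which behaves almost like the identity for small scores and saturates toward ±cap for large ones. The sketch below shows only that transformation in plain Python. The helper name `soft_cap` and the cap value are illustrative assumptions, not part of the GenAI API or its defaults.

```python
import math


def soft_cap(scores, cap):
    """Apply tanh soft capping to a list of attention scores.

    Hypothetical helper for illustration: each score x becomes
    cap * tanh(x / cap), so outputs stay bounded by +/- cap.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(x / cap) for x in scores]


# Illustrative cap value only; each model defines its own.
raw = [0.5, 10.0, 80.0, -200.0]
capped = soft_cap(raw, cap=50.0)
for x, y in zip(raw, capped):
    print(f"{x:8.1f} -> {y:8.3f}")
```

Small scores pass through nearly unchanged, while large ones are compressed toward the cap. In attention this is typically applied to the scaled query-key scores before the softmax.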
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release
v0.5.1
Release Notes
In addition to the features in the 0.5.0 release, this release adds:
- Added the ability to choose the execution provider and modify options at runtime
- Fixed a data leakage bug in the KV caches
Features in 0.5.0:
- Support for MultiLoRA
- Support for multi-frame input for the Phi-3 Vision and Phi-3.5 Vision models
- Support for the Phi-3 MoE model
- Support for NVIDIA Nemotron model
- Support for the Qwen model
- Addition of the Set Terminate feature, which allows users to cancel mid-generation
- Soft capping support for Group Query Attention
- Extended quantization support to the embedding and LM head layers
- Mac support in published packages
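The Set Terminate feature listed above lets a caller stop a generation loop that is already running. The sketch below shows the general cooperative-cancellation pattern: a `threading.Event` is set from another thread and checked once per token. `ToyGenerator` and `generate` are hypothetical stand-ins, not the onnxruntime-genai API; these notes do not show the library's actual call for requesting termination.

```python
import threading
import time


class ToyGenerator:
    """Hypothetical stand-in for a token generator (not the real GenAI class)."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.tokens = []

    def is_done(self):
        return len(self.tokens) >= self.max_tokens

    def generate_next_token(self):
        time.sleep(0.01)  # simulate per-token compute
        self.tokens.append(len(self.tokens))


def generate(gen, terminate):
    """Run until the generator finishes or termination is requested.

    Returns True if the loop was cancelled mid-generation.
    """
    while not gen.is_done():
        # Checked between tokens, so cancellation is cooperative.
        if terminate.is_set():
            return True
        gen.generate_next_token()
    return False


terminate = threading.Event()
gen = ToyGenerator(max_tokens=1000)
# Request termination from another thread after roughly 50 ms.
threading.Timer(0.05, terminate.set).start()
cancelled = generate(gen, terminate)
print(cancelled, len(gen.tokens))
```

Because the flag is only checked between tokens, cancellation takes effect at the next token boundary rather than instantly, and the tokens produced so far remain available.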
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release
v0.5.0
Release Notes
- Support for MultiLoRA
- Support for multi-frame input for the Phi-3 Vision and Phi-3.5 Vision models
- Support for the Phi-3 MoE model
- Support for NVIDIA Nemotron model
- Support for the Qwen model
- Addition of the Set Terminate feature, which allows users to cancel mid-generation
- Soft capping support for Group Query Attention
- Extended quantization support to the embedding and LM head layers
- Mac support in published packages
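MultiLoRA support means one base model can serve several LoRA adapters. The underlying arithmetic is the standard LoRA update, y = Wx + (alpha / r) * B(Ax), where only the small A and B matrices differ per adapter. The plain-Python sketch below keeps one shared W and selects an adapter by name per request. The adapter names, the dictionary layout, and the alpha / r scaling convention are assumptions for illustration, not how GenAI stores or loads adapters.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, x, adapter=None):
    """Compute W x, plus the LoRA delta (alpha / r) * B (A x) if an adapter is given."""
    y = matvec(w, x)
    if adapter is None:
        return y
    a, b, alpha = adapter["A"], adapter["B"], adapter["alpha"]
    r = len(a)  # LoRA rank = number of rows in A
    delta = matvec(b, matvec(a, x))
    return [yi + (alpha / r) * di for yi, di in zip(y, delta)]


# Shared base weight (2x2) and two hypothetical rank-1 adapters.
W = [[1.0, 0.0],
     [0.0, 1.0]]
adapters = {
    "math": {"A": [[1.0, 1.0]], "B": [[1.0], [0.0]], "alpha": 1.0},
    "chat": {"A": [[0.0, 2.0]], "B": [[0.0], [1.0]], "alpha": 0.5},
}

x = [1.0, 2.0]
print(lora_forward(W, x))                    # [1.0, 2.0]  (base model only)
print(lora_forward(W, x, adapters["math"]))  # [4.0, 2.0]
print(lora_forward(W, x, adapters["chat"]))  # [1.0, 4.0]
```

Because W is shared and never modified, switching adapters per request only swaps the small A and B matrices.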
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release
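The v0.5.0 notes above extend quantization to the embedding and LM head layers. As a rough illustration of what quantizing such a table involves, the sketch below applies symmetric 4-bit block quantization to one embedding row and dequantizes it on lookup. The block size, the symmetric int4 scheme, and the function names are assumptions chosen for clarity; these notes do not say which scheme GenAI uses for these layers.

```python
def quantize_blocks(row, block_size=4):
    """Hypothetical symmetric int4 quantization with one float scale per block.

    Values are clamped to the int4 range [-8, 7]; this symmetric scheme
    only produces values in [-7, 7].
    """
    q, scales = [], []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(v / scale))) for v in block)
    return q, scales


def dequantize_blocks(q, scales, block_size=4):
    """Recover approximate floats by multiplying each int by its block's scale."""
    return [v * scales[i // block_size] for i, v in enumerate(q)]


# One hypothetical embedding row (the vector looked up for a single token id).
row = [0.12, -0.50, 0.33, 0.01, 1.40, -0.70, 0.00, 0.25]
q, scales = quantize_blocks(row)
approx = dequantize_blocks(q, scales)
print(q)
print([round(v, 3) for v in approx])
```

Storing 4-bit integers plus one scale per block is what shrinks large tables such as the embedding matrix; the cost is a rounding error of at most half a block's scale per element.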
v0.4.0
Release Notes
- Support for new models such as Qwen 2, LLaMA 3.1, Gemma 2, and Phi-3 Small on CPU
- Support for building models that were already quantized with AWQ or GPTQ
- Performance improvements for Intel and Arm CPU
- Packaging and language bindings:
  - Added Java bindings (build from source)
  - Separated OnnxRuntime.dll and directml.dll out of the GenAI package to improve usability
  - Published packages for Windows on Arm
  - Support for Android (build from source)
v0.3.0
Release Notes
- Phi-3 Vision model support for DML EP.
- Addressed a DML memory leak and crashes on long prompts.
- Addressed crashes and slowness in GQA on the CPU EP with long prompts, caused by integer overflow.
- Added the import library for the Windows C API package.
- Addressed a bug with `get_output('logits')` so that it returns the logits for the entire prompt and not only for the last generated token.
- Addressed a bug with querying the device type of the model so that it no longer crashes.
- Added .NET Standard 2.0 compatibility.
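Since v0.3.0, `get_output('logits')` returns logits for every prompt position rather than only the last token, so code that needs only next-token scores has to select the last position itself. The sketch below shows that selection and a greedy argmax on a nested list, assuming a [batch][sequence][vocab] layout. The layout and the helper names are assumptions for illustration; real code would operate on the array the API returns.

```python
def last_token_logits(logits):
    """Select the final sequence position for each batch entry.

    Assumes a nested-list layout of [batch][sequence][vocab].
    """
    return [seq[-1] for seq in logits]


def greedy_next_tokens(logits):
    """Pick the highest-scoring vocabulary id at the last position of each sequence."""
    return [max(range(len(row)), key=row.__getitem__) for row in last_token_logits(logits)]


# Toy logits for a batch of 1, a 3-token prompt, and a 4-word vocabulary.
prompt_logits = [[
    [0.1, 2.0, 0.3, 0.0],  # position 0
    [1.5, 0.2, 0.1, 0.4],  # position 1
    [0.0, 0.3, 0.2, 3.1],  # position 2 (last prompt token)
]]
print(last_token_logits(prompt_logits))   # [[0.0, 0.3, 0.2, 3.1]]
print(greedy_next_tokens(prompt_logits))  # [3]
```

With a NumPy array in the same assumed layout, the equivalent selection is `logits[:, -1, :]`.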
ONNX Runtime GenAI v0.3.0-rc2
Release Notes
- Added support for the Phi-3-Vision model.
- Added support for the Phi-3-Small model.
- Removed usage of `std::filesystem` to avoid runtime issues when loading incompatible symbols from stdc++ and stdc++fs.