Releases: microsoft/onnxruntime-genai

v0.5.2

26 Nov 18:05
27bcf6c

Release Notes

Patch release 0.5.2 adds:

  • Fixes for bugs #1074 and #1092 via PRs #1065 and #1070
  • A fix for the NuGet sample in the package README to show correct disposal of objects
  • Extra validation added via PRs #1050 and #1066

Features in 0.5.0:

  • Support for MultiLoRA
  • Multi-frame support for the Phi-3 vision and Phi-3.5 vision models
  • Support for the Phi-3 MoE model
  • Support for the NVIDIA Nemotron model
  • Support for the Qwen model
  • Addition of the Set Terminate feature, which allows users to cancel mid-generation
  • Soft capping support for Group Query Attention
  • Extended quantization support to the embedding and LM head layers
  • macOS support in published packages

Known issues

  • Models running with DirectML do not support batching
  • Python 3.13 is not supported in this release

v0.5.1

13 Nov 21:26
e8cd6bc

Release Notes

In addition to the features in the 0.5.0 release, this release adds:

  • The ability to choose the execution provider and modify its options at runtime
  • A fix for a data-leakage bug in the KV caches
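Runtime provider selection might look like the following sketch in Python. This is a hedged illustration only: the `og.Config` methods (`clear_providers`, `append_provider`, `set_provider_option`) and the model path are assumptions about the API of this era, not confirmed by these notes, and running it requires the onnxruntime-genai package plus a model folder on disk.

```python
import onnxruntime_genai as og

# Load the model configuration first so the execution provider can be
# chosen and its options modified before the model is instantiated.
# Method names and the model path below are assumptions.
config = og.Config("path/to/model-folder")
config.clear_providers()                 # drop providers baked into the model config
config.append_provider("cuda")           # choose a provider at runtime
config.set_provider_option("cuda", "device_id", "0")  # modify provider options

model = og.Model(config)                 # model built with the chosen provider
```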

Features in 0.5.0:

  • Support for MultiLoRA
  • Multi-frame support for the Phi-3 vision and Phi-3.5 vision models
  • Support for the Phi-3 MoE model
  • Support for the NVIDIA Nemotron model
  • Support for the Qwen model
  • Addition of the Set Terminate feature, which allows users to cancel mid-generation
  • Soft capping support for Group Query Attention
  • Extended quantization support to the embedding and LM head layers
  • macOS support in published packages

Known issues

  • Models running with DirectML do not support batching
  • Python 3.13 is not supported in this release

v0.5.0

08 Nov 19:43
826f6aa

Release Notes

  • Support for MultiLoRA
  • Multi-frame support for the Phi-3 vision and Phi-3.5 vision models
  • Support for the Phi-3 MoE model
  • Support for the NVIDIA Nemotron model
  • Support for the Qwen model
  • Addition of the Set Terminate feature, which allows users to cancel mid-generation
  • Soft capping support for Group Query Attention
  • Extended quantization support to the embedding and LM head layers
  • macOS support in published packages
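The MultiLoRA and Set Terminate features above might be used as in the sketch below. This is a hedged illustration under assumptions: the `og.Adapters` / `set_active_adapter` names, the adapter file path, and especially the `set_runtime_option("terminate_session", "1")` call are guesses at the API surface of this release rather than details confirmed by these notes, and the code needs a real model and adapter on disk to run.

```python
import onnxruntime_genai as og

model = og.Model("path/to/model-folder")
tokenizer = og.Tokenizer(model)

# MultiLoRA: load one or more adapters, then activate one per generator.
# Adapters/load/set_active_adapter names are assumptions about this API.
adapters = og.Adapters(model)
adapters.load("path/to/adapter.onnx_adapter", "my_adapter")

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
generator = og.Generator(model, params)
generator.set_active_adapter(adapters, "my_adapter")

# Set Terminate: cancel mid-generation, e.g. from another thread watching
# a stop button. The runtime-option key below is an assumption.
generator.set_runtime_option("terminate_session", "1")
```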

Known issues

  • Models running with DirectML do not support batching
  • Python 3.13 is not supported in this release

v0.4.0

22 Aug 20:26
b77e768

Release Notes

  • Support for new models such as Qwen 2, LLaMA 3.1, Gemma 2, and Phi-3 small on CPU
  • Support for building models that were already quantized with AWQ or GPTQ
  • Performance improvements for Intel and Arm CPUs
  • Packaging and language bindings
    • Added Java bindings (build from source)
    • Separated OnnxRuntime.dll and directml.dll out of the GenAI package to improve usability
    • Published packages for Windows Arm
    • Support for Android (build from source)

v0.3.0

21 Jun 21:23
964eb65

Release Notes

  • Phi-3 Vision model support for the DML EP.
  • Addressed a DML memory leak and crashes on long prompts.
  • Addressed crashes and slowness in CPU EP Group Query Attention on long prompts caused by integer overflows.
  • Added the import library for the Windows C API package.
  • Addressed a bug with get_output('logits') so that it returns the logits for the entire prompt, not just for the last generated token.
  • Addressed a bug when querying the device type of the model so that it no longer crashes.
  • Added NetStandard 2.0 compatibility.
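The get_output('logits') fix above concerns code along the lines of this sketch. It is a hedged illustration of the generation API of this era (`params.input_ids`, `compute_logits`, `get_output` are believed but not confirmed by these notes to be the 0.3.x Python surface), and it requires the onnxruntime-genai package and a model folder to run.

```python
import onnxruntime_genai as og

model = og.Model("path/to/model-folder")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.input_ids = tokenizer.encode("Hello")  # 0.3.x-era input API (assumption)
generator = og.Generator(model, params)

generator.compute_logits()
# After the fix described above, this returns the logits for the entire
# prompt, not just for the last generated token.
logits = generator.get_output("logits")
```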

ONNX Runtime GenAI v0.3.0-rc2

30 May 17:24
d536387
Pre-release

Release Notes

  • Added support for the Phi-3-Vision model.
  • Added support for the Phi-3-Small model.
  • Removed usage of std::filesystem to avoid runtime issues when loading incompatible symbols from stdc++ and stdc++fs.