v0.5.1

RyanUnderhill released this 13 Nov 21:26
e8cd6bc

Release Notes

In addition to the features in the 0.5.0 release, this release adds:

  • Ability to choose the execution provider and modify its options at runtime
  • Fix for a data leakage bug with KV caches

Features in 0.5.0:

  • Support for MultiLoRA
  • Multi-frame support for Phi-3 vision and Phi-3.5 vision models
  • Support for the Phi-3 MoE model
  • Support for NVIDIA Nemotron model
  • Support for the Qwen model
  • Addition of the Set Terminate feature, which allows users to cancel mid-generation
  • Soft capping support for Group Query Attention
  • Extended quantization support for embedding and LM head layers
  • Mac support in published packages

Known issues

  • Models running with DirectML do not support batching
  • Python 3.13 is not supported in this release