Skip to content

OpenVINO™ Model Server 2024.4

Latest
Compare
Choose a tag to compare
@rasapala rasapala released this 19 Sep 11:46
· 2 commits to releases/2024/4 since this release
f958bf8

The 2024.4 release brings official support for OpenAI API text generation. It is now recommended for production usage. It comes with a set of added features and improvements.

Changes and improvements

  • Significant performance improvements for multinomial sampling algorithm

  • finish_reason in the response correctly determines reaching the max_tokens (length) and completed the sequence (stop)

  • Added automatic cancelling of text generation for disconnected clients

  • Included prefix caching feature which speeds up text generation by caching the prompt evaluation

  • Option to compress the KV Cache to lower precision – it reduces the memory consumption with minimal impact on accuracy

  • Added support for stop sampling parameters. It can define a sequence which stops text generation.

  • Added support for logprobs sampling parameter. It returns the probabilities of generated tokens.

  • Included generic metrics related to execution of MediaPipe graph. Metric ovms_current_graphs can be used for autoscaling based on current load and the level of concurrency. Counters like ovms_requests_accepted and ovms_responses can track the activity of the server.

  • Included demo of text generation horizontal scalability

  • Configurable handling of non-UTF-8 responses from the model – detokenizer can now automatically change then to Unicode replacement character

  • Included support for Llama3.1 models

  • Text generation is supported both on CPU and GPU -check the demo

Breaking changes

No breaking changes.

Bug fixes

  • Security and stability improvements

  • Fixed handling of model templates without bos_token

You can use an OpenVINO Model Server public Docker images based on Ubuntu via the following command:
docker pull openvino/model_server:2024.4 - CPU device support with the image based on Ubuntu22.04
docker pull openvino/model_server:2024.4-gpu - CPU, GPU and NPU device support with the image based on Ubuntu22.04
or use provided binary packages.
The prebuilt image is available also on RedHat Ecosystem Catalog