MindSpore Serving Release Notes

MindSpore Serving 2.1.0 Release Notes

Major Features and Improvements

OS

  • [STABLE] add support for Python 3.9
  • [STABLE] add support for MindSpore 2.2.10
  • [STABLE] add support for MindSpore Lite 2.2.10

LLMs

  • [STABLE] add support for LLaMA2
  • [STABLE] add support for InternLM

Inference Performance

  • [STABLE] adapt to models with dynamic sequence length to avoid redundant padding of input tokens
  • [STABLE] adjust the batch size based on the length of the request queue to avoid padding to a fixed batch size
  • [STABLE] support model parallelism across multiple NPUs to make efficient use of device memory and computing power
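
The two padding-related items above can be illustrated with a minimal sketch. It is not taken from the MindSpore Serving code base: the function names, `PAD_ID`, and `max_batch_size` are hypothetical, and it only shows the general idea of padding to the longest sequence in a batch and sizing the batch from the number of queued requests.

```python
from typing import List

PAD_ID = 0  # hypothetical padding token id, chosen for illustration only


def pad_to_longest(batch: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Pad each sequence only up to the longest sequence in this batch."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]


def pick_batch_size(queue_length: int, max_batch_size: int = 8) -> int:
    """Use the number of waiting requests, capped at max_batch_size, and at least 1."""
    return max(1, min(queue_length, max_batch_size))


def padding_tokens(batch: List[List[int]], target_len: int) -> int:
    """Count how many pad tokens are needed to bring every sequence to target_len."""
    return sum(target_len - len(seq) for seq in batch)


if __name__ == "__main__":
    batch = [[11, 12, 13], [21]]
    print(pad_to_longest(batch))
    # Compare padding to the longest sequence with padding to a fixed length of 16.
    print(padding_tokens(batch, 3), padding_tokens(batch, 16))
    print(pick_batch_size(3), pick_batch_size(20))
```

With a fixed sequence length of 16, the two requests in the example would carry 28 pad tokens; padding to the longest sequence needs only 2.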

Throughput

  • [STABLE] continuous batching of incoming requests to improve NPU utilization
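
As a rough illustration of continuous batching in general, the sketch below simulates decode steps in which waiting requests join the running batch as soon as a slot frees up, instead of waiting for the whole batch to finish. It is a simplified model, not the scheduler used by MindSpore Serving; `continuous_batching`, `max_slots`, and the request tuples are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def continuous_batching(requests: List[Tuple[str, int]], max_slots: int) -> Dict[str, int]:
    """Simulate decode steps and return the step at which each request finishes.

    requests: (request_id, tokens_to_generate) pairs, all waiting at step 0.
    """
    waiting: Deque[Tuple[str, int]] = deque(requests)
    active: Dict[str, int] = {}  # request_id -> tokens still to generate
    finished_at: Dict[str, int] = {}
    step = 0
    while waiting or active:
        # Fill free slots before every decode step, not only when the batch is empty.
        while waiting and len(active) < max_slots:
            request_id, remaining = waiting.popleft()
            active[request_id] = remaining
        step += 1
        for request_id in list(active):
            active[request_id] -= 1  # one token per active request per step
            if active[request_id] <= 0:
                del active[request_id]
                finished_at[request_id] = step
    return finished_at


if __name__ == "__main__":
    print(continuous_batching([("a", 1), ("b", 3), ("c", 2)], max_slots=2))
```

In this example request `c` takes over the slot released by `a` after the first step, so it finishes at step 3. If it had to wait for `b` to complete first, it would finish at step 5.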

Others

  • [STABLE] support token streaming using Server-Sent Events (SSE) for progressive generation
  • [BETA] provide a launch script, start.py, for convenience
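
For readers unfamiliar with Server-Sent Events, the token streaming item above can be pictured with the following sketch. It only shows the SSE wire format, in which each event is a `data:` line followed by a blank line. The JSON payload shape, the `sse_events` helper, and the `[DONE]` end marker are assumptions made for illustration, not the actual response schema of MindSpore Serving.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE-formatted event per generated token."""
    for token in tokens:
        # JSON-encode the payload so newlines inside a token cannot break the event framing.
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Hypothetical end-of-stream marker: a common convention, not part of the SSE format.
    yield "data: [DONE]\n\n"


if __name__ == "__main__":
    for event in sse_events(["Hello", ",", " world"]):
        print(event, end="")
```

A client can display each token as its event arrives rather than waiting for the full completion.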

Contributors

Thanks go to these wonderful people:

zhoupengyuan, shuchi, guoshipeng, tiankai, zhengyi, pengkang

Contributions of any kind are welcome!