MNN-LLM officially merged
Added Transformer-related operators and the corresponding graph optimizations
Added the LLM module
Usage
Build the LLM model package
Build and use the LLM engine
Build MNN with the MNN_BUILD_LLM macro enabled and compile the transformers/llm/engine directory; this produces libllm and llm_demo
Run the LLM with llm_demo
CPU / GPU
Performance benchmarks
Snapdragon 8 Gen 1
| model | CPU prefill (4 threads) | CPU decode (4 threads) | OpenCL prefill | OpenCL decode |
| --- | --- | --- | --- | --- |
| qwen-1.8b | 207.56 | 35.58 | 28.58 | 20.40 |
| qwen-7b | 25.86 | 7.5 | 7.95 | 7.70 |
| llama3-8b | 22.09 | 5.59 | out of memory | out of memory |
Snapdragon 8 Gen 3
| model | CPU prefill (4 threads) | CPU decode (4 threads) | OpenCL prefill | OpenCL decode |
| --- | --- | --- | --- | --- |
| qwen-1.8b | 205.70 | 47.07 | 61.25 | 21.56 |
| qwen-7b | 40.93 | 11.01 | 20.26 | 10.60 |
| llama3-8b | 36.44 | 7.83 | 19.10 | 2.14 |
Note: the Transformer-related graph optimizations are not yet applied to the llama3-8b architecture, so its performance is somewhat lower.
Shape-change cache mechanism
Background
For speech/text models, a single tensor dimension (e.g. the sequence length) often changes incrementally from run to run. Each such change triggers geometry computation, memory allocation, and other resize work, causing non-negligible performance overhead. When only one input dimension changes, some operators in the network keep a fixed shape, and their share of the resize work can be skipped.
To optimize this case, MNN adds a shape-change cache mechanism: when the model's input shape changes, operators whose shapes stay the same skip the resize-related operations, improving performance.
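For concreteness, the workload described above can be sketched with the existing Session API. This is an illustration only: the model path, input shape, and thread count are placeholders, and no cache is enabled here; the point is that every step pays the full resize cost even though most operator shapes do not change.

```cpp
// Illustration of the repeated-resize pattern (placeholders: "model.mnn",
// input shape {1, seqLen, 512}). Without the shape-change cache, every
// resizeSession() call repeats geometry computation and memory allocation
// for the whole graph, even for operators whose shapes never change.
#include <memory>
#include <MNN/Interpreter.hpp>

int main() {
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"), MNN::Interpreter::destroy);
    MNN::ScheduleConfig config;
    config.numThread = 4;
    auto session = net->createSession(config);
    auto input   = net->getSessionInput(session, nullptr);

    for (int seqLen = 1; seqLen <= 128; ++seqLen) {
        // Only the sequence-length dimension changes between iterations,
        // yet the resize work is redone for every operator each time.
        net->resizeTensor(input, {1, seqLen, 512});
        net->resizeSession(session);
        // ... fill the input (e.g. via input->host<float>()), then run:
        net->runSession(session);
    }
    return 0;
}
```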
How it works
Usage with the Module API
Relevant APIs
Example
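A minimal, hedged sketch of Module API usage with a varying input shape. The model path, tensor names, and shapes below are placeholders, and the specific call that configures the shape-change cache is intentionally not shown here; only the surrounding usage pattern is illustrated.

```cpp
// Rough Module API sketch: load a model once, then call it with inputs whose
// sequence-length dimension varies. Placeholders: "model.mnn", "input", "output".
#include <memory>
#include <MNN/expr/Module.hpp>
#include <MNN/expr/ExprCreator.hpp>

using namespace MNN::Express;

int main() {
    // Input/output names must match the actual model.
    std::shared_ptr<Module> module(Module::load({"input"}, {"output"}, "model.mnn"));

    for (int seqLen = 1; seqLen <= 16; ++seqLen) {
        auto x = _Input({1, seqLen, 512}, NCHW, halide_type_of<float>());
        // ... write real data through x->writeMap<float>() ...
        auto outputs = module->onForward({x});
        // Operators whose shapes stay constant across these calls are the ones
        // the shape-change cache lets MNN skip re-resizing.
    }
    return 0;
}
```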
Usage with the Interpreter - Session API
Relevant APIs
Example
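A hedged sketch of the Session API flow. The SessionMode values Session_Resize_Check and Session_Resize_Fix are my assumption of the modes added for this mechanism and should be verified against Interpreter.hpp; the model path and shapes are placeholders.

```cpp
// Assumed flow: first resize with checking enabled so MNN can record which
// operators keep a constant shape, then fix them so later resizes skip their
// geometry computation and memory allocation. Enum names are assumptions.
#include <memory>
#include <MNN/Interpreter.hpp>

int main() {
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"), MNN::Interpreter::destroy);
    MNN::ScheduleConfig config;
    auto session = net->createSession(config);
    auto input   = net->getSessionInput(session, nullptr);

    // 1. Check phase (assumed mode name): mark shape-constant operators.
    net->setSessionMode(MNN::Interpreter::Session_Resize_Check);
    net->resizeTensor(input, {1, 8, 512});
    net->resizeSession(session);
    net->runSession(session);

    // 2. Fix phase (assumed mode name): subsequent resizes only touch
    //    operators whose shapes actually change.
    net->setSessionMode(MNN::Interpreter::Session_Resize_Fix);
    for (int seqLen = 9; seqLen <= 128; ++seqLen) {
        net->resizeTensor(input, {1, seqLen, 512});
        net->resizeSession(session);
        net->runSession(session);
    }
    return 0;
}
```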
HarmonyOS support, feature improvements, and GitHub issue fixes
Feature improvements
Related issue fixes