- This implements the CPU version of GOT-OCR on MNN; the CUDA version still has bugs. Set `backend_type` in `model/config.json` to `cpu` or `cuda` to switch backends.
- First run `export/llmexport.py` to generate the ONNX model.
- Then `pip install mnn` and convert it with: `mnnconvert -f ONNX --modelFile ./model/onnx/llm.onnx --MNNModel ./model/llm.mnn --bizCode MNN --transformerFuse`
- Put the generated MNN files into the `model` folder. Note that the names must be `vision.mnn` and `llm.mnn` (they are hard-coded in the original MNN code, `src/llmconfig.hpp`).
- Build mnn-llm: `./script/build.sh`
- Run: `./build/cli-demo ./model/config.json 3.jpg`
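Switching backends is just a one-field change in `model/config.json`. A minimal sketch of doing that programmatically (assuming the file exists and is valid JSON; the helper name is ours, not part of mnn-llm):

```python
import json
from pathlib import Path

def set_backend(config_path: str, backend: str) -> None:
    """Set the inference backend ("cpu" or "cuda") in an mnn-llm config.json."""
    if backend not in ("cpu", "cuda"):
        raise ValueError(f"unsupported backend: {backend}")
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["backend_type"] = backend  # the field mnn-llm reads at load time
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False))

# example: write a minimal config, then flip it to cpu
Path("config.json").write_text(json.dumps({"backend_type": "cuda"}))
set_backend("config.json", "cpu")
print(json.loads(Path("config.json").read_text())["backend_type"])  # -> cpu
```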
- cli: built from the command line; for Android builds see `android_build.sh`
- web: built from the command line; the web resource directory must be specified at runtime
- android: open and build with Android Studio
- ios: open and build with Xcode; 🚀🚀🚀 this sample code was generated 100% by ChatGPT 🚀🚀🚀
- python: Python bindings for mnn-llm (`mnnllm`)
- other: adds text embedding, vector query, text parsing, and memory/knowledge-base capabilities 🔥
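The embedding + vector-query capability listed above reduces to nearest-neighbour search over embedding vectors. A toy illustration in plain Python (cosine similarity over a made-up index; this is not mnn-llm's actual implementation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def query(index, query_vec, top_k=1):
    """Return the top_k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(vec, query_vec), text) for text, vec in index]
    return sorted(scored, reverse=True)[:top_k]

# toy "knowledge base": (text, embedding) pairs with made-up 3-d embeddings
index = [
    ("mnn build steps", [0.9, 0.1, 0.0]),
    ("qwen model list", [0.1, 0.9, 0.2]),
]
print(query(index, [0.8, 0.2, 0.1]))  # closest entry: "mnn build steps"
```

In a real deployment the embeddings would come from an embedding model (e.g. the bge/gte models referenced below) and the index would use an approximate-nearest-neighbour structure rather than a linear scan.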
To export LLM models to ONNX and MNN, use llm-export.

ModelScope model downloads:
qwen
- modelscope-qwen-1.8b-chat
- modelscope-qwen-7b-chat
- modelscope-qwen-vl-chat
- modelscope-qwen1.5-0.5b-chat
- modelscope-qwen1.5-1.8b-chat
- modelscope-qwen1.5-4b-chat
- modelscope-qwen1.5-7b-chat
- modelscope-qwen2-0.5b-instruct
- modelscope-qwen2-1.5b-instruct
- modelscope-qwen2-7b-instruct
- modelscope-qwen2-vl-2b-instruct
- modelscope-qwen2-vl-7b-instruct
- modelscope-qwen2.5-0.5b-instruct
- modelscope-qwen2.5-1.5b-instruct
- modelscope-qwen2.5-3b-instruct
- modelscope-qwen2.5-7b-instruct
- modelscope-qwen2.5-coder-1.5b-instruct
- modelscope-qwen2.5-coder-7b-instruct
- modelscope-qwen2.5-math-1.5b-instruct
- modelscope-qwen2.5-math-7b-instruct
- modelscope-reader-lm-0.5b
- modelscope-reader-lm-1.5b
glm
llama
phi
CI build status:
```bash
# clone
git clone --recurse-submodules https://github.com/wangzhaode/mnn-llm.git
cd mnn-llm

# linux / macos
./script/build.sh

# windows (msvc)
./script/build.ps1

# python wheel
./script/py_build.sh

# android
./script/android_build.sh

# android apk
./script/android_app_build.sh

# ios
./script/ios_build.sh
```
Some build options:

- `BUILD_FOR_ANDROID`: build for Android devices;
- `LLM_SUPPORT_VISION`: enable vision support;
- `DUMP_PROFILE_INFO`: dump profiling data to the console after each conversation.
The CPU backend is used by default. To use another backend or capability, add the corresponding MNN build flag when compiling MNN:

- cuda: `-DMNN_CUDA=ON`
- opencl: `-DMNN_OPENCL=ON`
- metal: `-DMNN_METAL=ON`
```bash
# linux / macos
./cli_demo ./Qwen2-1.5B-Instruct-MNN/config.json         # cli demo
./web_demo ./Qwen2-1.5B-Instruct-MNN/config.json ../web  # web ui demo

# windows
.\Debug\cli_demo.exe ./Qwen2-1.5B-Instruct-MNN/config.json
.\Debug\web_demo.exe ./Qwen2-1.5B-Instruct-MNN/config.json ../web

# android
adb push libs/*.so build/libllm.so build/cli_demo /data/local/tmp
adb push model_dir /data/local/tmp
adb shell "cd /data/local/tmp && export LD_LIBRARY_PATH=. && ./cli_demo ./Qwen2-1.5B-Instruct-MNN/config.json"
```
reference
- cpp-httplib
- chatgpt-web
- ChatViewDemo
- nlohmann/json
- Qwen-1.8B-Chat
- Qwen-7B-Chat
- Qwen-VL-Chat
- Qwen1.5-0.5B-Chat
- Qwen1.5-1.8B-Chat
- Qwen1.5-4B-Chat
- Qwen1.5-7B-Chat
- Qwen2-0.5B-Instruct
- Qwen2-1.5B-Instruct
- Qwen2-7B-Instruct
- Qwen2-VL-2B-Instruct
- Qwen2-VL-7B-Instruct
- Qwen2.5-0.5B-Instruct
- Qwen2.5-1.5B-Instruct
- Qwen2.5-3B-Instruct
- Qwen2.5-7B-Instruct
- Qwen2.5-Coder-1.5B-Instruct
- Qwen2.5-Coder-7B-Instruct
- Qwen2.5-Math-1.5B-Instruct
- Qwen2.5-Math-7B-Instruct
- chatglm-6b
- chatglm2-6b
- codegeex2-6b
- chatglm3-6b
- glm4-9b-chat
- Llama-2-7b-chat-ms
- Llama-3-8B-Instruct
- Llama-3.2-1B-Instruct
- Llama-3.2-3B-Instruct
- Baichuan2-7B-Chat
- internlm-chat-7b
- Yi-6B-Chat
- deepseek-llm-7b-chat
- TinyLlama-1.1B-Chat-v0.6
- phi-2
- bge-large-zh
- gte_sentence-embedding_multilingual-base