[Bug] internlm2_5-7b-chat多卡部署报错 aborted #2508
Comments
Using multi-GPU inference with internlm2_5-7b-chat-4bit.
export TM_DEBUG_LEVEL=DEBUG
@lvhan028 This is the tail of the log; the error is roughly: [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start
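The debug log above was captured by raising turbomind's log level before launching the server. A minimal sketch of that setup (the serve arguments repeat the reproduction command from this report; this is a launch fragment, not something to run outside the reported environment):

```shell
# Raise turbomind's log verbosity so the [TM][DEBUG] lines appear,
# then start the API server exactly as in the reproduction steps.
export TM_DEBUG_LEVEL=DEBUG
lmdeploy serve api_server Shanghai_AI_Laboratory/internlm2_5-7b-chat \
    --backend turbomind --chat-template internlm2 --tp 2
```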
Checklist
Describe the bug
Server with 4x NVIDIA T4 GPUs, deploying internlm2_5-7b-chat with lmdeploy and tensor parallelism tp=2.
The model loads into GPU memory successfully and the API server runs normally.
Calling the inference endpoint makes the model abort and the process exits.
Using internlm/internlm2_5-7b-chat-4bit, single-GPU deployment works fine.
Reproduction
Use ModelScope as the model source:
export LMDEPLOY_USE_MODELSCOPE=True
Launch the server with the CLI tool:
lmdeploy serve api_server Shanghai_AI_Laboratory/internlm2_5-7b-chat --backend turbomind --chat-template internlm2 --tp 2
POST to the inference endpoint from another machine:
ip:23333/v1/chat/completions
{
  "model": "/root/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm2_5-7b-chat",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "讲一个三国故事"}
  ],
  "temperature": 0.7,
  "top_p": 0.8
}
The process aborts and exits.
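The failing request above can be reproduced from Python's standard library. This is a sketch of the same POST, assuming the server is reachable at the placeholder host (the "ip" from the report; replace it with the actual server address). The payload is copied verbatim from the reproduction steps.

```python
import json
from urllib import request

# Placeholder host: substitute the actual server address for "127.0.0.1".
API_URL = "http://127.0.0.1:23333/v1/chat/completions"

# Payload copied from the reproduction steps; the user prompt asks
# (in Chinese) for a story from the Three Kingdoms period.
payload = {
    "model": "/root/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm2_5-7b-chat",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "讲一个三国故事"},
    ],
    "temperature": 0.7,
    "top_p": 0.8,
}

def send(url: str = API_URL) -> str:
    """POST the payload; in the reported bug the server aborts before replying."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling `send()` against the tp=2 deployment triggers the abort described in this issue; against a single-GPU 4-bit deployment it returns a normal chat completion.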
(lmdeploy) [root@local-gpu models]# lmdeploy serve api_server Shanghai_AI_Laboratory/internlm2_5-7b-chat --backend turbomind --chat-template internlm2 --tp 2
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO: Started server process [32752]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)
已放弃 (Aborted)
Environment
Error traceback