bug: nitro cuda windows low performance on machine has multiple GPUs - tested using Jan App #269
Comments
@hiento09 I have a feeling that this problem is coming from the communication between the different GPUs. I'll look out for this while reading the codebase right now.
@KossBoii that's exactly the multi-GPU problem.
Distributed inference requires:
It depends, but I think the option to use one model on a single GPU with the help of
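One way to approximate running one model on a single GPU today is to hide the other devices from the inference process before it starts. A minimal sketch in Python, assuming the server is launched from a wrapper script; the `nitro` binary name and its arguments here are placeholders, not nitro's documented interface:

```python
import os
import subprocess

# Restrict the CUDA runtime to GPU 0 only. The other GPUs become
# invisible to the child process, so no cross-GPU splitting can happen.
env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = "0"

# Launch the inference server with the restricted environment.
# "nitro" and its arguments are placeholders for whatever engine
# binary is actually being started.
proc = subprocess.Popen(["nitro", "1", "127.0.0.1", "3928"], env=env)
proc.wait()
```

The same effect can be had by exporting `CUDA_VISIBLE_DEVICES` in the shell before starting the app.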
This should be properly supported with this instead: ggerganov/llama.cpp#6017
Closing in favor of tracking this more granularly, now that we have various engines.
Describe the bug
My Windows machine has 3 GPUs. When I enabled all 3 GPUs, the token speed was slow (6-9 tokens/s) and it could not even load TinyLlama 1B. When I disabled 2 GPUs, leaving only 1 active, performance was back to normal.
Screenshots
3 GPUs active
1 GPU active only; performance was back to normal
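As a side note for anyone reproducing this, it can help to confirm how many GPUs are actually visible before and after disabling devices. A small sketch, assuming `nvidia-smi` is available on PATH (it ships with the NVIDIA driver); the parsing is illustrative only:

```python
import subprocess

def list_visible_gpus() -> list:
    """Return the GPU lines reported by nvidia-smi, one per device."""
    out = subprocess.run(
        ["nvidia-smi", "-L"],  # e.g. "GPU 0: NVIDIA GeForce RTX 3090 (UUID: ...)"
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.startswith("GPU")]

if __name__ == "__main__":
    gpus = list_visible_gpus()
    print(f"{len(gpus)} GPU(s) visible on this machine:")
    for gpu in gpus:
        print(" ", gpu)
```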
Desktop (please complete the following information):