
bug: nitro cuda windows low performance on machine has multiple GPUs - tested using Jan App #269

Closed
hiento09 opened this issue Dec 14, 2023 · 4 comments
Assignees
Labels
type: bug Something isn't working

Comments

@hiento09
Contributor

hiento09 commented Dec 14, 2023

Describe the bug
My Windows machine has 3 GPUs. When I enabled all 3 GPUs, the token speed was slow (6-9 tok/s) and it was not even able to load TinyLlama 1B. When I disabled 2 GPUs, leaving only 1 active, performance returned to normal.

Screenshots

  • 3 GPUs active
    • Low performance [screenshot]
    • TinyLlama load error [screenshot]
  • 1 GPU active only; performance back to normal [screenshot]

Desktop (please complete the following information):

  • OS: Windows 11
  • Nvidia driver: 531.18
  • CUDA version: 12.3
  • Nitro version: 0.1.27
  • GPUs:
    • 1x RTX 4070 Ti
    • 2x GTX 1660 Ti
@hiento09 hiento09 added the type: bug Something isn't working label Dec 14, 2023
@hiento09 hiento09 changed the title bug: nitro cuda windows not able to load tinyllama 1B bug: nitro cuda windows low performance on machine has multiple GPUs Dec 14, 2023
@hiento09 hiento09 changed the title bug: nitro cuda windows low performance on machine has multiple GPUs bug: nitro cuda windows low performance on machine has multiple GPUs - tested using Jan App Dec 14, 2023
@KossBoii

@hiento09 I have a feeling this problem is coming from the communication between the different GPUs. I'll look out for this while reading the codebase.

@linhtran174 linhtran174 self-assigned this Dec 15, 2023
@hiro-v
Contributor

hiro-v commented Dec 17, 2023

@KossBoii that's exactly the multi-GPU problem.
I tested again on that machine:

  • Using only the 4070 Ti => 55 tok/sec
  • Using either one of the two 1660 Tis => 28 tok/sec

The distributed inference requires:

  • Good bandwidth between GPUs
  • The performance discrepancy between the GPUs should not be too large (in this case the 4070 Ti has to wait for the 1660 Tis to finish computing). Also, this setup uses PCIe 3 and 4, not NVLink, so data has to be transmitted via the CPU to reach another GPU.
  • Explicitly setting the value for TP (tensor parallelism) in nitro.
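For reference, upstream llama.cpp (which nitro builds on) exposes explicit control over how tensors are split across devices; whether nitro forwards these flags is an assumption, and the binary name and model path below are placeholders:

```shell
# llama.cpp server: pin all work to GPU 0 and give the slower cards a zero
# share of the split. Flag names (--main-gpu, --tensor-split) are from
# upstream llama.cpp; nitro forwarding them is an assumption.
./server -m ./model.gguf --main-gpu 0 --tensor-split 1,0,0
```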

It depends, but I think the option to run a model on a single GPU with the help of CUDA_VISIBLE_DEVICES makes sense in this case (i.e. a hardware-sensing feature).
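A minimal sketch of that approach, assuming device index 0 is the 4070 Ti (setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes the CUDA indices follow nvidia-smi's ordering, so this should be verified against nvidia-smi first):

```shell
# Expose only one device to the CUDA runtime before launching the engine.
# Index 0 = RTX 4070 Ti is an assumption; check the ordering with nvidia-smi.
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=0
echo "$CUDA_VISIBLE_DEVICES"
```

Any process launched from this shell will then only see the single fast GPU, sidestepping the cross-GPU synchronization cost entirely.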

@hiro-v hiro-v assigned hiento09 and unassigned linhtran174 Dec 18, 2023
@hiento09 hiento09 closed this as completed Jan 4, 2024
@hiento09 hiento09 reopened this Jan 4, 2024
@hiro-v
Contributor

hiro-v commented Mar 22, 2024

This should be properly supported with this instead: ggerganov/llama.cpp#6017

@0xSage
Contributor

0xSage commented Jul 1, 2024

Closing in favor of tracking this more granularly, now that we have various engines.

6 participants