Cuda build and --n-gpu-layers set to 0 #10200
-
Hello, I have a LLamaSharp-related question. I commented on an issue in the LLamaSharp repo and was redirected here. I am using LLamaSharp version 1.18.0, which corresponds to commit c35e586e in llama.cpp.
Is this considered normal? I noticed the following in the llama.cpp build documentation:
Does that mean that when llama.cpp is built with CUDA acceleration we can't disable GPU inference?
Replies: 1 comment 2 replies
-
Even with 0 layers offloaded, some ops (like large matrix multiplications) can still be offloaded to the GPU. You can disable the GPU completely by setting the environment variable `CUDA_VISIBLE_DEVICES` to an empty value: `CUDA_VISIBLE_DEVICES= ./my-app ...`
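A minimal sketch of disabling the GPU this way. The binary name and flags below are illustrative, not taken from the thread; the point is that `CUDA_VISIBLE_DEVICES` set to an empty string hides every CUDA device from the process, so even a CUDA-enabled build runs CPU-only:

```shell
# Per-invocation: set CUDA_VISIBLE_DEVICES to empty for just this command
# (the binary and its arguments are placeholders):
#   CUDA_VISIBLE_DEVICES= ./my-app --n-gpu-layers 0 ...

# Session-wide: export an empty value so all subsequent commands see no GPU.
export CUDA_VISIBLE_DEVICES=""

# The variable is present in the environment but empty, which is what
# hides the devices (unsetting it entirely would re-enable them):
printenv CUDA_VISIBLE_DEVICES >/dev/null && [ -z "$CUDA_VISIBLE_DEVICES" ] && echo "GPU hidden"
```

Note the distinction: an *empty but set* variable hides all devices, whereas an *unset* variable lets the CUDA runtime see every GPU.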