Pinging @bmtwl for any insights
When we added the threadpool and the new @Allan-Luu based on your description above you don't even need to specify the
This will strictly bind the threads to CPU cores 0-7, one thread per core.
You're going to want to use numactl to give you fine-grained control over where the threads execute:
numactl -N0-7 -m0-7 /path/to/llama-cli -m /path/to/model.gguf -t 8 --numa numactl
You can check the layout of your system with
numactl -H
so that you get the behaviour you want when assigning NUMA nodes.
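One thing to keep in mind: numactl's -N (--cpunodebind) and -m (--membind) take NUMA node numbers, not core numbers, so the range should match the nodes that numactl -H reports. As a rough sketch, a small wrapper can keep the two ranges in sync. The function name build_numactl_cmd is hypothetical (it is not part of numactl or llama.cpp), and the paths are the same placeholders as in the command above:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of numactl or llama.cpp): prints the numactl
# command line for a given NUMA node range and thread count.
# -N (--cpunodebind) and -m (--membind) both take NUMA node numbers, so the
# same range is passed to each to keep threads and memory on the same nodes.
build_numactl_cmd() {
    local nodes="$1" threads="$2"
    printf 'numactl -N%s -m%s /path/to/llama-cli -m /path/to/model.gguf -t %s --numa numactl\n' \
        "$nodes" "$nodes" "$threads"
}

# Example: restrict execution and memory to nodes 0 and 1, with 16 threads.
# The command is only printed here, not executed.
build_numactl_cmd 0-1 16
```

Printing the command first makes it easy to compare the node range against the numactl -H output before actually running it.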