
Only Distribute CUDA Binaries? #298

Closed

martindevans opened this issue Nov 15, 2023 · 2 comments


@martindevans
Member

As of this PR to llama.cpp, the CUDA binaries are capable of running CPU-only, as long as n_gpu_layers = 0.

This might mean that we can significantly simplify our distribution of binaries by removing the CPU-only variants and shipping only the CUDA ones.
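For illustration, here's a minimal sketch of what CPU-only use of a CUDA build could look like from the LLamaSharp side, assuming the `ModelParams` / `GpuLayerCount` / `LLamaWeights.LoadFromFile` names from the current API (treat the exact names as assumptions):

```csharp
// Minimal sketch (assumed API names: ModelParams, GpuLayerCount,
// LLamaWeights.LoadFromFile). With a CUDA-enabled native binary,
// setting the GPU layer count to 0 should keep all work on the CPU.
using LLama;
using LLama.Common;

var parameters = new ModelParams("path/to/model.gguf")
{
    GpuLayerCount = 0 // maps to llama.cpp's n_gpu_layers; 0 = no GPU offload
};

using var weights = LLamaWeights.LoadFromFile(parameters);
```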

martindevans moved this to 🔖 In Discussion in LLamaSharp Dev on Nov 15, 2023
@martindevans
Member Author

There's some further discussion over in the linked PR, with potential issues that may make this less attractive:

  • Even with ngl=0, some work may still be done on the GPU.

@AsakusaRinne
Collaborator

It's good news to hear that llama.cpp supports this feature. However, I have some disagreements.

If the user does not have CUDA installed, using the CUDA backend makes little sense. Besides, the CPU backend is much smaller than the CUDA backend, which may be desirable in some situations. I prefer to keep these two backends separate. :)
