Feature matrix
Romain D edited this page Mar 21, 2024

| Feature | CPU (AVX2) | CPU (ARM NEON) | Metal | cuBLAS | rocBLAS | SYCL | CLBlast | Vulkan | Kompute |
|---|---|---|---|---|---|---|---|---|---|
| K-quants | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ 🐢⁵ | ✅ 🐢⁵ | 🚫 |
| I-quants | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ 🐢⁴ | ✅ | ✅ | Partial¹ | 🚫 | 🚫 | 🚫 |
| Multi-GPU | N/A | N/A | N/A | ✅ | ❓ | 🚫 | ❓ | ✅ | ❓ |
| K cache quants | ✅ | ❓ | ✅ | ✅ 🐢³ | Partial⁶ 🐢³ | ❓ | ✅ | 🚫 | 🚫 |
| MoE architecture | ✅ | ❓ | ✅ | ✅ | ✅ | ❓ | Partial² | 🚫 | 🚫 |

- ✅: feature works
- 🚫: feature does not work
- ❓: unknown; please contribute if you can test it yourself
- 🐢: feature is slow
- ¹: IQ3_S and IQ1_S, see #5886
- ²: Only with `-ngl 0` (no layers offloaded to the GPU)
- ³: Inference is 50% slower
- ⁴: Slower than K-quants of comparable size
- ⁵: Slower than cuBLAS/rocBLAS on similar cards
- ⁶: Only q8_0 and iq4_nl
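
If you want to check backend support from a script, the matrix above can be encoded as a simple lookup table. The following is a minimal, hypothetical Python sketch and is not part of llama.cpp: `BACKENDS`, `MATRIX`, and `supported_backends` are illustrative names, footnote markers are dropped, and the cells only mirror this revision of the table.

```python
# Hypothetical helper (not part of llama.cpp): the feature matrix as a dict.
# Footnote markers are dropped; the 🐢 ("slow") marker is kept in the cell text.
BACKENDS = ("CPU (AVX2)", "CPU (ARM NEON)", "Metal", "cuBLAS", "rocBLAS",
            "SYCL", "CLBlast", "Vulkan", "Kompute")

MATRIX = {
    "K-quants":         ("✅", "✅", "✅", "✅", "✅", "✅", "✅ 🐢", "✅ 🐢", "🚫"),
    "I-quants":         ("✅ 🐢", "✅ 🐢", "✅ 🐢", "✅", "✅", "Partial", "🚫", "🚫", "🚫"),
    "Multi-GPU":        ("N/A", "N/A", "N/A", "✅", "❓", "🚫", "❓", "✅", "❓"),
    "K cache quants":   ("✅", "❓", "✅", "✅ 🐢", "Partial 🐢", "❓", "✅", "🚫", "🚫"),
    "MoE architecture": ("✅", "❓", "✅", "✅", "✅", "❓", "Partial", "🚫", "🚫"),
}

# Every row must have one cell per backend column.
assert all(len(cells) == len(BACKENDS) for cells in MATRIX.values())


def supported_backends(feature, allow_slow=True, allow_partial=False):
    """Return the backends whose cell for `feature` counts as working.

    A cell counts if it starts with ✅, or with "Partial" when allow_partial
    is True. Cells marked 🐢 are skipped when allow_slow is False.
    """
    result = []
    for backend, cell in zip(BACKENDS, MATRIX[feature]):
        works = cell.startswith("✅") or (allow_partial and cell.startswith("Partial"))
        if works and (allow_slow or "🐢" not in cell):
            result.append(backend)
    return result


if __name__ == "__main__":
    print(supported_backends("Multi-GPU"))
    print(supported_backends("K-quants", allow_slow=False))
```

Unknown (❓) and N/A cells are treated as unsupported, which is the conservative choice for a lookup like this.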