Releases: ggerganov/llama.cpp
b4061
metal : reorder write loop in mul mat kernel + style (#10231)
* metal : reorder write loop
* metal : int -> short, style
ggml-ci
b4060
metal : fix build and some more comments (#10229)
b4059
metal : fix F32 accumulation in FA vec kernel (#10232)
b4058
b4057
metal : hide debug messages from normal log
b4056
ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL oper…
b4055
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)
This change upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the FP32 datatype.
This change results in a consistent 90% improvement in input processing time, and a 20% to 80% improvement in output processing time, across various batch sizes.
The patch is tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
b4053
metal : opt-in compile flag for BF16 (#10218)
* metal : opt-in compile flag for BF16
ggml-ci
* ci : use BF16
ggml-ci
* swift : switch back to v12
* metal : has_float -> use_float
ggml-ci
* metal : fix BF16 check in MSL
ggml-ci
b4052
metal : improve clarity (minor) (#10171)
b4050
swift : exclude ggml-metal-embed.metal (#10211)
* llama.swift : exclude ggml-metal-embed.metal
* swift : exclude build/