Releases: ggerganov/llama.cpp

b4061 (09 Nov 12:51, 6423c65)
metal : reorder write loop in mul mat kernel + style (#10231)

* metal : reorder write loop

* metal : int -> short, style

ggml-ci

b4060 (09 Nov 12:51, 39a334a)
metal : fix build and some more comments (#10229)

b4059 (09 Nov 12:51, bb38cdd)
metal : fix F32 accumulation in FA vec kernel (#10232)

b4058 (09 Nov 12:51, f018acb)
llama : fix Qwen model type strings

b4057 (09 Nov 12:50, 46323fa)
metal : hide debug messages from normal log

b4056 (09 Nov 08:50, 5b359bb)
ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL oper…

b4055 (09 Nov 08:35, e892134)
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)

This change upstreams llamafile's CPU matrix
multiplication kernels for ppc64le, using MMA
builtins for the FP32 datatype.

This change results in a consistent 90%
improvement in input processing time, and a 20%
to 80% improvement in output processing time,
across various batch sizes.

The patch was tested with the Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>

b4053 (08 Nov 21:10, ec450d3)
metal : opt-in compile flag for BF16 (#10218)

* metal : opt-in compile flag for BF16

ggml-ci

* ci : use BF16

ggml-ci

* swift : switch back to v12

* metal : has_float -> use_float

ggml-ci

* metal : fix BF16 check in MSL

ggml-ci

b4052 (08 Nov 17:46, 695ad75)
metal : improve clarity (minor) (#10171)

b4050 (08 Nov 11:29, d05b312)
swift : exclude ggml-metal-embed.metal (#10211)

* llama.swift : exclude ggml-metal-embed.metal

* swift : exclude build/