Releases: ggerganov/llama.cpp

b4061 (09 Nov 12:51, 6423c65)
metal : reorder write loop in mul mat kernel + style (#10231)

* metal : reorder write loop

* metal : int -> short, style

ggml-ci

b4060 (09 Nov 12:51, 39a334a)
metal : fix build and some more comments (#10229)

b4059 (09 Nov 12:51, bb38cdd)
metal : fix F32 accumulation in FA vec kernel (#10232)

b4058 (09 Nov 12:51, f018acb)
llama : fix Qwen model type strings

b4057 (09 Nov 12:50, 46323fa)
metal : hide debug messages from normal log

b4056 (09 Nov 08:50, 5b359bb)
ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL oper…

b4055 (09 Nov 08:35, e892134)
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)

This change upstreams llamafile's CPU matrix
multiplication kernels for ppc64le, using MMA
builtins for the FP32 datatype.

This change results in a consistent 90%
improvement in input processing time, and a 20%
to 80% improvement in output processing time,
across various batch sizes.

The patch was tested with the Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>

b4053 (08 Nov 21:10, ec450d3)
metal : opt-in compile flag for BF16 (#10218)

* metal : opt-in compile flag for BF16

ggml-ci

* ci : use BF16

ggml-ci

* swift : switch back to v12

* metal : has_float -> use_float

ggml-ci

* metal : fix BF16 check in MSL

ggml-ci

b4052 (08 Nov 17:46, 695ad75)
metal : improve clarity (minor) (#10171)

b4050 (08 Nov 11:29, d05b312)
swift : exclude ggml-metal-embed.metal (#10211)

* llama.swift : exclude ggml-metal-embed.metal

* swift : exclude build/