Releases · ochafik/llama.cpp
b4327
llama : add Qwen2VL support + multimodal RoPE (#10361)

* Barebone Qwen2VL LLM converter
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script bug fix
* correct vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, outdated func args
* update `llama_hparams`
* update to keep up with upstream changes
* resolve linter, test errors
* add makefile entry, update special image padding token
* add mrope unit test, fix a few compiler warnings
* rename `mrope` related functions, params
* minor updates on debug util, bug fixes
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
* fix trailing whitespace
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remove old `rope_section` compare operator

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
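The m-rope entries above split the rotary dimensions into per-modality sections (`llama_hparams.rope_sections`, `GGML_ROPE_TYPE_MROPE`). As a rough illustration of the idea, here is a minimal NumPy sketch in which each section is rotated with its own position stream; the section sizes, the temporal/height/width split, and the helper name are assumptions for illustration, not the actual ggml kernel.

```python
# Illustrative sketch of multimodal RoPE (m-rope): the rotary dimension is
# split into sections, and each section is rotated using a different position
# stream (e.g. temporal / height / width for vision tokens). Section sizes and
# names are assumptions, not the ggml implementation.
import numpy as np

def mrope(x, positions, sections, base=10000.0):
    """x: (n_tokens, head_dim); positions: (len(sections), n_tokens)."""
    head_dim = x.shape[-1]
    out = x.copy()
    start = 0
    for sec, pos in zip(sections, positions):
        # each section owns `sec` rotary pairs (2 * sec values)
        idx = np.arange(sec)
        inv_freq = base ** (-2.0 * (start // 2 + idx) / head_dim)
        angles = pos[:, None] * inv_freq[None, :]          # (n_tokens, sec)
        cos, sin = np.cos(angles), np.sin(angles)
        x0 = x[:, start:start + 2 * sec:2]
        x1 = x[:, start + 1:start + 2 * sec:2]
        out[:, start:start + 2 * sec:2] = x0 * cos - x1 * sin
        out[:, start + 1:start + 2 * sec:2] = x0 * sin + x1 * cos
        start += 2 * sec
    return out

# Example: 3 sections (temporal, height, width) covering a 24-dim head.
x = np.random.randn(5, 24).astype(np.float32)
pos = np.stack([np.arange(5), np.zeros(5), np.zeros(5)])  # per-section positions
print(mrope(x, pos, sections=(4, 4, 4)).shape)            # (5, 24)
```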
b4295
CUDA: fix shared memory access condition for mmv (#10740)
b4291
server : fix format_infill (#10724)

* server : fix format_infill
* fix
* rename
* update test
* use another model
* update test
* update test
* test_invalid_input_extra_req
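`format_infill` is the server routine that assembles the fill-in-the-middle prompt behind the `/infill` endpoint. Below is a hedged request sketch; the server URL is assumed and the field set is trimmed to the parts covered here (`input_prefix`, `input_suffix`, `n_predict`) — consult the server README for the full schema.

```python
# Hedged sketch of exercising the server's /infill endpoint, whose prompt is
# assembled by format_infill from a prefix and a suffix. URL and response
# field names are assumptions for illustration.
import json
import urllib.request

payload = {
    "input_prefix": "def fib(n):\n    ",       # code before the cursor
    "input_suffix": "\n    return result\n",   # code after the cursor
    "n_predict": 64,
}
req = urllib.request.Request(
    "http://localhost:8080/infill",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```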
b4274
fix(server) : not show alert when DONE is received (#10674)
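The alert concerned the streamed `data: [DONE]` sentinel, which is not a JSON chunk and should simply end the stream rather than surface as an error. A hedged client-side sketch of that handling follows; the endpoint, payload, and response shape are assumptions based on the OpenAI-compatible streaming format.

```python
# Hedged sketch of consuming an OpenAI-compatible SSE stream and treating the
# "[DONE]" sentinel as end-of-stream instead of an error. Endpoint and payload
# are assumptions for illustration.
import json
import urllib.request

payload = {"messages": [{"role": "user", "content": "Hello"}], "stream": True}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for raw in resp:
        line = raw.decode().strip()
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":          # sentinel: stop cleanly, no alert/error
            break
        chunk = json.loads(data)
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```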
b4155
fix gguf-py: Conversion error when multiple licenses are configured …
b3995
kompute: add backend registry / device interfaces (#10045)

Get in line with the other backends by supporting the newer backend/device registry interfaces.

Signed-off-by: Sergio Lopez <slp@redhat.com>
b3987
llama : Add IBM granite template (#10013)

* Add granite template to llama.cpp
* Add granite template to test-chat-template.cpp
* Update src/llama.cpp (Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>)
* Update tests/test-chat-template.cpp (Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>)
* Added proper template and expected output
* Small change to \n
* Add code space (Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>)
* Fix spacing
* Apply suggestions from code review
* Update src/llama.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
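For context on what a `test-chat-template.cpp` case checks, here is a conceptual Python sketch: a fixed message list rendered through a template must match a hand-written expected string. The role markers below are placeholders, not the actual IBM Granite syntax.

```python
# Conceptual sketch of a chat-template test: render a fixed message list and
# compare against an expected prompt string. Role markers are placeholders,
# NOT the actual IBM Granite tokens.
def render(messages, add_generation_prompt=True):
    out = []
    for m in messages:
        out.append(f"<role:{m['role']}>{m['content']}</role>\n")
    if add_generation_prompt:
        out.append("<role:assistant>")
    return "".join(out)

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
]
expected = "<role:system>You are helpful.</role>\n<role:user>Hi</role>\n<role:assistant>"
assert render(messages) == expected
```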
b3963
flake.lock: Update

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/5633bcff0c6162b9e4b5f1264264611e950c8ec7?narHash=sha256-9UTxR8eukdg+XZeHgxW5hQA9fIKHsKCdOIUycTryeVw=' (2024-10-09)
  → 'github:NixOS/nixpkgs/4c2fcb090b1f3e5b47eaa7bd33913b574a11e0a0?narHash=sha256-/uilDXvCIEs3C9l73JTACm4quuHUsIHcns1c+cHUJwA=' (2024-10-18)
b3958
llama : add chat template for RWKV-World + fix EOT (#9968)

* Add chat template for RWKV-World
* RWKV: Fix the chat template not being used
* RWKV v6: Set EOT token to `\n\n`
* readme: add rwkv into supported model list

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
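Setting EOT to `\n\n` matches the World-style turn format, where turns are separated by a blank line, so generation stops naturally at the end of a turn. A hedged sketch of that layout follows; the exact role labels are an assumption for illustration.

```python
# Hedged sketch of the RWKV-World style turn layout the template targets:
# turns end with a blank line ("\n\n"), which is why EOT is set to "\n\n".
# Role labels are an assumption for illustration.
def rwkv_world_prompt(messages):
    parts = []
    for m in messages:
        role = "User" if m["role"] == "user" else "Assistant"
        parts.append(f"{role}: {m['content']}\n\n")
    parts.append("Assistant:")           # model generates until it emits "\n\n"
    return "".join(parts)

print(rwkv_world_prompt([{"role": "user", "content": "What is RWKV?"}]))
# User: What is RWKV?
#
# Assistant:
```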