* feat: add support for KV cache quantization options (abetlen#1307)
* add KV cache quantization options
abetlen#1220, abetlen#1305
* Add ggml_type
* Use ggml_type instead of string for quantization
* Add server support
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
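A minimal sketch of the new options, assuming the `type_k`/`type_v` parameter names from the PR description and a placeholder model path; the values are the `ggml_type` enums exposed by `llama_cpp` rather than strings:

```python
import llama_cpp
from llama_cpp import Llama

# Quantize both halves of the KV cache to Q8_0 (placeholder model path).
llm = Llama(
    model_path="./models/model.gguf",
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # K cache quantization type
    type_v=llama_cpp.GGML_TYPE_Q8_0,  # V cache quantization type
)
```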
* fix: Changed local API doc references to hosted (abetlen#1317)
* chore: Bump version
* fix: last tokens passing to sample_repetition_penalties function (abetlen#1295)
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
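A hedged usage sketch: the fix concerns the window of recent tokens handed to the repetition-penalty sampler, which the `repeat_penalty` option relies on (the `llm` object and prompt are placeholders):

```python
# repeat_penalty > 1.0 down-weights tokens seen in the recent context
# window; the fix ensures those last tokens actually reach the sampler.
out = llm.create_completion(
    "List three colors:",
    max_tokens=32,
    repeat_penalty=1.2,
)
print(out["choices"][0]["text"])
```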
* feat: Update llama.cpp
* fix: segfault when logits_all=False. Closes abetlen#1319
* feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (abetlen#1247)
* Generate binary wheel index on release
* Add total release downloads badge
* Update download label
* Use official cibuildwheel action
* Add workflows to build CUDA and Metal wheels
* Update generate index workflow
* Update workflow name
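For reference, installs from the pre-built wheel indexes look like the following; the `whl/<backend>` index layout is an assumption based on the project docs, so verify the URLs against the published indexes:

```shell
# CPU-only, CUDA 12.1, and Metal wheel indexes (layout assumed from docs).
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
```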
* feat: Update llama.cpp
* chore: Bump version
* fix(ci): use correct script name
* docs: LLAMA_CUBLAS -> LLAMA_CUDA
* docs: Add docs explaining how to install pre-built wheels.
* docs: Rename cuBLAS section to CUDA
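A sketch of a source build using the renamed flag (the old `LLAMA_CUBLAS` define is replaced by `LLAMA_CUDA`):

```shell
# Build from source with CUDA support using the renamed CMake flag.
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
```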
* fix(docs): incorrect tool_choice example (abetlen#1330)
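The corrected shape for `tool_choice` in the OpenAI-compatible chat API is an object naming the function, sketched below with a hypothetical `get_user_info` tool:

```python
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "My name is Ada."}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_user_info",  # hypothetical tool
            "parameters": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
            },
        },
    }],
    # Select the hypothetical tool explicitly via a typed object:
    tool_choice={"type": "function", "function": {"name": "get_user_info"}},
)
```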
* feat: Update llama.cpp
* fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes abetlen#1328, abetlen#1314
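A minimal sketch of requesting logprobs (placeholder model path); per-token logits are required, hence `logits_all=True`, which is also where the segfault fix above applies:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf", logits_all=True)
out = llm.create_completion("Hello", max_tokens=8, logprobs=5)
# Top-5 alternatives for each sampled token:
print(out["choices"][0]["logprobs"])
```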
* feat: Update llama.cpp
* fix: Always embed metal library. Closes abetlen#1332
* feat: Update llama.cpp
* chore: Bump version
---------
Co-authored-by: Limour <93720049+Limour-dev@users.noreply.github.com>
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Co-authored-by: lawfordp2017 <lawfordp@gmail.com>
Co-authored-by: Yuri Mikhailov <bitsharp@gmail.com>
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Steps

1. Install the pre-built Metal wheel:
   pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/metal
2. Construct the model with n_gpu_layers > 0 so layers are offloaded to the GPU (see the sketch below).
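A minimal Python sketch of step 2 with a placeholder model path; `n_gpu_layers=-1` offloads all layers, and any value > 0 enables Metal offload:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the Metal GPU
)
print(llm("Q: What is the capital of France? A:", max_tokens=16)["choices"][0]["text"])
```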