Llama cpp low level python bindings #1660
base: master
Conversation
Has some excess-newline issues, so WIP. (Update) Fixed the excess newlines; now onto args. Still needs packaging work so you could do "python -m llama_cpp.examples." etc.
… ignore eos, add no-mmap, fixed one-character-too-much echo bug
… class default value
(force-pushed from c8186ab to 93278f8)
Tabs replaced and trailing spaces removed in all commits (force-pushed) to pass the editor check.
Conversely, that also means a lot of people will be angry if you do something that breaks the Python bindings.
Not sure about this. I see the positives, but I'm worried that it will be too difficult for me to maintain Python code. Also, I get the impression that the …
I agree that there will be double the work to maintain both C++ and Python bindings, unless the latter can be automated (which is quite difficult in practice even with something like binder). It is better to specify a slower-moving (higher-level?) API (perhaps as a result of the llamax effort) that different Python wrappers can then implement. There certainly seems to be a good number of them.

I myself was working on one such effort: llamalib, which consisted of developing three thin compiled Python 3 llama.cpp wrappers simultaneously (using pybind11, nanobind, and cython), with the initial intent of providing an alternative compiled backend to … While I made some decent progress, it was none too productive updating three wrappers at once against the frenetic pace of this project, so I spun off the cython wrapper, cyllama, which I am currently developing and trying to keep in sync with bleeding-edge llama.cpp changes.

In any case, well done to @abetlen for the stability / feature coverage provided by …
Background/rationale:
This pull request addresses #82 and #1156, bringing the low-level Python ctypes bindings into llama.cpp. This should help reduce Python binding fragmentation and broaden llama.cpp development. Using Python for examples and main wrappers is a pattern found in other related projects, such as rwkv.cpp and bert.cpp.
The ctypes Python binding commits are from @abetlen / llama-cpp-python. Only the commits relevant to the low-level bindings are included; other commits, such as the high-level module and the server module, are excluded. The remaining commits have been cleaned up somewhat for clarity.
The Python bindings allow functionality equivalent to the bash scripts and main.cpp, though the primary purpose is better alignment and a wider development community, as Python is a very common language in this field.
Supporting low-level Python bindings should not put any significant burden on C++ developers. As the Python bindings become widely used, many people will be interested in keeping them up to date.
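To make the shape of the low-level API concrete, here is a minimal sketch of driving the bindings directly. It assumes the module is importable as `llama_cpp` and mirrors the C API of this era one-to-one (`llama_init_from_file`, `llama_eval`, etc.); exact names, signatures, and especially the sampling API differ between llama.cpp versions, so treat this as illustrative rather than definitive:

```python
# Minimal sketch of the low-level ctypes bindings (illustrative; names and
# signatures vary across llama.cpp versions). Buffers are pre-allocated
# ctypes arrays and strings are bytes, just as in main.cpp.
import llama_cpp

params = llama_cpp.llama_context_default_params()
ctx = llama_cpp.llama_init_from_file(b"./models/llama-7B/ggml-model.bin", params)

# Tokenize the prompt into a pre-allocated token buffer (add_bos=True).
prompt = b" Hello there"
tokens = (llama_cpp.llama_token * (len(prompt) + 1))()
n_tokens = llama_cpp.llama_tokenize(ctx, prompt, tokens, len(tokens), True)

# Evaluate the prompt, then greedily extend it a few tokens at a time.
n_past = 0
for _ in range(8):
    llama_cpp.llama_eval(ctx, tokens, n_tokens, n_past, 4)  # 4 threads
    n_past += n_tokens
    # Pick the argmax of the last token's logits (greedy sampling only,
    # to avoid depending on a version-specific sampling API).
    logits = llama_cpp.llama_get_logits(ctx)
    n_vocab = llama_cpp.llama_n_vocab(ctx)
    tok = max(range(n_vocab), key=lambda i: logits[i])
    print(llama_cpp.llama_token_to_str(ctx, tok).decode("utf-8", "ignore"), end="")
    # Feed the sampled token back in on the next iteration.
    tokens[0] = tok
    n_tokens = 1

llama_cpp.llama_free(ctx)
```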
Use:
cmake -D BUILD_SHARED_LIBS=ON .
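BUILD_SHARED_LIBS is needed because the ctypes bindings load libllama as a shared library at import time rather than linking against it. A rough sketch of that loading step follows (the paths and library names are illustrative and platform-dependent; the actual binding module has its own search logic):

```python
import ctypes
import pathlib
import sys

# The shared library produced by the cmake build above has a
# platform-dependent file name (illustrative lookup only).
ext = {"linux": "so", "darwin": "dylib", "win32": "dll"}.get(sys.platform, "so")
lib_path = pathlib.Path(".") / f"libllama.{ext}"

# ctypes.CDLL loads the library and exposes its C symbols; the binding
# module then declares argtypes/restype for each function it wraps.
lib = ctypes.CDLL(str(lib_path))
```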
Chat.py is roughly equivalent to chat-13B.sh
MODEL=./models/llama-7B/ggml-model.bin python3 examples/Chat.py
low_level_api_chat_cpp.py is similar in functionality to main.cpp.
python3 examples/low_level_api_chat_cpp.py --model ./models/llama-7B/ggml-model.bin -b 1024 -i -r "User:" -f prompts/chat-with-bob.txt
low_level_api_chat_llama.py is a simplified chat example.