
Llama cpp low level python bindings #1660

Open
wants to merge 77 commits into master
Conversation

@dmahurin (Contributor) commented Jun 1, 2023

Background/rationale:

This pull request addresses #82 and #1156, bringing the low level python ctypes binding into llama.cpp. This should hopefully help reduce python binding fragmentation, and help broaden llama.cpp development. The use of python for examples and main wrappers is a pattern used in other related projects, such as rwkv.cpp and bert.cpp.

The ctypes python binding commits are from @abetlen / llama-cpp-python. Only the commits relevant to the low level bindings are included; other commits, such as those for the high level module or the server module, are excluded. The remaining commits have been cleaned up somewhat for clarity.

The python bindings allow functionality equivalent to the bash scripts and main.cpp, though the primary purpose is to get better alignment and to widen the development community, as python is a very common language in this field.

Supporting low level python bindings should not place any significant burden on C++ developers. As the python bindings become widely used, many people will be interested in keeping them up to date.

Use:

cmake -D BUILD_SHARED_LIBS=ON .

Chat.py is roughly equivalent to chat-13B.sh

MODEL=./models/llama-7B/ggml-model.bin python3 examples/Chat.py

low_level_api_chat_cpp.py is similar in functionality to main.cpp.

python3 examples/low_level_api_chat_cpp.py --model ./models/llama-7B/ggml-model.bin -b 1024 -i -r "User:" -f prompts/chat-with-bob.txt

low_level_api_chat_llama.py is a simplified chat example.
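For readers unfamiliar with the ctypes approach these bindings take, the pattern is to load the shared library produced by the `BUILD_SHARED_LIBS=ON` build and declare each C function's signature by hand. The sketch below illustrates that pattern; it uses libm's `sqrt` as a stand-in so it runs anywhere, since the actual `libllama` symbols depend on the llama.cpp version you build.

```python
import ctypes
import ctypes.util

# The low level bindings load a shared library the same way; for llama.cpp
# it would be the library built with BUILD_SHARED_LIBS=ON, e.g.:
#   lib = ctypes.CDLL("./libllama.so")
# Here we load libm instead so the sketch is runnable without a build.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Each wrapped C function gets explicit argument and return types declared,
# which is the same pattern the ctypes binding applies to the llama.cpp API.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(9.0))  # -> 3.0
```

This hand-declaration step is the maintenance cost being discussed below: whenever a C function's signature changes, the corresponding `argtypes`/`restype` declarations must be updated to match.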

abetlen and others added 30 commits May 31, 2023 15:16
- Has some too many newline issues so WIP
- (Update) Fixed too many newlines, now onto args.
- Still needs shipping work so you could do "python -m llama_cpp.examples." etc.
- … ignore eos, add no-mmap, fixed 1 character echo too much bug
@dmahurin dmahurin force-pushed the llama-cpp-python-low-level branch from c8186ab to 93278f8 Compare June 1, 2023 13:10
@dmahurin (Contributor, Author) commented Jun 1, 2023

Tabs replaced and trailing spaces removed in all commits (force pushed) to pass the editor check.

@JohannesGaessler (Collaborator) commented:

> Having supported low level python bindings should not put any significant burden on c++ developers. As the python bindings become widely used, there will be many interested in keeping them up to date.

Conversely, that will also mean that a lot of people will be angry if you do something that breaks the Python bindings, though.

@ggerganov (Owner) commented:

Not sure about this - I see the positives, but I'm worried that it will be too difficult for me to maintain Python code. Maybe at some later stage we can provide this API, but at the moment it would be a big burden. Open to suggestions though.

Also, I get the impression that the llama-cpp-python project is in a pretty good shape and well maintained. I guess people can use that? Is there anything we can do to support it from llama.cpp side?

@shakfu commented Nov 5, 2024

@ggerganov wrote:

> Not sure about this - I see the positives, but I'm worried that it will be too difficult for me to maintain Python code. Maybe at some later stage we can provide this API, but at the moment it will be a big burden. Open to suggestions though

I agree that it would be double the work to maintain both the cpp code and the python bindings, unless the latter can be automated (which is quite difficult in practice, even with something like binder). It would be better to specify a slower-moving (higher-level?) API (perhaps as a result of the llamax effort) that different python wrappers can then implement. There certainly seems to be a good number of them.

I myself was working on one such effort, llamalib, which consisted of developing three thin compiled python3 llama.cpp wrappers simultaneously (using pybind11, nanobind, and cython), with the initial intent of providing an alternative compiled backend to llama-cpp-python instead of ctypes.

While I made some decent progress, it was not very productive to update three wrappers at once against the frenetic pace of this project, so I spun off the cython wrapper, cyllama, which I am currently developing and trying to keep in sync with bleeding-edge llama.cpp changes.

In any case, well done to @abetlen for the stability and feature coverage provided by llama-cpp-python.

8 participants