Llama cpp low level python bindings #1660
base: master
Conversation
Has some excess-newline issues, so WIP. (Update) Fixed the excess newlines; now onto args. Still needs packaging work so you could do "python -m llama_cpp.examples." etc.
… ignore eos, add no-mmap, fixed one-character-too-much echo bug
… class default value
(force-pushed from c8186ab to 93278f8)
Tabs replaced and trailing spaces removed in all commits (force-pushed) to pass the editor check.
Conversely, that also means a lot of people will be angry if you do something that breaks the Python bindings.
Not sure about this. I see the positives, but I'm worried that it will be too difficult for me to maintain Python code. Also, I get the impression that the …
I agree that there will be double the work to maintain both C++ and Python bindings, unless the latter can be automated (which is quite difficult in practice even with something like binder). It is better to specify a slower-moving (higher-level?) API (perhaps as a result of the llamax effort) that different Python wrappers can then implement. There certainly seems to be a good number of them.

I myself was working on one such effort: llamalib, which consisted of developing three thin compiled Python 3 llama.cpp wrappers simultaneously (using pybind11, nanobind, and cython), with the initial intent of providing an alternative compiled backend to … While I made some decent progress, it was none too productive updating three wrappers at once against the frenetic pace of this project, so I spun off the cython wrapper, cyllama, which I am currently developing and trying to keep in sync with bleeding-edge llama.cpp changes.

In any case, well done to @abetlen for the stability / feature coverage provided by …
Background/rationale:
This pull request addresses #82 and #1156, bringing the low-level Python ctypes bindings into llama.cpp. This should help reduce Python binding fragmentation and broaden llama.cpp development. Using Python for examples and main wrappers is a pattern found in other related projects, such as rwkv.cpp and bert.cpp.
The ctypes Python binding commits are from @abetlen / llama-cpp-python. Only the commits relevant to the low-level bindings are included; other commits, such as the high-level module and the server module, are excluded. The remaining commits have been cleaned up somewhat for clarity.
The Python bindings allow functionality equivalent to the bash scripts and main.cpp, though the primary purpose is better alignment and a wider development community, as Python is a very common language in this field.
Supporting low-level Python bindings should not put any significant burden on C++ developers. As the Python bindings become widely used, many people will be interested in keeping them up to date.
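To make the shape of the low-level API concrete, here is a minimal sketch of driving the bindings directly. It assumes the module is importable as `llama_cpp` and mirrors the C API of this era one-to-one (`llama_init_from_file`, `llama_eval`, etc.); exact names, signatures, and especially the sampling API differ between llama.cpp versions, so treat this as illustrative rather than definitive:

```python
# Minimal sketch of the low-level ctypes bindings (illustrative; names and
# signatures vary across llama.cpp versions). Buffers are pre-allocated
# ctypes arrays and strings are bytes, just as in main.cpp.
import llama_cpp

params = llama_cpp.llama_context_default_params()
ctx = llama_cpp.llama_init_from_file(b"./models/llama-7B/ggml-model.bin", params)

# Tokenize the prompt into a pre-allocated token buffer (add_bos=True).
prompt = b" Hello there"
tokens = (llama_cpp.llama_token * (len(prompt) + 1))()
n_tokens = llama_cpp.llama_tokenize(ctx, prompt, tokens, len(tokens), True)

# Evaluate the prompt, then greedily extend it a few tokens at a time.
n_past = 0
for _ in range(8):
    llama_cpp.llama_eval(ctx, tokens, n_tokens, n_past, 4)  # 4 threads
    n_past += n_tokens
    # Pick the argmax of the last token's logits (greedy sampling only,
    # to avoid depending on a version-specific sampling API).
    logits = llama_cpp.llama_get_logits(ctx)
    n_vocab = llama_cpp.llama_n_vocab(ctx)
    tok = max(range(n_vocab), key=lambda i: logits[i])
    print(llama_cpp.llama_token_to_str(ctx, tok).decode("utf-8", "ignore"), end="")
    # Feed the sampled token back in on the next iteration.
    tokens[0] = tok
    n_tokens = 1

llama_cpp.llama_free(ctx)
```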
Use:
cmake -D BUILD_SHARED_LIBS=ON .
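BUILD_SHARED_LIBS is needed because the ctypes bindings load libllama as a shared library at import time rather than linking against it. A rough sketch of that loading step follows (the paths and library names are illustrative and platform-dependent; the actual binding module has its own search logic):

```python
import ctypes
import pathlib
import sys

# The shared library produced by the cmake build above has a
# platform-dependent file name (illustrative lookup only).
ext = {"linux": "so", "darwin": "dylib", "win32": "dll"}.get(sys.platform, "so")
lib_path = pathlib.Path(".") / f"libllama.{ext}"

# ctypes.CDLL loads the library and exposes its C symbols; the binding
# module then declares argtypes/restype for each function it wraps.
lib = ctypes.CDLL(str(lib_path))
```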
Chat.py is roughly equivalent to chat-13B.sh
MODEL=./models/llama-7B/ggml-model.bin python3 examples/Chat.py
low_level_api_chat_cpp.py is similar in functionality to main.cpp.
python3 examples/low_level_api_chat_cpp.py --model ./models/llama-7B/ggml-model.bin -b 1024 -i -r "User:" -f prompts/chat-with-bob.txt
low_level_api_chat_llama.py is a simplified chat example.