Kudos on a great job! Need a little help with BLAS #32

Let me first congratulate everyone working on this project!

I was wondering if anyone can help me get this working with BLAS? Right now when the model loads, I see BLAS=0.

I've been using kobold.cpp, which has a compile-time flag that enables BLAS. It cuts prompt ingestion time by 3-4x, which is a major factor in handling longer prompts and chat-style messages.

P.S. I was also wondering what the difference is between create_embedding(input) and embed(input)?

Comments
Thank you!
Just the return signature: create_embedding returns an object identical to the response of OpenAI's embeddings API, while embed just returns the embedding itself as a list of floats.
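A quick sketch of that difference in use (the model path is a placeholder, and embedding=True is needed to enable embedding calls):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model.bin", embedding=True)

# embed() returns the raw embedding as a plain list of floats.
vector = llm.embed("Hello, world!")

# create_embedding() wraps the same data in an OpenAI-style response dict;
# response["data"][0]["embedding"] holds the same vector.
response = llm.create_embedding("Hello, world!")
```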
At the moment installing this library is equivalent to building llama.cpp with its default options. I think we could support e.g. setting environment variables before installation to force certain features. Do you mind installing from source to test this?
I did.
I got OpenBLAS working with llama-cpp-python, though it requires modifying the llama.cpp CMakeLists.txt file. This provides a nice performance boost during prompt ingestion compared to builds without OpenBLAS. This was tested on Ubuntu 22, and I'll leave the exercise of getting this configurable and working on all platforms to the devs 😀 In CMakeLists.txt, add a block after the existing build options, as sketched below.
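A sketch of such a block, modeled on the LLAMA_OPENBLAS option that later landed in upstream llama.cpp; BLA_VENDOR and find_package(BLAS) are standard CMake, while the option name and exact placement are assumptions:

```cmake
option(LLAMA_OPENBLAS "llama: use OpenBLAS" ON)

if (LLAMA_OPENBLAS)
    set(BLA_VENDOR OpenBLAS)
    find_package(BLAS)
    if (BLAS_FOUND)
        message(STATUS "OpenBLAS found")
        # GGML_USE_OPENBLAS switches ggml's matrix-multiply path to BLAS calls.
        add_compile_definitions(GGML_USE_OPENBLAS)
        add_link_options(${BLAS_LIBRARIES})
    else()
        message(WARNING "OpenBLAS not found")
    endif()
endif()
```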
In vendor/llama.cpp/CMakeLists.txt replace line 247 with:
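Presumably the replacement adds the BLAS libraries to llama's link line; a hedged guess at what it looked like (the target name and the LLAMA_EXTRA_LIBS variable are assumptions):

```cmake
# Hypothetical: link BLAS into the llama target alongside its existing dependencies.
target_link_libraries(llama PRIVATE ggml ${LLAMA_EXTRA_LIBS} ${BLAS_LIBRARIES})
```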
For generating the shared llama.cpp library:
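This part is a guess at the intent: building libllama as a shared library so the Python bindings can load it. BUILD_SHARED_LIBS is standard CMake; the rest of the invocation is assumed:

```sh
# Hypothetical: configure and build llama.cpp as a shared library.
cd vendor/llama.cpp
mkdir -p build && cd build
cmake .. -DBUILD_SHARED_LIBS=ON
cmake --build . --config Release
```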
I got CMake OpenBLAS support into upstream llama.cpp (ggerganov/llama.cpp@f2d1c47), but it looks like you guys jumped the gun on me and switched to using the Makefile to build llama.cpp. Since the Makefile is being used, we can easily enable OpenBLAS support using an environment variable (and I believe there are ways to append arguments to pip install so that flags can be forwarded to the installer). Alternatively, the setup script could detect whether the user has OpenBLAS installed and enable it automatically.
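For reference, the Makefile hook this relies on looked roughly like this in llama.cpp at the time (the OpenBLAS include path varies by distro, so treat this as an approximation):

```make
# Approximate: defining LLAMA_OPENBLAS in the environment adds the
# GGML_USE_OPENBLAS define and links against OpenBLAS.
ifdef LLAMA_OPENBLAS
	CFLAGS  += -DGGML_USE_OPENBLAS -I/usr/local/include/openblas
	LDFLAGS += -lopenblas
endif
```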
@eiery I think the environment variable approach is the way to go; we can document some common settings in the README and ask the user to set them when running pip install.
Great! For the record, the correct command to get OpenBLAS working via pip install is:
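The command below is reconstructed from the surrounding discussion: LLAMA_OPENBLAS is the Makefile flag mentioned above, and --no-cache-dir / --force-reinstall match the cache note that follows; treat the exact form as an approximation:

```sh
LLAMA_OPENBLAS=1 pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```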
We need to clear the cache as well, else pip just uses the cached build and does not recompile llama.cpp. Feel free to add this to the README. Now to get this into oobabooga...
I can't get BLAS to enable.
Are you on Windows? I think the env variable passing only works for the Makefile builds, which are currently Unix-only. I'm not sure how to pass environment variables to CMake; maybe a change to the root CMakeLists.txt would do it.
@gjmulder I also wonder if this is related: ggerganov/llama.cpp#992