
Kudos on a great job! Need a little help with BLAS #32

Closed
regstuff opened this issue Apr 6, 2023 · 9 comments

regstuff commented Apr 6, 2023

Let me first congratulate everyone working on this for:

  1. Python bindings for llama.cpp
  2. Making them compatible with openai's api
  3. Superb documentation!

Was wondering if anyone can help me get this working with BLAS? Right now, when the model loads, I see BLAS = 0.
I've been using kobold.cpp, which has a compile-time flag that enables BLAS. It cuts prompt loading time by 3-4x, which is a major factor in handling longer prompts and chat-style messages.

P.S - Was also wondering what the difference is between create_embedding(input) and embed(input)?

abetlen (Owner) commented Apr 6, 2023

Thank you!

> P.S - Was also wondering what the difference is between create_embedding(input) and embed(input)?

Just the return signature: create_embedding returns an object identical to openai.Embedding.create's response, whereas embed just returns a list of floats.
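
For illustration, a minimal sketch of the two calls (the model path is a placeholder, and it assumes the Llama object is constructed with embedding=True):

from llama_cpp import Llama

# Load a local GGML model with embedding support enabled (path is a placeholder).
llm = Llama(model_path="./models/ggml-model-q4_0.bin", embedding=True)

# create_embedding() returns an OpenAI-style response object.
response = llm.create_embedding("Hello, world!")
vector_from_response = response["data"][0]["embedding"]

# embed() returns just the list of floats.
vector = llm.embed("Hello, world!")

# Both should hold the same numbers; only the wrapping differs.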

> Was wondering if anyone can help me get this working with BLAS? Right now when the model loads, I see BLAS=0.

At the moment, installing this library is equivalent to building llama.cpp as a shared library with cmake using more or less the default args. There's an open issue about loading a custom shared library version, but I don't think that's the right solution for configuration.

I think we could support, e.g., setting environment variables before installation to force certain features. Do you mind installing llama.cpp standalone with BLAS support and telling me the process, so I can add something to the setup.py? Thanks.
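
As a rough sketch of that idea (not the project's actual setup.py; the names and flag spelling are illustrative, using the LLAMA_OPENBLAS variable mentioned elsewhere in this thread), a scikit-build based setup script could forward an environment variable to the CMake build like this:

import os
from skbuild import setup  # scikit-build's drop-in replacement for setuptools.setup

# Opt-in: forward LLAMA_OPENBLAS from the environment to CMake as a cache entry.
cmake_args = []
if os.environ.get("LLAMA_OPENBLAS", "").lower() in ("1", "on", "true"):
    cmake_args.append("-DLLAMA_OPENBLAS=on")

setup(
    name="llama_cpp_python",
    version="0.0.0",  # placeholder version for the sketch
    packages=["llama_cpp"],
    cmake_args=cmake_args,
)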

regstuff (Author) commented Apr 6, 2023

I did: make LLAMA_OPENBLAS=1

ghost commented Apr 7, 2023

I got OpenBLAS working with llama-cpp-python, though it requires modifications to the CMakeLists.txt files. This provides a nice performance boost during prompt ingestion compared to builds without OpenBLAS.

This was tested on Ubuntu 22 and I'll leave the exercise of getting this configurable and working on all platforms to the devs 😀

In CMakeLists.txt, add after project(llama_cpp):

set(CMAKE_POLICY_DEFAULT_CMP0077 NEW)

set(LLAMA_OPENBLAS ON)

In vendor/llama.cpp/CMakeLists.txt replace line 247 with:

target_link_libraries(llama PRIVATE ggml ${LLAMA_EXTRA_LIBS} openblas)

For generating the shared llama.cpp library, -lopenblas was required to get the symbols to appear properly in the .so file (see ggerganov/llama.cpp#412 (comment)). This is not required when generating the regular executable version of llama.cpp.

ghost commented Apr 8, 2023

I got CMake OpenBLAS support into upstream llama.cpp (ggerganov/llama.cpp@f2d1c47), but it looks like you guys jumped the gun on me and switched to using the Makefile to build llama.cpp.

Since the Makefile is being used, we can easily enable OpenBLAS support using an environment variable (and I believe there are ways to append an argument to pip install so that we can send flags over to the installer). Or perhaps the setup script could detect whether the user has OpenBLAS installed and automatically enable it if so.
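
For the auto-detection idea, a hedged sketch of the kind of check a setup script could run (illustrative only; it uses Python's standard ctypes.util to look for a shared OpenBLAS library on the system):

import ctypes.util

def openblas_available() -> bool:
    # find_library() returns a library name/path if a shared OpenBLAS can be
    # located on this system, or None if it cannot.
    return ctypes.util.find_library("openblas") is not None

if __name__ == "__main__":
    print("OpenBLAS found:", openblas_available())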

abetlen (Owner) commented Apr 8, 2023

@eiery I think the environment variable approach is the way to go; we can document some common settings in the README and ask the user to run pip install --force-reinstall --ignore-installed llama-cpp-python.

ghost commented Apr 9, 2023

Great! For the record, the correct command to get OpenBLAS working in the pip install is:

LLAMA_OPENBLAS=on pip install --force-reinstall --ignore-installed --no-cache-dir llama-cpp-python

We need to clear the cache as well, or else pip just uses the cached build and does not recompile llama.cpp. Feel free to add this to the README.
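
After reinstalling, one quick way to confirm the wheel was actually built with BLAS, without loading a model, is to print llama.cpp's system info string from the low-level bindings and look for BLAS = 1 (a sketch; it assumes llama_cpp.llama_print_system_info() is exposed in the installed version):

import llama_cpp

# Returns the same feature string llama.cpp prints at model load
# (AVX, FMA, BLAS, ...), as bytes.
info = llama_cpp.llama_print_system_info().decode("utf-8")
print(info)
print("BLAS enabled:", "BLAS = 1" in info)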

Now to get this up into oobabooga...

gjmulder (Contributor) commented:

I can't get BLAS to enable:

$ rm -rf _skbuild/

$ LLAMA_OPENBLAS=on pip install --force-reinstall --ignore-installed --no-cache-dir llama-cpp-python
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.1.33.tar.gz (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 18.5 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0
  Downloading typing_extensions-4.5.0-py3-none-any.whl (27 kB)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.1.33-cp310-cp310-linux_x86_64.whl size=136284 sha256=01c535e6d8a3245619b03971ed647dd657c09c069c6c0d12904f86b836d3899f
  Stored in directory: /data/tmp/pip-ephem-wheel-cache-j_6kc3tv/wheels/7d/56/a8/1f25f650cc0e65111f077cc49454a388ee6ae62de56236ee79
Successfully built llama-cpp-python
Installing collected packages: typing-extensions, llama-cpp-python
Successfully installed llama-cpp-python-0.1.33 typing-extensions-4.5.0

$ python3 -m llama_cpp.server
llama.cpp: loading model from /data/llama/alpaca-13B-ggml/ggml-model-q4_0.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format     = ggmf v1 (old version with no mmap support)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 7945693.73 KB
llama_model_load_internal: mem required  = 9807.47 MB (+ 1608.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size  = 1600.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
INFO:     Started server process [917202]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)

abetlen (Owner) commented Apr 15, 2023

Are you on Windows? I think the environment variable passing only works for the Makefile builds, which are currently Unix-only. I'm not sure how to pass environment variables through to cmake; maybe it needs a change to the root CMakeLists.txt.

abetlen (Owner) commented Apr 15, 2023

@gjmulder I also wonder if this is related: ggerganov/llama.cpp#992
