Swiftui metal update #1

bachittle · 2023-10-06T01:30:44Z

NOTE: this merge request is mainly for documentation purposes; it will not actually be merged.

I'm trying to update to the latest llama.cpp (the batch update) and am encountering some issues.

Some models load successfully but crash when they reach the llama_decode function. The same error message and call stack are reproducible with multiple models.

Other models, such as starcoder-1b, do not load at all. The error states that there is an "invalid character", possibly some sort of UTF-8 issue?
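To test the UTF-8 hypothesis, one option is to check whether the bytes that trigger the error are well-formed UTF-8. The sketch below is a generic validator, not llama.cpp code; the function name `is_valid_utf8` is hypothetical, and whether the "invalid character" error actually comes from malformed UTF-8 is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of llama.cpp): returns true if `s` is a
// well-formed UTF-8 byte sequence. Rejects stray continuation bytes,
// truncated sequences, overlong encodings, surrogates, and code points
// above U+10FFFF.
bool is_valid_utf8(const std::string & s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};

    const size_t n = s.size();
    size_t i = 0;
    while (i < n) {
        const uint8_t b0 = static_cast<uint8_t>(s[i]);
        size_t len = 0;
        uint32_t cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;             // stray continuation or invalid lead byte

        if (i + len > n) return false; // sequence cut off at end of input

        for (size_t k = 1; k < len; ++k) {
            const uint8_t b = static_cast<uint8_t>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (len > 1 && cp < min_cp[len]) return false;  // overlong encoding
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

A token piece that holds only part of a multi-byte character (for example the single byte `0xC3`) fails this check, which is one way a byte-level BPE vocabulary could trip a strict character check.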

For now, I will stick to using swiftui_metal with the older version of llama.cpp until a solution is found.

* vvhg-code-infill (#1)
* infill in separate example (#2)
* reverted changes to main and added infill example
* cleanup
* naming improvement
* make : add missing blank line
* fix missing semicolon
* brought infill up to current main code
* cleanup
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>

@jploski

* Work on the BPE tokenizer
  Tokenizer tests work for Falcon-7B
* Try to fix build problem
* Fix debug assertion failure
* Fix MSVC Unicode BOM problem
* Cleanup and an improvement
* Fix compiler warning
* Cleanup
* Test doesn't work over the full range of Unicodes
* Update .gitignore and Makefile
* Another Makefile rule
* Testing Aquila
* Moving byte decoding back to `token_to_piece` ...
  ... because everyone is using it.
* Guarding some unusable code pathes
* Streamlining code and adding some more assertions
  Important change: I'm classifying added tokens as control tokens now for BPE.
* Adding a comment
* Adding another assertion
* Fixed vocabulary guarding assertions
* Fix PR for recent change
* Fix PR for recent change
* Fix for compiler warning
* Fix PR for recent change
* Fix PR for recent change
* Fix PR for recent change
* Fix for compiler warning
* Fixes for more compiler warnings
* Remove unused code
* Fix initialization of static maps
* Add scores and token types back, adapt gptneox
* Update llama.cpp
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update unicode.h
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update unicode.h
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Ported Starcoder and added some assertions
* Fix coding style
* Apply @jploski 's fix for missing tokens
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

…d by the processor (ggerganov#3273)
* fix LLAMA_NATIVE
* syntax
* alternate implementation
* my eyes must be getting bad...
* set cmake LLAMA_NATIVE=ON by default
* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc
* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile
* remove -DLLAMA_MPI=ON
---------
Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>

…ng intrinsics (ggerganov#3453)
* Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product function for risc-v.
  The RVV intrinsics is added for the following quantize row functions
    quantize_row_q8_0
    quantize_row_q8_1
  The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1
    ggml_vec_dot_q4_0_q8_0
    ggml_vec_dot_q4_1_q8_1
    ggml_vec_dot_q5_0_q8_0
    ggml_vec_dot_q5_1_q8_1
  And vector initialization in Q5 by temporary array is also replaced by the vid intrinsics
  Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
* Added RVV intrinsics support for k_quants
  This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64
    ggml_vec_dot_q2_K_q8_K
    ggml_vec_dot_q3_K_q8_K
    ggml_vec_dot_q4_K_q8_K
    ggml_vec_dot_q5_K_q8_K
    ggml_vec_dot_q6_K_q8_K
  Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
---------
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>

* sync : ggml (conv 1d + 2d updates)
  ggml-ci
* ggml : fix UB in q5_0 and q5_1 quantize code
  ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
  ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
  ggml-ci
* tests : fix UB in test-quantize-perf
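The sanitizer report in this commit message comes from shifting the signed literal `1` left by 31 places: in C, 2^31 cannot be represented in a 32-bit `int`, so the result is undefined. A minimal sketch of the usual remedy, shifting an unsigned value instead, follows. The helper names `bit_mask` and `set_bit_if` are hypothetical, and the exact change made in ggml.c is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns a 32-bit word with only bit `pos` set.
// The operand is unsigned, so shifting by 31 is well defined
// (the result is 0x80000000).
uint32_t bit_mask(int pos) {
    assert(pos >= 0 && pos < 32);
    return UINT32_C(1) << pos;
}

// Hypothetical helper: sets bit `pos` of `word` when `flag` is true,
// the kind of operation used when packing one high bit per value into
// a 32-bit field.
uint32_t set_bit_if(uint32_t word, int pos, bool flag) {
    return flag ? (word | bit_mask(pos)) : word;
}
```

For example, `set_bit_if(0u, 31, true)` yields `0x80000000` without involving a signed shift.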

* Fix ggerganov#4017
* Update ggml-cuda.cu
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* Update ggml-cuda.cu
  Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* finetune : zero the loraB initial vectors
  Without this, the first iteration is starting out far from the base model, instead of exactly on it. Zeroing loraB is what the paper recommends. loralib also zeroes at least one of the init vector pairs (though it departs from the paper in using a different distribution for the other vector, in some cases).
* tabs to spaces
* Use ggml_set_zero instead of adding a new function
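The reasoning in this commit message can be checked with a small worked example. With a low-rank update the effective weight is W + B·A, so if B starts at zero the product vanishes and the first iteration sees exactly the base weights, whatever A contains. The sketch below uses plain nested vectors rather than ggml tensors; `Matrix`, `zeros`, and `lora_effective_weight` are hypothetical names, not functions from the finetune example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Hypothetical helper: a rows x cols matrix filled with zeros.
Matrix zeros(size_t rows, size_t cols) {
    return Matrix(rows, std::vector<float>(cols, 0.0f));
}

// Hypothetical helper: returns W + B*A, where W is (n_out x n_in),
// B is (n_out x r) and A is (r x n_in).
Matrix lora_effective_weight(const Matrix & W, const Matrix & B, const Matrix & A) {
    assert(B.size() == W.size());
    Matrix out = W;
    for (size_t i = 0; i < W.size(); ++i) {
        assert(B[i].size() == A.size());
        for (size_t k = 0; k < A.size(); ++k) {
            assert(A[k].size() == W[i].size());
            for (size_t j = 0; j < W[i].size(); ++j) {
                out[i][j] += B[i][k] * A[k][j];
            }
        }
    }
    return out;
}
```

With `B = zeros(n_out, r)` the result equals `W` for any `A`; with a nonzero `B` it generally does not, which is the "starting out far from the base model" case the commit describes.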

…anov#4079)
* Remove logically superfluous assertions and order by dimension
* Use cblas_sgemm() to implement ggml_compute_forward_out_prod()
* Remove ggml_compute_forward_out_prod_use_blas(), fix compiling errors on cmake/zig, remove trailing whitespace
* Add openBLAS support for sgemm() in compute_forward_out_prod()

ggerganov#4074)
- introduces help entry for the argument
- cuts '--gpu-layers' form in order to simplify usage and documentation.
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
Co-authored-by: Jiri Podivin <jpodivin@redhat.com>

* build: support ppc64le build for make and CMake
* build: keep __POWER9_VECTOR__ ifdef and extend with __powerpc64__
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

a10y and others added 30 commits September 29, 2023 14:15

readme : add link to grammars app (ggerganov#3388)

569550d

* Add link to grammars app per @ggernagov suggestion
  Adding a sentence in the Grammars section of README to point to grammar app, per ggerganov#2494 (comment)
* Update README.md

readme : update hot topics + model links (ggerganov#3399)

0a4a4a0

llama : quantize up to 31% faster on Linux and Windows with mmap (gge…

2777a84

…rganov#3206)
* llama : enable mmap in quantize on Linux -> 31% faster
* also enable mmap on Windows
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

train : fix KQ_pos allocation (ggerganov#3392)

bc34dd4

* train : fix KQ_pos allocation
* make sure KQ_pos is not reallocated in finetune
---------
Co-authored-by: xaedes <xaedes@gmail.com>

llama.cpp : add documentation about rope_freq_base and scale values (g…

40e07a6

…gerganov#3401)
* llama.cpp : add documentation about rope_freq_base and scale values
* add notice to hot topics

ggml-cuda : perform cublas mat mul of quantized types as f16 (ggergan…

f5ef5cf

…ov#3412)
* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16
* rename CC_TURING to CC_VOLTA
* disable fp16 mat mul completely with multi GPU

docker : ignore Git files (ggerganov#3314)

ea55295

cmake : fix transient definitions in find pkg (ggerganov#3411)

095231d

metal : set log callback before initializing (ggerganov#3427)

a847676

finetune : fix ggerganov#3404 (ggerganov#3437)

a03ce38

the shapes for init model of gqa models was wrong

cmake : make CUDA flags more similar to the Makefile (ggerganov#3420)

9476b01

* cmake : fix misuse of cxx_flags
* cmake : make CUDA flags more similar to the Makefile
* cmake : fix MSVC build

gguf : general usability improvements (ggerganov#3409)

0fe3210

gguf : add BERT, MPT, and GPT-J arch info (ggerganov#3408)

29a404a

CLBlast: Add broadcast support for matrix multiplication (ggerganov#3402

665018c

) Broadcast src0 into src1 across dimensions 2 and 3 when needed. This is required for models that use GQA.

cmake : increase minimum version for add_link_options (ggerganov#3444)

e78f0b0

convert : fix vocab size when not defined in hparams (ggerganov#3421)

1c84003

metal : alibi for arbitrary number of heads (ggerganov#3426)

f56e1ba

llama : expose model's rope_freq_scale in the API (ggerganov#3418)

48be797

so it can be scaled further before creating a context.

llama : fix session saving/loading (ggerganov#3400)

ac2219f

* llama : fix session saving/loading
* llama : temp fix for clearing "future" tokens from the KV cache
* llama : fix handling of "future" tokens when loading sessions
* llama : fix comments for llama_kv_cache API

main : consistent prefix/suffix coloring (ggerganov#3425)

8186242

* Typo
* No `--in-prefix` coloring
  The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.

finetune : readme fix typo (ggerganov#3465)

f72f8f2

Fix small typo

llm : add Refact model (ggerganov#3329)

f8c90cd

* add refact model
* resolve comments
* rebase to the latest
* solve alibi cpu error
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggml : fix build after ggerganov#3329

0d152b3

readme : add project status link

beabc8c

convert : fix Baichuan2 models by using vocab size in config.json (gg…

019ba1d

…erganov#3299) Use local GGUF package when possible in Baichuan converter

ggerganov and others added 29 commits November 13, 2023 16:55

ggml : sync (im2col, GPU conv, 32-bit arm compat) (ggerganov#4060)

3d68f36

ggml-ci

llava : fix regression for square images in ggerganov#3613 (ggerganov…

bd90eca

…#4056)

convert.py: also look for plain model.safetensors (ggerganov#4043)

b46d12f

* add safetensors to convert.py help message
* Check for single-file safetensors model
* Update convert.py "model" option help message
* revert convert.py help message change

stablelm : StableLM support (ggerganov#3586)

36eed0c

* Add support for stablelm-3b-4e1t
* Supports GPU offloading of (n-1) layers

Fix MacOS Sonoma model quantization (ggerganov#4052)

6bb4908

Co-authored-by: Jared Van Bortel <jared@nomic.ai> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggml-cuda : increase max graph size (ggerganov#4084)

1cf2850

llama : restore prefix space in llama tokenizer (ggerganov#4081)

a6fc554

gguf : fix potential infinite loops while parsing (ggerganov#4100)

8da4627

Co-authored-by: Bernhard Gstrein <gstrein@cs.uni-freiburg.de>

Respect tokenizer.ggml.add_bos_token value when tokenizing (ggerganov…

91f6499

…#4040)
* gguf-py: gguf-dump: Respect --no-tensor flag in JSON mode.
* Respect add_bos_token GGUF metadata value
* gguf-py: Try to fix SpecialVocab giving up too easily for the Nth time

llama : fix data units (ggerganov#4101)

4f447a4

* llama : fix data units
  ggml-ci
* Revert "llama : fix data units"
  This reverts commit f5feac8.
* llama : disambiguate data units
  ggml-ci
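The ambiguity this commit refers to is presumably the usual one between decimal megabytes (10^6 bytes) and binary mebibytes (2^20 bytes), which differ by about 4.9%. A minimal sketch of the two conversions follows; the function names are hypothetical, and which unit the commit settled on is not stated on this page.

```cpp
#include <cstdint>

// Hypothetical helper: converts a byte count to mebibytes (MiB, 2^20 bytes).
double bytes_to_mib(uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Hypothetical helper: converts a byte count to megabytes (MB, 10^6 bytes).
double bytes_to_mb(uint64_t bytes) {
    return static_cast<double>(bytes) / 1e6;
}
```

Printing a size with the wrong label makes the same buffer look larger or smaller than it is, so log output should name the unit that matches the divisor.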

llama : add functions to get the model's metadata (ggerganov#4013)

e85bb1a

* llama : add functions to get the model's metadata
* format -> std::to_string
* better documentation

py : remove superfluous import statements (ggerganov#4076)

f7d5e97

Signed-off-by: Jiri Podivin <jpodivin@gmail.com> Co-authored-by: Jiri Podivin <jpodivin@redhat.com>

llava : fix compilation warning that fread return value is not used (g…

c7cce12

…gerganov#4069)

common : improve yaml log escaping (ggerganov#4080)

9e87ef6

* logging: improve escaping in yaml output
* logging: include review feedback

py : Falcon HF compatibility (ggerganov#4104)

11173c9

Falcon HF compatibility

convert : use 'model' value if it exists. This allows karpathy/tinyll…

2ab0707

…amas to load (ggerganov#4089) Co-authored-by: Don Mahurin <@>

examples : add tokenize (ggerganov#4039)

2fa02b4

tokenize : fix trailing whitespace

5ad387e

llama : increase max nodes (ggerganov#4115)

bbecf3f

added O3, now has insufficient memory access

cd61854

Merge branch 'master' into swiftui_metal_update

f510cc1

begin sync with master

ce31d95

update to match latest code, new errors

a22264a

fixed it!

f002a2e

bachittle closed this Nov 22, 2023
