
CLBlast w/ new ggml & vicuna is not working. #1415

Closed
edp1096 opened this issue May 12, 2023 · 10 comments

Comments

@edp1096
Contributor

edp1096 commented May 12, 2023

Hello.

When using the new ggml build wrapped with CLBlast and a Vicuna prompt, main.exe shows a weird response.
The characters after the last ### Human: below were generated by main.exe.

Both the CPU and CUDA versions work well.

The model file I tried is ggml-vic7b-q4_0.bin from https://huggingface.co/eachadea/ggml-vicuna-7b-1.1/tree/main

D:\dev\pcbangstudio\workspace\my-llama\bin>main.exe -m ggml-vic7b-q4_0.bin -p "What is the largest city in Asia?" -f vicuna.txt
main: build = 529 (b9fd7ee)
main: seed  = 1683896754
llama.cpp: loading model from ggml-vic7b-q4_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  68.20 KB
llama_model_load_internal: mem required  = 5809.33 MB (+ 1026.00 MB per state)

Initializing CLBlast (First Run)...
Attempting to use: Platform=0, Device=0 (If invalid, program will crash)
Using Platform: NVIDIA CUDA Device: NVIDIA GeForce RTX 3060 Ti
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 What is the largest city in Asia?A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.

### Human: Hello, Assistant.
### Assistant: Hello. How may I help you today?
### Human: Please tell me the largest city in Europe.
### Assistant: Sure. The largest city in Europe is Moscow, the capital of Russia.
### Human: socirijукук德...] Према applies sieук applied...] ['*/...] Camerук... Swift Swiftsom BradDir.. ~FIукFIукукук Quint...]..укFI Swiftук appliedукукFI...] Brad plane' Camer SwiftFI*/ momentук BradFI CamerFIffFI BradFICon; ~ planeFI Quintукук tmFI SwiftFIines...]ук CamerFI E below ah Eastук*/…<укFI FI...FIFIFIFI..FI Camer< below Eук... son Quint CamerFIFIукFIHIFIFIук
 Camer...] Camerукtm...FIotalук momentFI... Camer below... CamerFIFI*/FI SomFIFI......FI .FIFIук...] Quint Son...]FIFIttFI ~FIFI below ..FIFIFIFIFIFI Camer...FI... ITFIукFI momentFI*/FI EFI FI<FI...укFI...ук David...FIFIукFIun...]..FICon Old...укines…FI SwiftFI Hell*/FI<..FI BradFI QuintFIук .FI plane ~FIук... below CamerFIFI...ук...FI rootsff...
...FI ..... BradукFI momentFI… Camer David ourFIFI Son Camer Son EFIFItmHIffFIChainукFI tm...]FIFI IanaleinesFIFI Davidun.. QuintFI*/FI SwiftFI E Old...FIFI plane ~ ITFIFI Quint*/ plane HellукFIFI db ~' feel DavidFI.. Q..FI moment< Brad…*/
.... Camerff March ..<*/… planeFI BradffFI...FIFI our*/... IanFI Con<...]...FI Oldun SonFIConFI
*/ momentff below Brad
 Sixук E…*/укук*/...FIFIChain ..ук... Quint below... Hell... ~...] K*/ moment BradFIFI Quint'FI........ Camer*/ff...….....*/ук our plane E…... Con, Thomas Swift..
*/lrpring Springer Earук.~pringук Ear [UL...] ah Claudeetal ['...] Springer Quintción...,ук Springer..... Stupringук...pring...]...]...] Ear Dur Quint.~...]...] Ear......]...]...] alt Ear...] forward.~.........] Ken...]...]...]UL...] For...] Sau For...] Ear...]...] [ ['...]pring...]... —.....]... [pring.....]...]...]FI Quint plant...]...]......] Down...,...] er...)...]...].....]...] b —... — [' Quint...]...]...]UL...] DevHI...]...]...]...URIUL...]...]UL...] — Plant ahук...]...].....]pring...........]Dev'.~pring...]..., [...] Ken...]UL across QuintFI For...]...]...]pring a ha... Down... Ken...]......]...]...]... plant....~pringUL Tim...,....]......) ['... across.......] across^C
D:\dev\pcbangstudio\workspace\my-llama\bin>
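
For reference, the "Platform=0, Device=0" pair in the log above comes from standard OpenCL enumeration; a minimal C sketch (illustrative only, not part of llama.cpp) to see what those indices map to on a given machine:

// list_cl.c -- enumerate OpenCL platforms/devices so you can tell which
// indices CLBlast's "Platform=0, Device=0" will pick.
// Build (assumes the OpenCL headers and ICD loader are installed):
//   gcc list_cl.c -lOpenCL -o list_cl
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint n_platforms = 0;
    clGetPlatformIDs(8, platforms, &n_platforms);

    for (cl_uint p = 0; p < n_platforms; p++) {
        char pname[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(pname), pname, NULL);
        printf("Platform %u: %s\n", p, pname);

        cl_device_id devices[8];
        cl_uint n_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &n_devices);
        for (cl_uint d = 0; d < n_devices; d++) {
            char dname[256];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dname), dname, NULL);
            printf("  Device %u: %s\n", d, dname);
        }
    }
    return 0;
}

On this machine they resolve to the NVIDIA CUDA platform and the RTX 3060 Ti, matching the "Using Platform: ... Device: ..." line in the log.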
@swittk
Contributor

swittk commented May 12, 2023

Can confirm; the latest CLBlast + Accelerate build (I'm on macOS) shows weird responses for me too, regardless of model (tried q4_0 and q5_0 of WizardLM and Vicuna; all produced gibberish).
On the flip side, at least plain CPU is extremely fast now 😅.

@FNsi
Contributor

FNsi commented May 12, 2023

> Can confirm; the latest CLBlast + Accelerate build (I'm on macOS) shows weird responses for me too, regardless of model (tried q4_0 and q5_0 of WizardLM and Vicuna; all produced gibberish).
>
> On the flip side, at least plain CPU is extremely fast now 😅.

+1. It only works if there is a single speaker in the initial prompt. Currently I need to manually type the reverse prompt to start the conversation and make it work.

@skidd-level-100

Same here with Alpaca + WizardLM. Higher-precision quantizations like q8_0 work well, but if you give it complex input it spews the same trash (unless q8, f16, f32, maybe q5*). OpenCL + NVIDIA on Linux.

@SlyEcho
Collaborator

SlyEcho commented May 12, 2023

It may have something to do with the new quantization format.
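
Concretely, a Q4_0 block packs 32 weights as 16 bytes of nibbles plus one scale, and the new format changed how those nibbles map to weights; a dequantizer built for the old layout reads valid bytes but scatters the weights, which produces exactly the kind of gibberish shown above. A minimal sketch, assuming the post-change layout (block_q4_0 here is a simplified stand-in, not the exact ggml struct):

#include <stdint.h>

// Simplified stand-in for ggml's Q4_0 block (assumption: 32 weights/block).
typedef struct {
    float   d;       // scale
    uint8_t qs[16];  // 32 x 4-bit quants, two per byte
} block_q4_0;

// Unshuffled layout: byte j carries weight j in its low nibble and weight
// j+16 in its high nibble. Kernels built for the old interleaved layout
// expect weights 2j and 2j+1 per byte instead, hence the scrambled output.
static void dequantize_q4_0(const block_q4_0 *x, float *y) {
    for (int j = 0; j < 16; j++) {
        y[j]      = ((x->qs[j] & 0x0F) - 8) * x->d;
        y[j + 16] = ((x->qs[j] >>   4) - 8) * x->d;
    }
}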

@ggerganov
Owner

I forgot to update the OpenCL kernels.

See the CUDA changes here for reference: b9fd7ee#diff-66b17223e8ba54054fb2600ecbd31107f8b917bac36c7f3789811b0f0e9802a1L83-L106

I'll try to do it, but I don't have a CLBlast setup, so I'm hoping somebody else can propose a fix.
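
For anyone picking this up: the OpenCL kernels need the same nibble bookkeeping as the CUDA diff above. A hypothetical sketch of what the updated dequantize kernel could look like (names and struct layout assumed; not the actual patch):

// Hypothetical OpenCL C sketch mirroring the CUDA change; not the real patch.
struct block_q4_0 {
    float d;        // scale
    uchar qs[16];   // 32 x 4-bit quants
};

__kernel void dequantize_row_q4_0(__global const struct block_q4_0 *x,
                                  __global float *y) {
    const size_t i = get_global_id(0);  // one work-item per 32-weight block
    const float d = x[i].d;
    for (int j = 0; j < 16; j++) {
        const uchar q = x[i].qs[j];
        // unshuffled: low nibble -> first half, high nibble -> second half
        y[i*32 + j]      = ((q & 0x0F) - 8) * d;
        y[i*32 + j + 16] = ((q >> 4)   - 8) * d;
    }
}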

@SlyEcho
Collaborator

SlyEcho commented May 12, 2023

@ggerganov I will attempt it.

@SlyEcho
Collaborator

SlyEcho commented May 13, 2023

Can you check again with the latest fixes?

@swittk
Contributor

swittk commented May 13, 2023

Works fine for me 😄

@kurnevsky
Contributor

kurnevsky commented May 13, 2023

It's still broken on some prompts, but it's the first time I'm trying to use CLBlast, so maybe it was always broken :)

Here is an example:
llama-cpp -t 32 -m wizard-vicuna-13B.ggml.q8_0.bin --color -c 2048 --instruct
And then ask:
Write a sci-fi story about snowboarding. There should be an evil AI in there.

With CLBlast it produces garbage, while without it everything works fine.

@edp1096
Contributor Author

edp1096 commented May 13, 2023

@SlyEcho Sorry for the late reply. It works great! Thank you; closing.

@edp1096 edp1096 closed this as completed May 13, 2023