
CLBlast w/ new ggml & vicuna is not working. #1415

Closed
edp1096 opened this issue May 12, 2023 · 10 comments

Comments

@edp1096
Contributor

edp1096 commented May 12, 2023

Hello.

When using the new ggml build wrapped with CLBlast and a Vicuna prompt, main.exe shows a weird response.
The characters after the last ### Human: below were generated by main.exe.

Both the CPU and CUDA versions work well.

The model file I tried is ggml-vic7b-q4_0.bin from https://huggingface.co/eachadea/ggml-vicuna-7b-1.1/tree/main

D:\dev\pcbangstudio\workspace\my-llama\bin>main.exe -m ggml-vic7b-q4_0.bin -p "What is the largest city in Asia?" -f vicuna.txt
main: build = 529 (b9fd7ee)
main: seed  = 1683896754
llama.cpp: loading model from ggml-vic7b-q4_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  68.20 KB
llama_model_load_internal: mem required  = 5809.33 MB (+ 1026.00 MB per state)

Initializing CLBlast (First Run)...
Attempting to use: Platform=0, Device=0 (If invalid, program will crash)
Using Platform: NVIDIA CUDA Device: NVIDIA GeForce RTX 3060 Ti
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 What is the largest city in Asia?A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.

### Human: Hello, Assistant.
### Assistant: Hello. How may I help you today?
### Human: Please tell me the largest city in Europe.
### Assistant: Sure. The largest city in Europe is Moscow, the capital of Russia.
### Human: socirijукук德...] Према applies sieук applied...] ['*/...] Camerук... Swift Swiftsom BradDir.. ~FIукFIукукук Quint...]..укFI Swiftук appliedукукFI...] Brad plane' Camer SwiftFI*/ momentук BradFI CamerFIffFI BradFICon; ~ planeFI Quintукук tmFI SwiftFIines...]ук CamerFI E below ah Eastук*/…<укFI FI...FIFIFIFI..FI Camer< below Eук... son Quint CamerFIFIукFIHIFIFIук
 Camer...] Camerукtm...FIotalук momentFI... Camer below... CamerFIFI*/FI SomFIFI......FI .FIFIук...] Quint Son...]FIFIttFI ~FIFI below ..FIFIFIFIFIFI Camer...FI... ITFIукFI momentFI*/FI EFI FI<FI...укFI...ук David...FIFIукFIun...]..FICon Old...укines…FI SwiftFI Hell*/FI<..FI BradFI QuintFIук .FI plane ~FIук... below CamerFIFI...ук...FI rootsff...
...FI ..... BradукFI momentFI… Camer David ourFIFI Son Camer Son EFIFItmHIffFIChainукFI tm...]FIFI IanaleinesFIFI Davidun.. QuintFI*/FI SwiftFI E Old...FIFI plane ~ ITFIFI Quint*/ plane HellукFIFI db ~' feel DavidFI.. Q..FI moment< Brad…*/
.... Camerff March ..<*/… planeFI BradffFI...FIFI our*/... IanFI Con<...]...FI Oldun SonFIConFI
*/ momentff below Brad
 Sixук E…*/укук*/...FIFIChain ..ук... Quint below... Hell... ~...] K*/ moment BradFIFI Quint'FI........ Camer*/ff...….....*/ук our plane E…... Con, Thomas Swift..
*/lrpring Springer Earук.~pringук Ear [UL...] ah Claudeetal ['...] Springer Quintción...,ук Springer..... Stupringук...pring...]...]...] Ear Dur Quint.~...]...] Ear......]...]...] alt Ear...] forward.~.........] Ken...]...]...]UL...] For...] Sau For...] Ear...]...] [ ['...]pring...]... —.....]... [pring.....]...]...]FI Quint plant...]...]......] Down...,...] er...)...]...].....]...] b —... — [' Quint...]...]...]UL...] DevHI...]...]...]...URIUL...]...]UL...] — Plant ahук...]...].....]pring...........]Dev'.~pring...]..., [...] Ken...]UL across QuintFI For...]...]...]pring a ha... Down... Ken...]......]...]...]... plant....~pringUL Tim...,....]......) ['... across.......] across^C
D:\dev\pcbangstudio\workspace\my-llama\bin>
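
For reference, the "Platform=0, Device=0" pair in the log above comes from standard OpenCL enumeration; a minimal C sketch (illustrative only, not part of llama.cpp) to see what those indices map to on a given machine:

// list_cl.c -- enumerate OpenCL platforms/devices so you can tell which
// indices CLBlast's "Platform=0, Device=0" will pick.
// Build (assumes the OpenCL headers and ICD loader are installed):
//   gcc list_cl.c -lOpenCL -o list_cl
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint n_platforms = 0;
    clGetPlatformIDs(8, platforms, &n_platforms);

    for (cl_uint p = 0; p < n_platforms; p++) {
        char pname[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(pname), pname, NULL);
        printf("Platform %u: %s\n", p, pname);

        cl_device_id devices[8];
        cl_uint n_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &n_devices);
        for (cl_uint d = 0; d < n_devices; d++) {
            char dname[256];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dname), dname, NULL);
            printf("  Device %u: %s\n", d, dname);
        }
    }
    return 0;
}

On this machine they resolve to the NVIDIA CUDA platform and the RTX 3060 Ti, matching the "Using Platform: ... Device: ..." line in the log.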
@swittk
Contributor

swittk commented May 12, 2023

Can confirm; the latest CLBlast + Accelerate build (I'm on macOS) shows weird responses for me too, regardless of model (tried q4_0 and q5_0 of WizardLM and Vicuna; all produced gibberish).
On the flip side, at least plain CPU is extremely fast now 😅.

@FNsi
Contributor

FNsi commented May 12, 2023

> Can confirm; the latest CLBlast + Accelerate build (I'm on macOS) shows weird responses for me too, regardless of model (tried q4_0 and q5_0 of WizardLM and Vicuna; all produced gibberish).
>
> On the flip side, at least plain CPU is extremely fast now 😅.

+1. It only works if there is a single speaker in the initial prompt. Currently I need to manually type the reverse prompt to start the conversation and make it work.

@skidd-level-100

Same here with Alpaca + WizardLM. Higher-precision quantizations like q8_0 work well, but if you give it complex input it spews the same trash (unless q8, f16, f32, maybe q5*). OpenCL + NVIDIA on Linux.

@SlyEcho
Collaborator

SlyEcho commented May 12, 2023

It may have something to do with the new quantization format.
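
Concretely, a Q4_0 block packs 32 weights as 16 bytes of nibbles plus one scale, and the new format changed how those nibbles map to weights; a dequantizer built for the old layout reads valid bytes but scatters the weights, which produces exactly the kind of gibberish shown above. A minimal sketch, assuming the post-change layout (block_q4_0 here is a simplified stand-in, not the exact ggml struct):

#include <stdint.h>

// Simplified stand-in for ggml's Q4_0 block (assumption: 32 weights/block).
typedef struct {
    float   d;       // scale
    uint8_t qs[16];  // 32 x 4-bit quants, two per byte
} block_q4_0;

// Unshuffled layout: byte j carries weight j in its low nibble and weight
// j+16 in its high nibble. Kernels built for the old interleaved layout
// expect weights 2j and 2j+1 per byte instead, hence the scrambled output.
static void dequantize_q4_0(const block_q4_0 *x, float *y) {
    for (int j = 0; j < 16; j++) {
        y[j]      = ((x->qs[j] & 0x0F) - 8) * x->d;
        y[j + 16] = ((x->qs[j] >>   4) - 8) * x->d;
    }
}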

@ggerganov
Owner

I forgot to update the OpenCL kernels.

See the CUDA changes here for reference: b9fd7ee#diff-66b17223e8ba54054fb2600ecbd31107f8b917bac36c7f3789811b0f0e9802a1L83-L106

I'll try to do it, but I don't have a CLBlast setup, so I'm hoping somebody else can propose a fix.
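
For anyone picking this up: the OpenCL kernels need the same nibble bookkeeping as the CUDA diff above. A hypothetical sketch of what the updated dequantize kernel could look like (names and struct layout assumed; not the actual patch):

// Hypothetical OpenCL C sketch mirroring the CUDA change; not the real patch.
struct block_q4_0 {
    float d;        // scale
    uchar qs[16];   // 32 x 4-bit quants
};

__kernel void dequantize_row_q4_0(__global const struct block_q4_0 *x,
                                  __global float *y) {
    const size_t i = get_global_id(0);  // one work-item per 32-weight block
    const float d = x[i].d;
    for (int j = 0; j < 16; j++) {
        const uchar q = x[i].qs[j];
        // unshuffled: low nibble -> first half, high nibble -> second half
        y[i*32 + j]      = ((q & 0x0F) - 8) * d;
        y[i*32 + j + 16] = ((q >> 4)   - 8) * d;
    }
}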

@SlyEcho
Collaborator

SlyEcho commented May 12, 2023

@ggerganov I will attempt it.

@SlyEcho
Collaborator

SlyEcho commented May 13, 2023

Can you check again with the latest fixes?

@swittk
Contributor

swittk commented May 13, 2023

Works fine for me 😄

@kurnevsky
Contributor

kurnevsky commented May 13, 2023

It's still broken on some prompts, but it's the first time I'm trying to use CLBlast, so maybe it was always broken :)

Here is an example:
llama-cpp -t 32 -m wizard-vicuna-13B.ggml.q8_0.bin --color -c 2048 --instruct
And then ask:
Write a sci-fi story about snowboarding. There should be an evil AI in there.

With CLBlast it produces garbage, while without it everything works fine.

@edp1096
Contributor Author

edp1096 commented May 13, 2023

@SlyEcho Sorry for the late reply. It works great! Thank you; closing.

@edp1096 edp1096 closed this as completed May 13, 2023