Latest release crashes on start #903

Closed
softzer0 opened this issue Apr 12, 2023 · 6 comments · Fixed by #917

@softzer0

softzer0 commented Apr 12, 2023

D:\Downloads\llama-master-8b67998-bin-win-avx-x64>main -m ggml-model-q4_1.bin -p "Building a website can be done in 10 simple steps:" -n 512
main: seed = 1681263282
llama.cpp: loading model from ggml-model-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
LLAMA_ASSERT: D:\a\llama.cpp\llama.cpp\llama.cpp:830: false

I'm experiencing this error. Does anyone know what the issue is? I think this is a bug, since one of the previous releases that doesn't have this problem is master-2663d2c.

@thestamp

Can confirm, I get the exact same error. Rolling back to the linked release works.

@funnbot
Contributor

funnbot commented Apr 12, 2023

Same issue, introduced in #709
from:

static const char *llama_ftype_name(enum llama_ftype ftype) {
    switch (ftype) {
        case LLAMA_FTYPE_ALL_F32:     return "all F32";
        case LLAMA_FTYPE_MOSTLY_F16:  return "mostly F16";
        case LLAMA_FTYPE_MOSTLY_Q4_0: return "mostly Q4_0";
        case LLAMA_FTYPE_MOSTLY_Q4_1: return "mostly Q4_1";
        default: LLAMA_ASSERT(false);
    }
}

The ftype for my q4_1 model is 4 when this function is called.
Before that PR it was still 4, just called the f16 hparam, so is this just an off-by-one issue?

This is a GPTQ model converted to q4_1, and interestingly, the convert-gptq-to-ggml.py script does do fout.write(struct.pack("i", 4)) when writing that value...
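
For anyone who wants to check what their own model file carries, here is a rough standalone sketch (not part of llama.cpp) that prints the ftype field; it assumes the ggjt v1 layout implied by the log above: 4-byte magic, 4-byte version, then seven 32-bit hparams with the f16/ftype value last.

#include <cstdint>
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.bin\n", argv[0]);
        return 1;
    }
    FILE * f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }
    uint32_t magic = 0, version = 0;
    // assumed order: n_vocab, n_embd, n_mult, n_head, n_layer, n_rot, ftype
    int32_t hparams[7] = {0};
    if (fread(&magic,   sizeof(magic),   1, f) != 1 ||
        fread(&version, sizeof(version), 1, f) != 1 ||
        fread(hparams,  sizeof(int32_t), 7, f) != 7) {
        fprintf(stderr, "failed to read header\n");
        fclose(f);
        return 1;
    }
    fclose(f);
    // a GPTQ model converted to q4_1 should report ftype = 4 here
    printf("magic = %08x, version = %u, ftype = %d\n",
           (unsigned) magic, (unsigned) version, hparams[6]);
    return 0;
}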

Ah, so #801 removed the check for GPTQ models:
- case 4: wtype = GGML_TYPE_Q4_1; vtype = GGML_TYPE_F16; break;
since it switched to per-layer types, meaning the f16 hparam hasn't mattered since that PR.

For the actual fix, I guess another llama_ftype could be added? LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16 = 4
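
For reference, a rough sketch of what that could look like in llama.h (the new name and value are just the suggestion above, not a merged change; the existing entries are assumed to match the current on-disk codes):

enum llama_ftype {
    LLAMA_FTYPE_ALL_F32              = 0,
    LLAMA_FTYPE_MOSTLY_F16           = 1,
    LLAMA_FTYPE_MOSTLY_Q4_0          = 2,
    LLAMA_FTYPE_MOSTLY_Q4_1          = 3,
    LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16 = 4, // the value written by the GPTQ conversion script
};

llama_ftype_name() would then return a proper name for this value instead of hitting the assert, as in the temp fix below.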

Temp fix for anyone waiting:

static const char *llama_ftype_name(enum llama_ftype ftype) {
    switch (ftype) {
        case LLAMA_FTYPE_ALL_F32:     return "all F32";
        case LLAMA_FTYPE_MOSTLY_F16:  return "mostly F16";
        case LLAMA_FTYPE_MOSTLY_Q4_0: return "mostly Q4_0";
        case LLAMA_FTYPE_MOSTLY_Q4_1: return "mostly Q4_1";
+       case 4: return "mostly Q4_1 and some f16";
        default: LLAMA_ASSERT(false);
    }
}

There is no negative effect from just bypassing this assertion; the f16/ftype hparam isn't used anymore.

@TheBloke
Contributor

Yes, I am having this issue as well, with GPTQ models.

@wbpxre150
Contributor

If you comment out default: LLAMA_ASSERT(false); then it loads just fine. I tested this earlier today. I'm not sure what the consequences of that are; it only crashed for me when loading the Koala model, and with the gpt4all 7B model it is fine.
Commenting the above line out let it load and it started generating a response, however it was so slow that I gave up testing and moved on to something else. I think it's a RAM limitation on this laptop; it cannot load models larger than 7B.

@TheBloke
Contributor

For now I just rolled back to the commit before with:

$ git checkout 2663d2c6784ad7b77998c6874df25648d597f74b && make clean && make

@sw
Contributor

sw commented Apr 12, 2023

My apologies, I assumed that the "4" format was no longer supported by the new loader code in #801; that's why I didn't make a value in enum llama_ftype for it and removed it from the switch. I'll look into it later today.
