
Generalize convert scripts #3838

Merged: 43 commits into ggerganov:master, Nov 9, 2023

Conversation

@Galunid (Collaborator) commented Oct 28, 2023

A lot of code is duplicated between the various convert scripts (gpt-neox, mpt, bloom, baichuan, ...).
The usual flow of a convert script is as follows:

  1. Get parameters
  2. Open model
  3. Get tokenizer
  4. Convert tensors
  5. Write tensors

Opening the model, getting the tokenizer, and converting/writing the tensors are either similar or identical between the different scripts. I refactored the code so that a Model class handles steps 2-5 and allows new models to be implemented simply via inheritance, by overriding methods such as set_gguf_parameters. A minimal sketch of this structure is shown below.

This is mostly a draft PR to gauge whether it's worth investing time in this. Feel free to close it if it's not desired.
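Roughly, the intended structure looks like this (a minimal sketch only, not the exact code in this PR; the hyperparameter keys and the method names other than set_gguf_parameters are illustrative):

```python
import json
from pathlib import Path

import gguf  # llama.cpp's gguf-py package


class Model:
    """Handles steps 2-5; model-specific subclasses override only what differs."""

    def __init__(self, dir_model: Path, fname_out: Path, arch: str):
        self.dir_model = dir_model
        self.hparams = self.load_hparams(dir_model)            # step 1: parameters
        self.gguf_writer = gguf.GGUFWriter(fname_out, arch)

    @staticmethod
    def load_hparams(dir_model: Path) -> dict:
        with open(dir_model / "config.json", encoding="utf-8") as f:
            return json.load(f)

    def set_gguf_parameters(self):                              # overridden per model
        self.gguf_writer.add_context_length(self.hparams["max_position_embeddings"])
        self.gguf_writer.add_embedding_length(self.hparams["hidden_size"])
        self.gguf_writer.add_block_count(self.hparams["num_hidden_layers"])

    def set_vocab(self):                                        # step 3, shared (elided here)
        ...

    def write_tensors(self):                                    # step 4, shared (elided here)
        ...

    def write(self):                                            # steps 2-5 glued together
        self.set_gguf_parameters()
        self.set_vocab()
        self.write_tensors()
        self.gguf_writer.write_header_to_file()
        self.gguf_writer.write_kv_data_to_file()
        self.gguf_writer.write_tensors_to_file()
        self.gguf_writer.close()


class BloomModel(Model):
    """Adding a model boils down to overriding a few methods."""

    def set_gguf_parameters(self):
        super().set_gguf_parameters()
        self.gguf_writer.add_feed_forward_length(4 * self.hparams["hidden_size"])
```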

Supported models:

  • convert-bloom-hf-to-gguf.py
  • convert-baichuan-hf-to-gguf.py
  • convert-falcon-hf-to-gguf.py
  • convert-gptneox-hf-to-gguf.py
  • convert-mpt-hf-to-gguf.py
  • convert-persimmon-to-gguf.py
  • convert-refact-hf-to-gguf.py
  • convert-starcoder-hf-to-gguf.py

TODO:

  • Support endianness conversion
  • Yarn rope for baichuan models
  • Verify checksums pre merge

Checksums verified (again):

  • convert-bloom-hf-to-gguf.py
  • convert-baichuan-hf-to-gguf.py
  • convert-falcon-hf-to-gguf.py (needs change to tokenizer)
  • convert-gptneox-hf-to-gguf.py
  • convert-mpt-hf-to-gguf.py
  • convert-persimmon-to-gguf.py
  • convert-refact-hf-to-gguf.py
  • convert-starcoder-hf-to-gguf.py

Note:
When checking (via checksums) whether the old script and the new one convert a file identically, make sure the same input file is used (model.safetensors vs. pytorch_model.bin). The generic script prefers .safetensors when both files are present, but the other conversion scripts prefer pytorch_model.bin.
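For reference, the file preference described above amounts to something like this (a sketch, not necessarily the exact logic in the generic script):

```python
from pathlib import Path


def pick_model_file(dir_model: Path) -> Path:
    # The generic script prefers model.safetensors when both formats are present;
    # the per-model scripts historically loaded pytorch_model.bin first.
    for name in ("model.safetensors", "pytorch_model.bin"):
        path = dir_model / name
        if path.exists():
            return path
    raise FileNotFoundError(f"no model weights found in {dir_model}")
```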

@Green-Sky (Collaborator)

Good initiative. I do however think we should remove the torch dependency, like the main convert.py.

@TheBloke (Contributor)

A single convert.py script that can do all models would be awesome!

@Galunid (Collaborator, Author) commented Oct 28, 2023

I do however think we should remove the torch dependency

I don't think we can: since we already depend on transformers, which depends on torch, we may as well use it. I vaguely recall there was a discussion in one of the PRs that we should keep the dependency on transformers.

@cebtenzzre (Collaborator) commented Oct 29, 2023

I vaguely recall there was a discussion in one of the PRs that we should keep dependency on transformers.

here: #3633 (comment)

If we merge this PR, then I won't have to worry about inconsistencies like this one cropping up (still unfixed): #3680 (comment)

@Galunid (Collaborator, Author) commented Oct 29, 2023

That's the one! Thanks

If we merge this PR then I won't have to worry about inconsistencies like this cropping up (still unfixed) #3680 (comment)

That's pretty much my motivation.

@Galunid (Collaborator, Author) commented Oct 29, 2023

Do we have some tool to view .gguf files?

Notes to self

Baichuan conversion is broken

$ md5sum ../Baichuan-7B/*.gguf
865cdc634964456c1c5bb3ebda707a62  ../Baichuan-7B/ggml-model-f16.gguf
b38af82e71388f76733a33ed59f2df98  ../Baichuan-7B/working.gguf
$ ls -la ../Baichuan-7B/*.gguf
-rw-r--r-- 1 root root 14003010016 Oct 29 02:07 ../Baichuan-7B/ggml-model-f16.gguf
-rw-r--r-- 1 root root 14003010016 Oct 29 01:22 ../Baichuan-7B/working.gguf
Failed start
$ ./main -m ../Baichuan-7B/ggml-model-f16.gguf 
Log start
main: build = 1434 (550b925)
main: built with cc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0 for x86_64-linux-gnu
main: seed  = 1698545774
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6
llama_model_loader: loaded meta data with 18 key-value pairs and 0 tensors from ../Baichuan-7B/ggml-model-f16.gguf (version unknown)
llama_model_loader: - kv   0:                       general.architecture str     
llama_model_loader: - kv   1:                               general.name str     
llama_model_loader: - kv   2:                baichuan.tensor_data_layout str     
llama_model_loader: - kv   3:                    baichuan.context_length u32     
llama_model_loader: - kv   4:                  baichuan.embedding_length u32     
llama_model_loader: - kv   5:                       baichuan.block_count u32     
llama_model_loader: - kv   6:               baichuan.feed_forward_length u32     
llama_model_loader: - kv   7:              baichuan.rope.dimension_count u32     
llama_model_loader: - kv   8:              baichuan.attention.head_count u32     
llama_model_loader: - kv   9:           baichuan.attention.head_count_kv u32     
llama_model_loader: - kv  10:  baichuan.attention.layer_norm_rms_epsilon f32     
llama_model_loader: - kv  11:                       tokenizer.ggml.model str     
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr     
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr     
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr     
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32     
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32     
llama_model_loader: - kv  17:            tokenizer.ggml.padding_token_id u32     
llm_load_vocab: mismatch in special tokens definition ( 609/64000 vs 259/64000 ).
llm_load_print_meta: format           = unknown
llm_load_print_meta: arch             = baichuan
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 64000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = all F32 (guessed)
llm_load_print_meta: model params     = 0.00 B
llm_load_print_meta: model size       = 0.00 MiB (-nan BPW) 
llm_load_print_meta: general.name   = Baichuan-7B
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token  = 403 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.00 MB
llm_load_tensors: using CUDA for GPU acceleration
error loading model: create_tensor: tensor 'token_embd.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '../Baichuan-7B/ggml-model-f16.gguf'
main: error: unable to load model
Successful start
$ ./main -m ../Baichuan-7B/working.gguf 
Log start
main: build = 1434 (550b925)
main: built with cc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0 for x86_64-linux-gnu
main: seed  = 1698545871
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6
llama_model_loader: loaded meta data with 18 key-value pairs and 291 tensors from ../Baichuan-7B/working.gguf (version unknown)
llama_model_loader: - tensor    0:                token_embd.weight f16      [  4096, 64000,     1,     1 ]
llama_model_loader: - tensor    1:         blk.0.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    5:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    6:            blk.0.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    7:         blk.1.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    8:            blk.1.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    9:            blk.1.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   10:              blk.1.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   11:           blk.1.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   12:            blk.1.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   13:         blk.2.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   14:            blk.2.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   15:            blk.2.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   16:              blk.2.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   17:           blk.2.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   18:            blk.2.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   19:         blk.3.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   20:            blk.3.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   21:            blk.3.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   22:              blk.3.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   23:           blk.3.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   24:            blk.3.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   25:         blk.4.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   26:            blk.4.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   27:            blk.4.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   28:              blk.4.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   29:           blk.4.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   30:            blk.4.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   31:         blk.5.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   32:            blk.5.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   33:            blk.5.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   34:              blk.5.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   35:           blk.5.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   36:            blk.5.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   37:         blk.6.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   38:            blk.6.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   39:            blk.6.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   40:              blk.6.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   41:           blk.6.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   42:            blk.6.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   43:         blk.7.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   44:            blk.7.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   45:            blk.7.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   46:              blk.7.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   47:           blk.7.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   48:            blk.7.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   49:         blk.8.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   50:            blk.8.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   51:            blk.8.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   52:              blk.8.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   53:           blk.8.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   54:            blk.8.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   55:         blk.9.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   56:            blk.9.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   57:            blk.9.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   58:              blk.9.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   59:           blk.9.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   60:            blk.9.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   61:        blk.10.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   62:           blk.10.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   63:           blk.10.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   64:             blk.10.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   65:          blk.10.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   66:           blk.10.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   67:        blk.11.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   68:           blk.11.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   69:           blk.11.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   70:             blk.11.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   71:          blk.11.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   72:           blk.11.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   73:        blk.12.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   74:           blk.12.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   75:           blk.12.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   76:             blk.12.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   77:          blk.12.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   78:           blk.12.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   79:        blk.13.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   80:           blk.13.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   81:           blk.13.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   82:             blk.13.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   83:          blk.13.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   84:           blk.13.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   85:        blk.14.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   86:           blk.14.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   87:           blk.14.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   88:             blk.14.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   89:          blk.14.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   90:           blk.14.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   91:        blk.15.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   92:           blk.15.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   93:           blk.15.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   94:             blk.15.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   95:          blk.15.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   96:           blk.15.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   97:        blk.16.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   98:           blk.16.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   99:           blk.16.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  100:             blk.16.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  101:          blk.16.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  102:           blk.16.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  103:        blk.17.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  104:           blk.17.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  105:           blk.17.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  106:             blk.17.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  107:          blk.17.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  108:           blk.17.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  109:        blk.18.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  110:           blk.18.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  111:           blk.18.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  112:             blk.18.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  113:          blk.18.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  114:           blk.18.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  115:        blk.19.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  116:           blk.19.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  117:           blk.19.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  118:             blk.19.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  119:          blk.19.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  120:           blk.19.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  121:        blk.20.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  122:           blk.20.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  123:           blk.20.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  124:             blk.20.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  125:          blk.20.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  126:           blk.20.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  127:        blk.21.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  128:           blk.21.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  129:           blk.21.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  130:             blk.21.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  131:          blk.21.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  132:           blk.21.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  133:        blk.22.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  134:           blk.22.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  135:           blk.22.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  136:             blk.22.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  137:          blk.22.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  138:           blk.22.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  139:        blk.23.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  140:           blk.23.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  141:           blk.23.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  142:             blk.23.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  143:          blk.23.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  144:           blk.23.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  145:        blk.24.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  146:           blk.24.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  147:           blk.24.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  148:             blk.24.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  149:          blk.24.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  150:           blk.24.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  151:        blk.25.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  152:           blk.25.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  153:           blk.25.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  154:             blk.25.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  155:          blk.25.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  156:           blk.25.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  157:        blk.26.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  158:           blk.26.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  159:           blk.26.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  160:             blk.26.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  161:          blk.26.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  162:           blk.26.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  163:        blk.27.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  164:           blk.27.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  165:           blk.27.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  166:             blk.27.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  167:          blk.27.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  168:           blk.27.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  169:        blk.28.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  170:           blk.28.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  171:           blk.28.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  172:             blk.28.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  173:          blk.28.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  174:           blk.28.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  175:        blk.29.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  176:           blk.29.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  177:           blk.29.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  178:             blk.29.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  179:          blk.29.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  180:           blk.29.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  181:        blk.30.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  182:           blk.30.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  183:           blk.30.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  184:             blk.30.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  185:          blk.30.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  186:           blk.30.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  187:        blk.31.attn_output.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  188:           blk.31.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  189:           blk.31.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  190:             blk.31.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  191:          blk.31.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  192:           blk.31.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  193:               output_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  194:                    output.weight f16      [  4096, 64000,     1,     1 ]
llama_model_loader: - tensor  195:              blk.0.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  196:              blk.0.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  197:              blk.0.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  198:              blk.1.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  199:              blk.1.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  200:              blk.1.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  201:              blk.2.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  202:              blk.2.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  203:              blk.2.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  204:              blk.3.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  205:              blk.3.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  206:              blk.3.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  207:              blk.4.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  208:              blk.4.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  209:              blk.4.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  210:              blk.5.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  211:              blk.5.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  212:              blk.5.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  213:              blk.6.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  214:              blk.6.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  215:              blk.6.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  216:              blk.7.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  217:              blk.7.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  218:              blk.7.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  219:              blk.8.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  220:              blk.8.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  221:              blk.8.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  222:              blk.9.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  223:              blk.9.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  224:              blk.9.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  225:             blk.10.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  226:             blk.10.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  227:             blk.10.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  228:             blk.11.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  229:             blk.11.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  230:             blk.11.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  231:             blk.12.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  232:             blk.12.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  233:             blk.12.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  234:             blk.13.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  235:             blk.13.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  236:             blk.13.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  237:             blk.14.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  238:             blk.14.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  239:             blk.14.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  240:             blk.15.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  241:             blk.15.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  242:             blk.15.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  243:             blk.16.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  244:             blk.16.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  245:             blk.16.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  246:             blk.17.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  247:             blk.17.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  248:             blk.17.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  249:             blk.18.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  250:             blk.18.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  251:             blk.18.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  252:             blk.19.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  253:             blk.19.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  254:             blk.19.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  255:             blk.20.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  256:             blk.20.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  257:             blk.20.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  258:             blk.21.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  259:             blk.21.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  260:             blk.21.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  261:             blk.22.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  262:             blk.22.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  263:             blk.22.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  264:             blk.23.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  265:             blk.23.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  266:             blk.23.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  267:             blk.24.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  268:             blk.24.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  269:             blk.24.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  270:             blk.25.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  271:             blk.25.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  272:             blk.25.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  273:             blk.26.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  274:             blk.26.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  275:             blk.26.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  276:             blk.27.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  277:             blk.27.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  278:             blk.27.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  279:             blk.28.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  280:             blk.28.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  281:             blk.28.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  282:             blk.29.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  283:             blk.29.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  284:             blk.29.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  285:             blk.30.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  286:             blk.30.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  287:             blk.30.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  288:             blk.31.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  289:             blk.31.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  290:             blk.31.attn_v.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - kv   0:                       general.architecture str     
llama_model_loader: - kv   1:                               general.name str     
llama_model_loader: - kv   2:                baichuan.tensor_data_layout str     
llama_model_loader: - kv   3:                    baichuan.context_length u32     
llama_model_loader: - kv   4:                  baichuan.embedding_length u32     
llama_model_loader: - kv   5:                       baichuan.block_count u32     
llama_model_loader: - kv   6:               baichuan.feed_forward_length u32     
llama_model_loader: - kv   7:              baichuan.rope.dimension_count u32     
llama_model_loader: - kv   8:              baichuan.attention.head_count u32     
llama_model_loader: - kv   9:           baichuan.attention.head_count_kv u32     
llama_model_loader: - kv  10:  baichuan.attention.layer_norm_rms_epsilon f32     
llama_model_loader: - kv  11:                       tokenizer.ggml.model str     
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr     
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr     
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr     
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32     
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32     
llama_model_loader: - kv  17:            tokenizer.ggml.padding_token_id u32     
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type  f16:  226 tensors
llm_load_vocab: mismatch in special tokens definition ( 609/64000 vs 259/64000 ).
llm_load_print_meta: format           = unknown
llm_load_print_meta: arch             = baichuan
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 64000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly F16 (guessed)
llm_load_print_meta: model params     = 7.00 B
llm_load_print_meta: model size       = 13.04 GiB (16.00 BPW) 
llm_load_print_meta: general.name   = Baichuan-7B
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token  = 403 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.10 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  = 13353.11 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size = 139.13 MB
llama_new_context_with_model: VRAM scratch buffer: 133.00 MB
llama_new_context_with_model: total VRAM used: 133.00 MB (model: 0.00 MB, context: 133.00 MB)

system_info: n_threads = 24 / 48 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
sampling: 
        repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


" and "what if we were to take this?

It's like tensors aren't there, but the model size suggests they are.

$ strings -n 10 working.gguf | head -n 300

gives the same results for both files (compared via diff). --vocab-only also gives the same result (checked via md5sum).

@cebtenzzre (Collaborator)

Do we have some tool to view .gguf files?

The closest thing I know is https://github.com/huggingface/candle, which has a from_gguf function and Python bindings. But I don't know if it can even load Baichuan models.
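For the fields that matter in the debugging above (the tensor and KV counts), the GGUF header can also be dumped with a few lines of Python. A minimal sketch based on the GGUF spec, assuming a little-endian file:

```python
import struct
import sys


def dump_gguf_header(path: str) -> None:
    # Header layout: magic "GGUF", u32 version, then tensor count and KV count
    # (u32 each in v1, u64 each in v2 and later), all little-endian.
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: {magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        if version == 1:
            n_tensors, n_kv = struct.unpack("<II", f.read(8))
        else:
            n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
        print(f"version={version} tensors={n_tensors} kv_pairs={n_kv}")


if __name__ == "__main__":
    dump_gguf_header(sys.argv[1])
```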

@cebtenzzre (Collaborator)

It's like tensors aren't there, but the model size suggests they are.

Based on the hexdump it looks like gguf_writer.ti_data_count is zero when tensors are written. That number should be the number of tensors.
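In other words, with gguf-py every tensor has to be registered before the header is written, since the header records the tensor-info count. A hedged sketch of the expected call order (the dummy tensor dict is only illustrative):

```python
import numpy as np
import gguf

# Placeholder data; the real script fills this from the source model.
tensors = {"token_embd.weight": np.zeros((8, 8), dtype=np.float16)}

writer = gguf.GGUFWriter("out.gguf", "baichuan")
writer.add_name("Baichuan-7B")

for name, data in tensors.items():
    writer.add_tensor(name, data)       # registers tensor info (bumps ti_data_count)

writer.write_header_to_file()           # header now carries the correct tensor count
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```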

@Galunid (Collaborator, Author) commented Oct 31, 2023

Not sure if Persimmon (the old script) works; it won't work with this model at the very least: https://huggingface.co/adept/persimmon-8b-base

I'll have to implement that one from scratch.

@Galunid marked this pull request as ready for review on October 31, 2023 03:31
@Galunid marked this pull request as draft on October 31, 2023 03:32
@Galunid (Collaborator, Author) commented Oct 31, 2023

The *-hf-to-gguf.py scripts work, with the exception of persimmon. It seems to need some external library and will be reworked. I'll go through the code tomorrow and see what can be cleaned up, and I'll mark the PR as "Ready for review" then.

@maddes8cht (Contributor)

I haven't converted as many models as TheBloke, but over the last few weeks I've been converting models with "real" open-source licenses on my Hugging Face account maddes8cht, some of which are somewhat neglected by TheBloke. At the moment these are mainly Falcon and MPT models (Apache 2.0 licenses); Mistral is already getting a lot of attention from TheBloke, so there is no need to duplicate that work.

My experience with the new convert script on a Falcon 7B model (https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-7b):
The conversion runs smoothly, and so does the quantization.
The resulting q4_1 file works for me as expected, although I didn't do a real comparison against a previously converted model with a constant seed.
I will do some further checks tomorrow, also with Falcon 40B and MPT 7B / 30B models.

Ask me if you would like me to test something specific for you.
I have my own "pipeline" to download, convert, quantize, and upload models to Hugging Face in batches.

@Galunid (Collaborator, Author) commented Nov 7, 2023

@TheBloke sorry about that; long story short, the converted model ended up big-endian instead of little-endian.
@maddes8cht Thanks, please make sure you have the latest commits when testing (run git pull before converting models).
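(For anyone hitting the same thing: forcing tensor data to little-endian before writing can be done roughly like this with numpy; a sketch, not necessarily the exact fix applied here.)

```python
import sys

import numpy as np


def ensure_little_endian(arr: np.ndarray) -> np.ndarray:
    # GGUF files written by the convert scripts are little-endian; convert the
    # array if its dtype is big-endian, or native order on a big-endian host.
    order = arr.dtype.byteorder
    if order == ">" or (order == "=" and sys.byteorder == "big"):
        return arr.astype(arr.dtype.newbyteorder("<"))
    return arr
```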

@Galunid (Collaborator, Author) commented Nov 7, 2023

convert-hf-to-gguf.py is the active file yeah?

Yup

I see it doesn't support Llama, but does support all the other formats. I assume Llama support is being worked on later - sorry, I've not had time to read through the whole thread so am not up-to-date with the plan.

Yes, I'd like this script to support Llama models in huggingface format in the near future.

One comment is that I would prefer the --outtype f16/f32 argument to 0 or 1. Also I assume --outtype will be needed for when q8_0 is added back also (not that I use that personally, but I guess others do)

It should be a pretty simple change; I'll add it soon.
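Something along these lines should do (a sketch; the flag name follows the request above, and the 0/1 values mirror the old scripts' ftype convention of f32 = 0, f16 = 1):

```python
import argparse

parser = argparse.ArgumentParser(description="Convert a HuggingFace model to GGUF")
parser.add_argument(
    "--outtype", choices=["f32", "f16"], default="f16",
    help="output tensor type (q8_0 could be added back later)",
)
args = parser.parse_args()

# Map to the numeric ftype the rest of the script already uses.
ftype = {"f32": 0, "f16": 1}[args.outtype]
```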

EDIT:
I converted both falcon-7b and falcon-40b-instruct and got the same checksums with the new and the old script (after updating the tokenizer in the old one), so there is no need to test those.

@maddes8cht (Contributor)

Confirming the checksums.
Yes, I always do a git pull before my tests.
Now trying an MPT 7B model.

I take the license issue very seriously and concentrate on models with "true" open-source licenses. Apart from Falcon, Mistral, and MPT, there's not much left. Even BLOOM has quirky restrictions in its license that are not compatible with the open-source idea.
In this respect I am particularly waiting for Persimmon, which is also licensed under Apache 2.0.
What is the status of Persimmon?

@Galunid (Collaborator, Author) commented Nov 7, 2023

What is the status of Persimmon?

TBD in the convert script, and broken in llama.cpp (see #3837 (comment)). The current scripts only support the .tar version, not the Hugging Face one. Persimmon will be skipped in this PR (the current script will be left in place).

@Galunid (Collaborator, Author) commented Nov 8, 2023

From my end we are good to merge this, if there are no more comments. I verified that the old and new checksums match for all the models.

@maddes8cht (Contributor)

Looks fine to me.
I've created a bunch of models without problems now, giving the same results as the old scripts.
Thanks for your work!

@Green-Sky requested a review from ggerganov on November 8, 2023 12:04
@Green-Sky (Collaborator)

Requested a review from @ggerganov, so he is aware of this PR.

@Galunid merged commit a75fa57 into ggerganov:master on Nov 9, 2023 (6 checks passed)
@Galunid deleted the generic-convert branch on November 9, 2023 10:09
@maddes8cht (Contributor) commented Nov 22, 2023

As mentioned in #3293, convert-gptneox-hf-to-gguf.py does produce GGUF files, but there is no code to run inference on those models.
Running GGUF models with the architecture name gptneox results in an error message:

llm_load_tensors: ggml ctx size =    0.19 MiB
llm_load_tensors: using CUDA for GPU acceleration
error loading model: unknown architecture
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'H:\EleutherAI-gpt-neox-20b-gguf\EleutherAI-gpt-neox-20b-Q5_0.gguf'
main: error: unable to load model

So I somewhat consider it a bug that the convert script even builds these models, as no one can test whether it does something correct until someone reimplements the code in https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox to work with current llama.cpp versions using the GGUF file format (that example implementation only runs with GGML files produced by the example's own convert script, not the GGUF files produced by this convert-hf-to-gguf.py).

olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
* Replace convert-*-hf-to-gguf.py files with convert-hf-to-gguf.py