ShareGPT4V-7B -- a multimodal model that surpasses LLaVA #4196

Closed
itsPreto opened this issue Nov 24, 2023 · 19 comments
Labels: enhancement (New feature or request), model (Model specific)

Comments

@itsPreto

Are there plans to support this new SOTA open-source vision model?

Despite its compact size, the model can extract text from images with incredible accuracy.

@BarfingLemurs
Contributor

ShareGPT4V: #4172

@cmp-nct
Contributor

cmp-nct commented Nov 24, 2023

I've been using it for a while, it's great.
You can just copy the new conversion script from my linked PR and use it instead of the old one; example usage is in the PR.
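
A minimal sketch (not from the thread) of one way to pull just that script from the PR branch without disturbing the rest of a master checkout; <PR_NUMBER> is a placeholder for the actual PR and the local branch name is arbitrary:

# Fetch the PR head into a throwaway local branch, then check out only the
# updated conversion script from it.
git fetch origin pull/<PR_NUMBER>/head:sharegpt4v-convert
git checkout sharegpt4v-convert -- examples/llava/convert-image-encoder-to-gguf.py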

@KerfuffleV2 added the enhancement (New feature or request) and model (Model specific) labels and removed the bug-unconfirmed label on Nov 24, 2023
@itsPreto
Author

itsPreto commented Nov 24, 2023

That's great news, indeed! I downloaded the required files and threw them into a folder named ShareGPT4V-7B_Pretrained_vit-large336-l12, but I'm not sure where this llava.projector comes from. Do you mind elaborating a bit? @cmp-nct

@cmp-nct
Contributor

cmp-nct commented Nov 24, 2023

@itsPreto Don't smash all of it together; each folder has its own config.json.
You need the model and the vision encoder, both from ShareGPT4V, in two separate folders:
the model itself and the vision tower (https://huggingface.co/Lin-Chen/ShareGPT4V-7B_Pretrained_vit-large336-l12).
Then overwrite the conversion script in examples/llava/ with the one from my PR.

Everything else is as described here:
https://github.com/ggerganov/llama.cpp/blob/master/examples/llava/README.md
The only difference is that you add --clip_model_is_vision when running the image-encoder conversion script.
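
As a rough sketch of that two-directory layout (the vision-tower URL is the one above; the main-model repo name Lin-Chen/ShareGPT4V-7B is an assumption, so adjust it to wherever you actually got the weights):

# Keep the language model and the vision tower in separate folders,
# exactly as they come from Hugging Face; the large files need git-lfs.
git lfs install
git clone https://huggingface.co/Lin-Chen/ShareGPT4V-7B models/shareGPT4V-7B   # assumed repo name
git clone https://huggingface.co/Lin-Chen/ShareGPT4V-7B_Pretrained_vit-large336-l12 models/ShareGPT4V-7B_Pretrained_vit-large336-l12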

@itsPreto
Author

main:    total time =  7891.93 ms
❯ ./llava-cli -m models/shareGPT4V-7B/shareGPT4V-7B-q4_0.gguf --mmproj models/shareGPT4V-7B/mmproj-model-f16.gguf --image ../ippo.png
gguf_init_from_file: GGUFv1 is no longer supported. please use a more up-to-date version
libc++abi: terminating due to uncaught exception of type std::runtime_error: clip_model_load: failed to load CLIP model from models/shareGPT4V-7B/mmproj-model-f16.gguf. Does this file exist?

[1]    12232 abort      ./llava-cli -m models/shareGPT4V-7B/shareGPT4V-7B-q4_0.gguf --mmproj  --image

@cmp-nct Ty! I was able to convert/split/quantize the model, but I'm not able to actually run it due to an old GGUF format? Not sure why that would be, since I have the latest from the master branch and simply replaced the relevant parts of convert-image-encoder-to-gguf.py with your changes.

Any ideas?

Here's the directory for the model:

models/shareGPT4V-7B
├── config.json
├── generation_config.json
├── ggml-model-f16.gguf
├── llava.projector
├── mmproj-model-f16.gguf
├── pytorch_model-00001-of-00002.bin
├── pytorch_model-00002-of-00002.bin
├── pytorch_model.bin.index.json
├── shareGPT4V-7B-q4_0.gguf
├── special_tokens_map.json
├── tokenizer.model
├── tokenizer_config.json
└── tower
    ├── config.json
    ├── preprocessor_config.json
    └── pytorch_model.bin

@cmp-nct
Contributor

cmp-nct commented Nov 24, 2023

It's not related to the image converter.
Are you sure your bin/Release/quantize is up to date and not very old? No matter what GGUF version the Python script produced, quantize would update it to the latest.
Can you also verify the file size of the GGUF, to make sure it's plausible?

Also, I'd use q4_k, not q4_0; if you are on a recent release you can use K-quants on llava (about 40 tensors will fall back to compatibility quants).

For your reference: (screenshot attached)
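
For reference, a minimal sketch of that re-quantization step, using the paths that appear later in this thread (it assumes convert.py has already produced the f16 GGUF):

# Re-quantize the f16 GGUF to a K-quant instead of q4_0
./quantize models/shareGPT4V-7B/ggml-model-f16.gguf \
           models/shareGPT4V-7B/shareGPT4V-7B-q4_K_M.gguf Q4_K_M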

@itsPreto
Author

itsPreto commented Nov 24, 2023

@cmp-nct

./models/shareGPT4V-7B/shareGPT4V-7B-q4_k.gguf (3.80GB).

Just to make sure I'm not crazy, I cloned the project fresh, ran through the instructions again, and quantized to q4_k as you suggested, but I'm still getting the same error. Is there anything in particular that I should be doing other than running make and then make llava-cli?

I've been quantizing my (text-only) LLMs just fine up until now; I have yet to run into this issue with quantize.
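
For what it's worth, a clean-rebuild sketch (assuming the standard Makefile targets used elsewhere in this thread) to rule out a stale quantize binary:

# Rebuild both tools from a clean tree
make clean
make quantize llava-cli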

@cmp-nct
Contributor

cmp-nct commented Nov 24, 2023

.\build\bin\Release\llava-cli.exe -m Q:\models\llava\ShareGPT4V-7B\ggml-model-q6_k --mmproj Q:\models\llava\ShareGPT4V-7B\mmproj-model-f16.gguf  -ngl 80 -b 1024 --image c:\temp\tmp.png
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6
clip_model_load: model name:   ShareGPT4V-7B_Pretrained_vit-large336-l12
clip_model_load: description:  image encoder for LLaVA
clip_model_load: GGUF version: 2
clip_model_load: alignment:    32
clip_model_load: n_tensors:    377
clip_model_load: n_kv:         18
clip_model_load: ftype:        f16

That is how it should look. It seems something is wrong with the projector; make sure your files are all complete.
I'd start over fresh, removing all newly generated files:
llava-surgery.py -> convert-image-encoder-to-gguf.py -> convert.py -> quantize

py convert-image-encoder-to-gguf.py -m ShareGPT4V-7B_Pretrained_vit-large336-l12/ --llava-projector ShareGPT4V-7B/llava.projector --output-dir ShareGPT4V-7B --clip_model_is_vision

I have no idea where your problem originates. The best I can recommend so far is to ensure your Python files are complete and then separate them into two directories (just as they come from the HF git repos), so you don't deviate from the usual process and possibly hit something untested.
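
Condensing the above into a single sketch, using the exact commands and paths that appear in this thread (vision tower and language-model weights kept in separate directories):

# 1. Split the LLaVA projector out of the PyTorch checkpoint
python ./examples/llava/llava-surgery.py -m models/shareGPT4V-7B

# 2. Convert the vision tower + projector into the mmproj GGUF
python ./examples/llava/convert-image-encoder-to-gguf.py \
    -m models/ShareGPT4V-7B_Pretrained_vit-large336-l12/ \
    --llava-projector models/shareGPT4V-7B/llava.projector \
    --output-dir models/shareGPT4V-7B --clip_model_is_vision

# 3. Convert the language model itself to an f16 GGUF
python convert.py models/shareGPT4V-7B

# 4. Quantize
./quantize models/shareGPT4V-7B/ggml-model-f16.gguf \
           models/shareGPT4V-7B/shareGPT4V-7B-q4_K_M.gguf Q4_K_M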

@itsPreto
Author

Okay, I'm strictly following the steps in the example:

❯ python ./examples/llava/llava-surgery.py -m models/shareGPT4V-7B
Done!
Now you can convert models/shareGPT4V-7B to a regular LLaMA GGUF file.
Also, use models/shareGPT4V-7B/llava.projector to prepare a llava-encoder.gguf file.
❯ cd models/shareGPT4V-7B
❯ ls
README.md                        pytorch_model-00002-of-00002.bin
config.json                      pytorch_model.bin.index.json
generation_config.json           special_tokens_map.json
llava.projector                  tokenizer.model
pytorch_model-00001-of-00002.bin tokenizer_config.json

llava.projector was successfully generated, so I'll run convert-image-encoder-to-gguf.py:

❯ python ./examples/llava/convert-image-encoder-to-gguf.py -m models/ShareGPT4V-7B_Pretrained_vit-large336-l12/ --llava-projector models/shareGPT4V-7B/llava.projector --output-dir models/shareGPT4V-7B --clip_model_is_vision


gguf: This GGUF file is for Little Endian only
Projector tensors added

Converting to float32
v.class_embd - f32 - shape = (1024,)
tensor v.patch_embd.weight is always saved in f16
v.patch_embd.weight - f16 - shape = (1024, 3, 14, 14)
Converting to float16
v.position_embd.weight - f16 - shape = (577, 1024)
.....
.....
.....
v.blk.22.ln2.bias - f32 - shape = (1024,)
skipping parameter: vision_model.post_layernorm.weight
Done. Output file: models/shareGPT4V-7B/mmproj-model-f16.gguf

Double-checking the files in shareGPT4V-7B:

❯ cd models/shareGPT4V-7B
❯ ls
README.md                        pytorch_model-00002-of-00002.bin
config.json                      pytorch_model.bin.index.json
generation_config.json           special_tokens_map.json
llava.projector                  tokenizer.model
mmproj-model-f16.gguf            tokenizer_config.json
pytorch_model-00001-of-00002.bin

Running convert.py on the PyTorch .bin model parts:

❯ python convert.py models/shareGPT4V-7B
Loading model file models/shareGPT4V-7B/pytorch_model-00001-of-00002.bin
Loading model file models/shareGPT4V-7B/pytorch_model-00001-of-00002.bin
Loading model file models/shareGPT4V-7B/pytorch_model-00002-of-00002.bin
params = Params(n_vocab=32000, n_embd=4096, n_layer=32, n_ctx=4096, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=None, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=None, path_model=PosixPath('models/shareGPT4V-7B'))
Loading vocab file 'models/shareGPT4V-7B/tokenizer.model', type 'spm'
Permuting layer 0
Permuting layer 1
Permuting layer 2
Permuting layer 3
Permuting layer 4
Permuting layer 5

.....
.....
.....

model.layers.31.post_attention_layernorm.weight  -> blk.31.ffn_norm.weight                   | BF16   | [4096]
model.norm.weight                                -> output_norm.weight                       | BF16   | [4096]
lm_head.weight                                   -> output.weight                            | BF16   | [32000, 4096]
Writing models/shareGPT4V-7B/ggml-model-f16.gguf, format 1
gguf: This GGUF file is for Little Endian only
gguf: Setting special token type bos to 1
gguf: Setting special token type eos to 2
gguf: Setting special token type pad to 0
gguf: Setting add_bos_token to True
gguf: Setting add_eos_token to False
[  1/291] Writing tensor token_embd.weight                      | size  32000 x   4096  | type F16  | T+   0
[  2/291] Writing tensor blk.0.attn_q.weight                    | size   4096 x   4096  | type F16  | T+   0
[  3/291] Writing tensor blk.0.attn_k.weight                    | size   4096 x   4096  | type F16  | T+   0

  4096  | type F16  | T+   8
Wrote models/shareGPT4V-7B/ggml-model-f16.gguf

This generated the GGUF in f16 (not sure if it was supposed to be f32):

❯ cd models/shareGPT4V-7B
❯ ls
README.md                        pytorch_model-00001-of-00002.bin
config.json                      pytorch_model-00002-of-00002.bin
generation_config.json           pytorch_model.bin.index.json
ggml-model-f16.gguf              special_tokens_map.json
llava.projector                  tokenizer.model
mmproj-model-f16.gguf            tokenizer_config.json

All the files seem to be there, so let's try quantizing ggml-model-f16.gguf again:

❯ ./quantize models/shareGPT4V-7B/ggml-model-f16.gguf models/shareGPT4V-7B/shareGPT4V-7B-q4_K_M.gguf Q4_K_M
main: build = 1560 (e9c13ff)
main: built with Apple clang version 15.0.0 (clang-1500.0.40.1) for arm64-apple-darwin23.1.0
main: quantizing 'models/shareGPT4V-7B/ggml-model-f16.gguf' to 'models/shareGPT4V-7B/shareGPT4V-7B-q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from models/shareGPT4V-7B/ggml-model-f16.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor    0:                token_embd.weight f16      [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight f16      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_v.weight f16      [  4096,  4096,     1,     1 ]

Notice this: version GGUF V3 (latest).

llama_model_loader: - tensor  290:                    output.weight f16      [  4096, 32000,     1,     1 ]
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 1
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  18:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  19:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type  f16:  226 tensors
llama_model_quantize_internal: meta size = 741152 bytes
[   1/ 291]

.....
.....
.....

main: quantize time = 59258.60 ms
main:    total time = 59258.60 ms

Checking the files once again:

❯ cd models/shareGPT4V-7B
❯ ls
README.md                        ggml-model-f16.gguf              pytorch_model-00001-of-00002.bin shareGPT4V-7B-q4_K_M.gguf             tokenizer_config.json
config.json                      llava.projector                  pytorch_model-00002-of-00002.bin special_tokens_map.json
generation_config.json           mmproj-model-f16.gguf            pytorch_model.bin.index.json     tokenizer.model

Let's try running it now:


> make llava-cli

make: `llava-cli' is up to date.


❯ ./llava-cli -m ./models/shareGPT4V-7B/shareGPT4V-7B-q4_K_M.gguf --mmproj ./models/shareGPT4V-7B/mmproj-model-f16.gguf --image ../ippo.png
clip_model_load: model name:   ShareGPT4V-7B_Pretrained_vit-large336-l12
clip_model_load: description:  image encoder for LLaVA
clip_model_load: GGUF version: 3
clip_model_load: alignment:    32
clip_model_load: n_tensors:    377
clip_model_load: n_kv:         18
clip_model_load: ftype:        f16

clip_model_load: text_encoder:   0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector:  1
clip_model_load: model size:     595.62 MB
clip_model_load: metadata size:  0.14 MB
clip_model_load: total allocated memory: 195.95 MB
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from ./models/shareGPT4V-7B/shareGPT4V-7B-q4_K_M.gguf (version GGUF V3 (latest))

.....
.....
.....


llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly Q4_K - Medium
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.80 GiB (4.84 BPW)
llm_load_print_meta: general.name   = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.11 MiB
llm_load_tensors: mem required  = 3891.35 MiB
..................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  = 1024.00 MiB
llama_build_graph: non-view tensors processed: 740/740
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '/Users/marconeves/Library/CloudStorage/OneDrive-DeltaAirLines/Desktop/desktop/projects/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 10922.67 MiB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 159.07 MiB
llama_new_context_with_model: max tensor size =   102.54 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  3891.95 MiB, ( 3892.58 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =  1024.02 MiB, ( 4916.59 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =   156.02 MiB, ( 5072.61 / 10922.67)

encode_image_with_clip: image encoded in  1258.81 ms by CLIP (    2.19 ms per image patch)

 The image portrays a dynamic scene from a comic book. The central figure is a young man with dark hair, dressed in a t-shirt and pants. He is captured mid-action, his body leaning forward with a determined expression on his face, suggesting he is engaged in a high-stakes battle.

In his right hand, he wields a boxing glove, indicating that he is a boxer. His left hand is raised, possibly in the midst of a powerful punch. The background of the image is ablaze, with bright flashes of light illuminating the scene, adding to the intensity of the moment.

The entire image is rendered in black and white, with the exception of the boxer's glove and the bright background, which are colored. The use of grayscale in the image gives it a dramatic and intense feel, enhancing the visual impact of the scene.

Please note that this description is based on the visible elements in the image and does not include any inferred or imagined details.

llama_print_timings:        load time =    6712.99 ms
llama_print_timings:      sample time =       5.57 ms /   230 runs   (    0.02 ms per token, 41314.89 tokens per second)
llama_print_timings: prompt eval time =    7968.92 ms /   617 tokens (   12.92 ms per token,    77.43 tokens per second)
llama_print_timings:        eval time =    8815.57 ms /   230 runs   (   38.33 ms per token,    26.09 tokens per second)
llama_print_timings:       total time =   18693.08 ms
ggml_metal_free: deallocating

Wow! I have no clue why it worked this time around since I did virtually nothing different... mildly infuriating, but it finally works, and works GREAT! Thanks @cmp-nct!

@winer632

winer632 commented Nov 28, 2023

I followed the steps @itsPreto listed in the comment above, but an error popped up at the last step.
Why am I getting this error? Could anyone help? @cmp-nct @itsPreto

(error screenshots attached)

@winer632

winer632 commented Nov 28, 2023

I changed the verbosity of this line to 3 in llava-cli.cpp, rebuilt llava-cli, and got the detailed info below.

(screenshot of the line that was changed)

(base) [root@k8s-0 llama.cpp]# ./llava-cli -m ./models/ShareGPT4V-7B/ShareGPT4V-7B-q4_K_M.gguf --mmproj ./models/ShareGPT4V-7B/mmproj-model-f16.gguf --image ../test_photo/zhuanma.jpeg
clip_model_load: model name:   ShareGPT4V-7B_Pretrained_vit-large336-l12
clip_model_load: description:  image encoder for LLaVA
clip_model_load: GGUF version: 3
clip_model_load: alignment:    32
clip_model_load: n_tensors:    373
clip_model_load: n_kv:         18
clip_model_load: ftype:        f16

clip_model_load: kv[0]: key = general.architecture
clip_model_load: kv[1]: key = clip.has_text_encoder
clip_model_load: kv[2]: key = clip.has_vision_encoder
clip_model_load: kv[3]: key = clip.has_llava_projector
clip_model_load: kv[4]: key = general.file_type
clip_model_load: kv[5]: key = general.name
clip_model_load: kv[6]: key = general.description
clip_model_load: kv[7]: key = clip.vision.image_size
clip_model_load: kv[8]: key = clip.vision.patch_size
clip_model_load: kv[9]: key = clip.vision.embedding_length
clip_model_load: kv[10]: key = clip.vision.feed_forward_length
clip_model_load: kv[11]: key = clip.vision.projection_dim
clip_model_load: kv[12]: key = clip.vision.attention.head_count
clip_model_load: kv[13]: key = clip.vision.attention.layer_norm_epsilon
clip_model_load: kv[14]: key = clip.vision.block_count
clip_model_load: kv[15]: key = clip.vision.image_mean
clip_model_load: kv[16]: key = clip.vision.image_std
clip_model_load: kv[17]: key = clip.use_gelu

clip_model_load: tensor[0]: n_dims = 1, name = v.class_embd, tensor_size=4096, padded_size=4096, offset=0
clip_model_load: tensor[1]: n_dims = 4, name = v.patch_embd.weight, tensor_size=1204224, padded_size=1204224, offset=4096
clip_model_load: tensor[2]: n_dims = 2, name = v.position_embd.weight, tensor_size=1181696, padded_size=1181696, offset=1208320
clip_model_load: tensor[3]: n_dims = 1, name = v.pre_ln.weight, tensor_size=4096, padded_size=4096, offset=2390016
clip_model_load: tensor[4]: n_dims = 1, name = v.pre_ln.bias, tensor_size=4096, padded_size=4096, offset=2394112
clip_model_load: tensor[5]: n_dims = 2, name = v.blk.0.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=2398208
clip_model_load: tensor[6]: n_dims = 1, name = v.blk.0.attn_k.bias, tensor_size=4096, padded_size=4096, offset=4495360
clip_model_load: tensor[7]: n_dims = 2, name = v.blk.0.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=4499456
clip_model_load: tensor[8]: n_dims = 1, name = v.blk.0.attn_v.bias, tensor_size=4096, padded_size=4096, offset=6596608
clip_model_load: tensor[9]: n_dims = 2, name = v.blk.0.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=6600704
clip_model_load: tensor[10]: n_dims = 1, name = v.blk.0.attn_q.bias, tensor_size=4096, padded_size=4096, offset=8697856
clip_model_load: tensor[11]: n_dims = 2, name = v.blk.0.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=8701952
clip_model_load: tensor[12]: n_dims = 1, name = v.blk.0.attn_out.bias, tensor_size=4096, padded_size=4096, offset=10799104
clip_model_load: tensor[13]: n_dims = 1, name = v.blk.0.ln1.weight, tensor_size=4096, padded_size=4096, offset=10803200
clip_model_load: tensor[14]: n_dims = 1, name = v.blk.0.ln1.bias, tensor_size=4096, padded_size=4096, offset=10807296
clip_model_load: tensor[15]: n_dims = 2, name = v.blk.0.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=10811392
clip_model_load: tensor[16]: n_dims = 1, name = v.blk.0.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=19200000
clip_model_load: tensor[17]: n_dims = 2, name = v.blk.0.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=19216384
clip_model_load: tensor[18]: n_dims = 1, name = v.blk.0.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=27604992
clip_model_load: tensor[19]: n_dims = 1, name = v.blk.0.ln2.weight, tensor_size=4096, padded_size=4096, offset=27609088
clip_model_load: tensor[20]: n_dims = 1, name = v.blk.0.ln2.bias, tensor_size=4096, padded_size=4096, offset=27613184
clip_model_load: tensor[21]: n_dims = 2, name = v.blk.1.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=27617280
clip_model_load: tensor[22]: n_dims = 1, name = v.blk.1.attn_k.bias, tensor_size=4096, padded_size=4096, offset=29714432
clip_model_load: tensor[23]: n_dims = 2, name = v.blk.1.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=29718528
clip_model_load: tensor[24]: n_dims = 1, name = v.blk.1.attn_v.bias, tensor_size=4096, padded_size=4096, offset=31815680
clip_model_load: tensor[25]: n_dims = 2, name = v.blk.1.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=31819776
clip_model_load: tensor[26]: n_dims = 1, name = v.blk.1.attn_q.bias, tensor_size=4096, padded_size=4096, offset=33916928
clip_model_load: tensor[27]: n_dims = 2, name = v.blk.1.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=33921024
clip_model_load: tensor[28]: n_dims = 1, name = v.blk.1.attn_out.bias, tensor_size=4096, padded_size=4096, offset=36018176
clip_model_load: tensor[29]: n_dims = 1, name = v.blk.1.ln1.weight, tensor_size=4096, padded_size=4096, offset=36022272
clip_model_load: tensor[30]: n_dims = 1, name = v.blk.1.ln1.bias, tensor_size=4096, padded_size=4096, offset=36026368
clip_model_load: tensor[31]: n_dims = 2, name = v.blk.1.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=36030464
clip_model_load: tensor[32]: n_dims = 1, name = v.blk.1.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=44419072
clip_model_load: tensor[33]: n_dims = 2, name = v.blk.1.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=44435456
clip_model_load: tensor[34]: n_dims = 1, name = v.blk.1.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=52824064
clip_model_load: tensor[35]: n_dims = 1, name = v.blk.1.ln2.weight, tensor_size=4096, padded_size=4096, offset=52828160
clip_model_load: tensor[36]: n_dims = 1, name = v.blk.1.ln2.bias, tensor_size=4096, padded_size=4096, offset=52832256
clip_model_load: tensor[37]: n_dims = 2, name = v.blk.2.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=52836352
clip_model_load: tensor[38]: n_dims = 1, name = v.blk.2.attn_k.bias, tensor_size=4096, padded_size=4096, offset=54933504
clip_model_load: tensor[39]: n_dims = 2, name = v.blk.2.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=54937600
clip_model_load: tensor[40]: n_dims = 1, name = v.blk.2.attn_v.bias, tensor_size=4096, padded_size=4096, offset=57034752
clip_model_load: tensor[41]: n_dims = 2, name = v.blk.2.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=57038848
clip_model_load: tensor[42]: n_dims = 1, name = v.blk.2.attn_q.bias, tensor_size=4096, padded_size=4096, offset=59136000
clip_model_load: tensor[43]: n_dims = 2, name = v.blk.2.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=59140096
clip_model_load: tensor[44]: n_dims = 1, name = v.blk.2.attn_out.bias, tensor_size=4096, padded_size=4096, offset=61237248
clip_model_load: tensor[45]: n_dims = 1, name = v.blk.2.ln1.weight, tensor_size=4096, padded_size=4096, offset=61241344
clip_model_load: tensor[46]: n_dims = 1, name = v.blk.2.ln1.bias, tensor_size=4096, padded_size=4096, offset=61245440
clip_model_load: tensor[47]: n_dims = 2, name = v.blk.2.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=61249536
clip_model_load: tensor[48]: n_dims = 1, name = v.blk.2.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=69638144
clip_model_load: tensor[49]: n_dims = 2, name = v.blk.2.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=69654528
clip_model_load: tensor[50]: n_dims = 1, name = v.blk.2.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=78043136
clip_model_load: tensor[51]: n_dims = 1, name = v.blk.2.ln2.weight, tensor_size=4096, padded_size=4096, offset=78047232
clip_model_load: tensor[52]: n_dims = 1, name = v.blk.2.ln2.bias, tensor_size=4096, padded_size=4096, offset=78051328
clip_model_load: tensor[53]: n_dims = 2, name = v.blk.3.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=78055424
clip_model_load: tensor[54]: n_dims = 1, name = v.blk.3.attn_k.bias, tensor_size=4096, padded_size=4096, offset=80152576
clip_model_load: tensor[55]: n_dims = 2, name = v.blk.3.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=80156672
clip_model_load: tensor[56]: n_dims = 1, name = v.blk.3.attn_v.bias, tensor_size=4096, padded_size=4096, offset=82253824
clip_model_load: tensor[57]: n_dims = 2, name = v.blk.3.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=82257920
clip_model_load: tensor[58]: n_dims = 1, name = v.blk.3.attn_q.bias, tensor_size=4096, padded_size=4096, offset=84355072
clip_model_load: tensor[59]: n_dims = 2, name = v.blk.3.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=84359168
clip_model_load: tensor[60]: n_dims = 1, name = v.blk.3.attn_out.bias, tensor_size=4096, padded_size=4096, offset=86456320
clip_model_load: tensor[61]: n_dims = 1, name = v.blk.3.ln1.weight, tensor_size=4096, padded_size=4096, offset=86460416
clip_model_load: tensor[62]: n_dims = 1, name = v.blk.3.ln1.bias, tensor_size=4096, padded_size=4096, offset=86464512
clip_model_load: tensor[63]: n_dims = 2, name = v.blk.3.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=86468608
clip_model_load: tensor[64]: n_dims = 1, name = v.blk.3.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=94857216
clip_model_load: tensor[65]: n_dims = 2, name = v.blk.3.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=94873600
clip_model_load: tensor[66]: n_dims = 1, name = v.blk.3.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=103262208
clip_model_load: tensor[67]: n_dims = 1, name = v.blk.3.ln2.weight, tensor_size=4096, padded_size=4096, offset=103266304
clip_model_load: tensor[68]: n_dims = 1, name = v.blk.3.ln2.bias, tensor_size=4096, padded_size=4096, offset=103270400
clip_model_load: tensor[69]: n_dims = 2, name = v.blk.4.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=103274496
clip_model_load: tensor[70]: n_dims = 1, name = v.blk.4.attn_k.bias, tensor_size=4096, padded_size=4096, offset=105371648
clip_model_load: tensor[71]: n_dims = 2, name = v.blk.4.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=105375744
clip_model_load: tensor[72]: n_dims = 1, name = v.blk.4.attn_v.bias, tensor_size=4096, padded_size=4096, offset=107472896
clip_model_load: tensor[73]: n_dims = 2, name = v.blk.4.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=107476992
clip_model_load: tensor[74]: n_dims = 1, name = v.blk.4.attn_q.bias, tensor_size=4096, padded_size=4096, offset=109574144
clip_model_load: tensor[75]: n_dims = 2, name = v.blk.4.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=109578240
clip_model_load: tensor[76]: n_dims = 1, name = v.blk.4.attn_out.bias, tensor_size=4096, padded_size=4096, offset=111675392
clip_model_load: tensor[77]: n_dims = 1, name = v.blk.4.ln1.weight, tensor_size=4096, padded_size=4096, offset=111679488
clip_model_load: tensor[78]: n_dims = 1, name = v.blk.4.ln1.bias, tensor_size=4096, padded_size=4096, offset=111683584
clip_model_load: tensor[79]: n_dims = 2, name = v.blk.4.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=111687680
clip_model_load: tensor[80]: n_dims = 1, name = v.blk.4.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=120076288
clip_model_load: tensor[81]: n_dims = 2, name = v.blk.4.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=120092672
clip_model_load: tensor[82]: n_dims = 1, name = v.blk.4.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=128481280
clip_model_load: tensor[83]: n_dims = 1, name = v.blk.4.ln2.weight, tensor_size=4096, padded_size=4096, offset=128485376
clip_model_load: tensor[84]: n_dims = 1, name = v.blk.4.ln2.bias, tensor_size=4096, padded_size=4096, offset=128489472
clip_model_load: tensor[85]: n_dims = 2, name = v.blk.5.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=128493568
clip_model_load: tensor[86]: n_dims = 1, name = v.blk.5.attn_k.bias, tensor_size=4096, padded_size=4096, offset=130590720
clip_model_load: tensor[87]: n_dims = 2, name = v.blk.5.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=130594816
clip_model_load: tensor[88]: n_dims = 1, name = v.blk.5.attn_v.bias, tensor_size=4096, padded_size=4096, offset=132691968
clip_model_load: tensor[89]: n_dims = 2, name = v.blk.5.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=132696064
clip_model_load: tensor[90]: n_dims = 1, name = v.blk.5.attn_q.bias, tensor_size=4096, padded_size=4096, offset=134793216
clip_model_load: tensor[91]: n_dims = 2, name = v.blk.5.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=134797312
clip_model_load: tensor[92]: n_dims = 1, name = v.blk.5.attn_out.bias, tensor_size=4096, padded_size=4096, offset=136894464
clip_model_load: tensor[93]: n_dims = 1, name = v.blk.5.ln1.weight, tensor_size=4096, padded_size=4096, offset=136898560
clip_model_load: tensor[94]: n_dims = 1, name = v.blk.5.ln1.bias, tensor_size=4096, padded_size=4096, offset=136902656
clip_model_load: tensor[95]: n_dims = 2, name = v.blk.5.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=136906752
clip_model_load: tensor[96]: n_dims = 1, name = v.blk.5.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=145295360
clip_model_load: tensor[97]: n_dims = 2, name = v.blk.5.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=145311744
clip_model_load: tensor[98]: n_dims = 1, name = v.blk.5.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=153700352
clip_model_load: tensor[99]: n_dims = 1, name = v.blk.5.ln2.weight, tensor_size=4096, padded_size=4096, offset=153704448
clip_model_load: tensor[100]: n_dims = 1, name = v.blk.5.ln2.bias, tensor_size=4096, padded_size=4096, offset=153708544
clip_model_load: tensor[101]: n_dims = 2, name = v.blk.6.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=153712640
clip_model_load: tensor[102]: n_dims = 1, name = v.blk.6.attn_k.bias, tensor_size=4096, padded_size=4096, offset=155809792
clip_model_load: tensor[103]: n_dims = 2, name = v.blk.6.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=155813888
clip_model_load: tensor[104]: n_dims = 1, name = v.blk.6.attn_v.bias, tensor_size=4096, padded_size=4096, offset=157911040
clip_model_load: tensor[105]: n_dims = 2, name = v.blk.6.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=157915136
clip_model_load: tensor[106]: n_dims = 1, name = v.blk.6.attn_q.bias, tensor_size=4096, padded_size=4096, offset=160012288
clip_model_load: tensor[107]: n_dims = 2, name = v.blk.6.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=160016384
clip_model_load: tensor[108]: n_dims = 1, name = v.blk.6.attn_out.bias, tensor_size=4096, padded_size=4096, offset=162113536
clip_model_load: tensor[109]: n_dims = 1, name = v.blk.6.ln1.weight, tensor_size=4096, padded_size=4096, offset=162117632
clip_model_load: tensor[110]: n_dims = 1, name = v.blk.6.ln1.bias, tensor_size=4096, padded_size=4096, offset=162121728
clip_model_load: tensor[111]: n_dims = 2, name = v.blk.6.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=162125824
clip_model_load: tensor[112]: n_dims = 1, name = v.blk.6.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=170514432
clip_model_load: tensor[113]: n_dims = 2, name = v.blk.6.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=170530816
clip_model_load: tensor[114]: n_dims = 1, name = v.blk.6.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=178919424
clip_model_load: tensor[115]: n_dims = 1, name = v.blk.6.ln2.weight, tensor_size=4096, padded_size=4096, offset=178923520
clip_model_load: tensor[116]: n_dims = 1, name = v.blk.6.ln2.bias, tensor_size=4096, padded_size=4096, offset=178927616
clip_model_load: tensor[117]: n_dims = 2, name = v.blk.7.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=178931712
clip_model_load: tensor[118]: n_dims = 1, name = v.blk.7.attn_k.bias, tensor_size=4096, padded_size=4096, offset=181028864
clip_model_load: tensor[119]: n_dims = 2, name = v.blk.7.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=181032960
clip_model_load: tensor[120]: n_dims = 1, name = v.blk.7.attn_v.bias, tensor_size=4096, padded_size=4096, offset=183130112
clip_model_load: tensor[121]: n_dims = 2, name = v.blk.7.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=183134208
clip_model_load: tensor[122]: n_dims = 1, name = v.blk.7.attn_q.bias, tensor_size=4096, padded_size=4096, offset=185231360
clip_model_load: tensor[123]: n_dims = 2, name = v.blk.7.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=185235456
clip_model_load: tensor[124]: n_dims = 1, name = v.blk.7.attn_out.bias, tensor_size=4096, padded_size=4096, offset=187332608
clip_model_load: tensor[125]: n_dims = 1, name = v.blk.7.ln1.weight, tensor_size=4096, padded_size=4096, offset=187336704
clip_model_load: tensor[126]: n_dims = 1, name = v.blk.7.ln1.bias, tensor_size=4096, padded_size=4096, offset=187340800
clip_model_load: tensor[127]: n_dims = 2, name = v.blk.7.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=187344896
clip_model_load: tensor[128]: n_dims = 1, name = v.blk.7.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=195733504
clip_model_load: tensor[129]: n_dims = 2, name = v.blk.7.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=195749888
clip_model_load: tensor[130]: n_dims = 1, name = v.blk.7.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=204138496
clip_model_load: tensor[131]: n_dims = 1, name = v.blk.7.ln2.weight, tensor_size=4096, padded_size=4096, offset=204142592
clip_model_load: tensor[132]: n_dims = 1, name = v.blk.7.ln2.bias, tensor_size=4096, padded_size=4096, offset=204146688
clip_model_load: tensor[133]: n_dims = 2, name = v.blk.8.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=204150784
clip_model_load: tensor[134]: n_dims = 1, name = v.blk.8.attn_k.bias, tensor_size=4096, padded_size=4096, offset=206247936
clip_model_load: tensor[135]: n_dims = 2, name = v.blk.8.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=206252032
clip_model_load: tensor[136]: n_dims = 1, name = v.blk.8.attn_v.bias, tensor_size=4096, padded_size=4096, offset=208349184
clip_model_load: tensor[137]: n_dims = 2, name = v.blk.8.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=208353280
clip_model_load: tensor[138]: n_dims = 1, name = v.blk.8.attn_q.bias, tensor_size=4096, padded_size=4096, offset=210450432
clip_model_load: tensor[139]: n_dims = 2, name = v.blk.8.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=210454528
clip_model_load: tensor[140]: n_dims = 1, name = v.blk.8.attn_out.bias, tensor_size=4096, padded_size=4096, offset=212551680
clip_model_load: tensor[141]: n_dims = 1, name = v.blk.8.ln1.weight, tensor_size=4096, padded_size=4096, offset=212555776
clip_model_load: tensor[142]: n_dims = 1, name = v.blk.8.ln1.bias, tensor_size=4096, padded_size=4096, offset=212559872
clip_model_load: tensor[143]: n_dims = 2, name = v.blk.8.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=212563968
clip_model_load: tensor[144]: n_dims = 1, name = v.blk.8.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=220952576
clip_model_load: tensor[145]: n_dims = 2, name = v.blk.8.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=220968960
clip_model_load: tensor[146]: n_dims = 1, name = v.blk.8.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=229357568
clip_model_load: tensor[147]: n_dims = 1, name = v.blk.8.ln2.weight, tensor_size=4096, padded_size=4096, offset=229361664
clip_model_load: tensor[148]: n_dims = 1, name = v.blk.8.ln2.bias, tensor_size=4096, padded_size=4096, offset=229365760
clip_model_load: tensor[149]: n_dims = 2, name = v.blk.9.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=229369856
clip_model_load: tensor[150]: n_dims = 1, name = v.blk.9.attn_k.bias, tensor_size=4096, padded_size=4096, offset=231467008
clip_model_load: tensor[151]: n_dims = 2, name = v.blk.9.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=231471104
clip_model_load: tensor[152]: n_dims = 1, name = v.blk.9.attn_v.bias, tensor_size=4096, padded_size=4096, offset=233568256
clip_model_load: tensor[153]: n_dims = 2, name = v.blk.9.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=233572352
clip_model_load: tensor[154]: n_dims = 1, name = v.blk.9.attn_q.bias, tensor_size=4096, padded_size=4096, offset=235669504
clip_model_load: tensor[155]: n_dims = 2, name = v.blk.9.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=235673600
clip_model_load: tensor[156]: n_dims = 1, name = v.blk.9.attn_out.bias, tensor_size=4096, padded_size=4096, offset=237770752
clip_model_load: tensor[157]: n_dims = 1, name = v.blk.9.ln1.weight, tensor_size=4096, padded_size=4096, offset=237774848
clip_model_load: tensor[158]: n_dims = 1, name = v.blk.9.ln1.bias, tensor_size=4096, padded_size=4096, offset=237778944
clip_model_load: tensor[159]: n_dims = 2, name = v.blk.9.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=237783040
clip_model_load: tensor[160]: n_dims = 1, name = v.blk.9.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=246171648
clip_model_load: tensor[161]: n_dims = 2, name = v.blk.9.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=246188032
clip_model_load: tensor[162]: n_dims = 1, name = v.blk.9.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=254576640
clip_model_load: tensor[163]: n_dims = 1, name = v.blk.9.ln2.weight, tensor_size=4096, padded_size=4096, offset=254580736
clip_model_load: tensor[164]: n_dims = 1, name = v.blk.9.ln2.bias, tensor_size=4096, padded_size=4096, offset=254584832
clip_model_load: tensor[165]: n_dims = 2, name = v.blk.10.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=254588928
clip_model_load: tensor[166]: n_dims = 1, name = v.blk.10.attn_k.bias, tensor_size=4096, padded_size=4096, offset=256686080
clip_model_load: tensor[167]: n_dims = 2, name = v.blk.10.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=256690176
clip_model_load: tensor[168]: n_dims = 1, name = v.blk.10.attn_v.bias, tensor_size=4096, padded_size=4096, offset=258787328
clip_model_load: tensor[169]: n_dims = 2, name = v.blk.10.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=258791424
clip_model_load: tensor[170]: n_dims = 1, name = v.blk.10.attn_q.bias, tensor_size=4096, padded_size=4096, offset=260888576
clip_model_load: tensor[171]: n_dims = 2, name = v.blk.10.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=260892672
clip_model_load: tensor[172]: n_dims = 1, name = v.blk.10.attn_out.bias, tensor_size=4096, padded_size=4096, offset=262989824
clip_model_load: tensor[173]: n_dims = 1, name = v.blk.10.ln1.weight, tensor_size=4096, padded_size=4096, offset=262993920
clip_model_load: tensor[174]: n_dims = 1, name = v.blk.10.ln1.bias, tensor_size=4096, padded_size=4096, offset=262998016
clip_model_load: tensor[175]: n_dims = 2, name = v.blk.10.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=263002112
clip_model_load: tensor[176]: n_dims = 1, name = v.blk.10.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=271390720
clip_model_load: tensor[177]: n_dims = 2, name = v.blk.10.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=271407104
clip_model_load: tensor[178]: n_dims = 1, name = v.blk.10.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=279795712
clip_model_load: tensor[179]: n_dims = 1, name = v.blk.10.ln2.weight, tensor_size=4096, padded_size=4096, offset=279799808
clip_model_load: tensor[180]: n_dims = 1, name = v.blk.10.ln2.bias, tensor_size=4096, padded_size=4096, offset=279803904
clip_model_load: tensor[181]: n_dims = 2, name = v.blk.11.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=279808000
clip_model_load: tensor[182]: n_dims = 1, name = v.blk.11.attn_k.bias, tensor_size=4096, padded_size=4096, offset=281905152
clip_model_load: tensor[183]: n_dims = 2, name = v.blk.11.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=281909248
clip_model_load: tensor[184]: n_dims = 1, name = v.blk.11.attn_v.bias, tensor_size=4096, padded_size=4096, offset=284006400
clip_model_load: tensor[185]: n_dims = 2, name = v.blk.11.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=284010496
clip_model_load: tensor[186]: n_dims = 1, name = v.blk.11.attn_q.bias, tensor_size=4096, padded_size=4096, offset=286107648
clip_model_load: tensor[187]: n_dims = 2, name = v.blk.11.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=286111744
clip_model_load: tensor[188]: n_dims = 1, name = v.blk.11.attn_out.bias, tensor_size=4096, padded_size=4096, offset=288208896
clip_model_load: tensor[189]: n_dims = 1, name = v.blk.11.ln1.weight, tensor_size=4096, padded_size=4096, offset=288212992
clip_model_load: tensor[190]: n_dims = 1, name = v.blk.11.ln1.bias, tensor_size=4096, padded_size=4096, offset=288217088
clip_model_load: tensor[191]: n_dims = 2, name = v.blk.11.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=288221184
clip_model_load: tensor[192]: n_dims = 1, name = v.blk.11.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=296609792
clip_model_load: tensor[193]: n_dims = 2, name = v.blk.11.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=296626176
clip_model_load: tensor[194]: n_dims = 1, name = v.blk.11.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=305014784
clip_model_load: tensor[195]: n_dims = 1, name = v.blk.11.ln2.weight, tensor_size=4096, padded_size=4096, offset=305018880
clip_model_load: tensor[196]: n_dims = 1, name = v.blk.11.ln2.bias, tensor_size=4096, padded_size=4096, offset=305022976
clip_model_load: tensor[197]: n_dims = 2, name = v.blk.12.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=305027072
clip_model_load: tensor[198]: n_dims = 1, name = v.blk.12.attn_k.bias, tensor_size=4096, padded_size=4096, offset=307124224
clip_model_load: tensor[199]: n_dims = 2, name = v.blk.12.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=307128320
clip_model_load: tensor[200]: n_dims = 1, name = v.blk.12.attn_v.bias, tensor_size=4096, padded_size=4096, offset=309225472
clip_model_load: tensor[201]: n_dims = 2, name = v.blk.12.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=309229568
clip_model_load: tensor[202]: n_dims = 1, name = v.blk.12.attn_q.bias, tensor_size=4096, padded_size=4096, offset=311326720
clip_model_load: tensor[203]: n_dims = 2, name = v.blk.12.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=311330816
clip_model_load: tensor[204]: n_dims = 1, name = v.blk.12.attn_out.bias, tensor_size=4096, padded_size=4096, offset=313427968
clip_model_load: tensor[205]: n_dims = 1, name = v.blk.12.ln1.weight, tensor_size=4096, padded_size=4096, offset=313432064
clip_model_load: tensor[206]: n_dims = 1, name = v.blk.12.ln1.bias, tensor_size=4096, padded_size=4096, offset=313436160
clip_model_load: tensor[207]: n_dims = 2, name = v.blk.12.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=313440256
clip_model_load: tensor[208]: n_dims = 1, name = v.blk.12.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=321828864
clip_model_load: tensor[209]: n_dims = 2, name = v.blk.12.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=321845248
clip_model_load: tensor[210]: n_dims = 1, name = v.blk.12.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=330233856
clip_model_load: tensor[211]: n_dims = 1, name = v.blk.12.ln2.weight, tensor_size=4096, padded_size=4096, offset=330237952
clip_model_load: tensor[212]: n_dims = 1, name = v.blk.12.ln2.bias, tensor_size=4096, padded_size=4096, offset=330242048
clip_model_load: tensor[213]: n_dims = 2, name = v.blk.13.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=330246144
clip_model_load: tensor[214]: n_dims = 1, name = v.blk.13.attn_k.bias, tensor_size=4096, padded_size=4096, offset=332343296
clip_model_load: tensor[215]: n_dims = 2, name = v.blk.13.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=332347392
clip_model_load: tensor[216]: n_dims = 1, name = v.blk.13.attn_v.bias, tensor_size=4096, padded_size=4096, offset=334444544
clip_model_load: tensor[217]: n_dims = 2, name = v.blk.13.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=334448640
clip_model_load: tensor[218]: n_dims = 1, name = v.blk.13.attn_q.bias, tensor_size=4096, padded_size=4096, offset=336545792
clip_model_load: tensor[219]: n_dims = 2, name = v.blk.13.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=336549888
clip_model_load: tensor[220]: n_dims = 1, name = v.blk.13.attn_out.bias, tensor_size=4096, padded_size=4096, offset=338647040
clip_model_load: tensor[221]: n_dims = 1, name = v.blk.13.ln1.weight, tensor_size=4096, padded_size=4096, offset=338651136
clip_model_load: tensor[222]: n_dims = 1, name = v.blk.13.ln1.bias, tensor_size=4096, padded_size=4096, offset=338655232
clip_model_load: tensor[223]: n_dims = 2, name = v.blk.13.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=338659328
clip_model_load: tensor[224]: n_dims = 1, name = v.blk.13.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=347047936
clip_model_load: tensor[225]: n_dims = 2, name = v.blk.13.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=347064320
clip_model_load: tensor[226]: n_dims = 1, name = v.blk.13.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=355452928
clip_model_load: tensor[227]: n_dims = 1, name = v.blk.13.ln2.weight, tensor_size=4096, padded_size=4096, offset=355457024
clip_model_load: tensor[228]: n_dims = 1, name = v.blk.13.ln2.bias, tensor_size=4096, padded_size=4096, offset=355461120
clip_model_load: tensor[229]: n_dims = 2, name = v.blk.14.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=355465216
clip_model_load: tensor[230]: n_dims = 1, name = v.blk.14.attn_k.bias, tensor_size=4096, padded_size=4096, offset=357562368
clip_model_load: tensor[231]: n_dims = 2, name = v.blk.14.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=357566464
clip_model_load: tensor[232]: n_dims = 1, name = v.blk.14.attn_v.bias, tensor_size=4096, padded_size=4096, offset=359663616
clip_model_load: tensor[233]: n_dims = 2, name = v.blk.14.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=359667712
clip_model_load: tensor[234]: n_dims = 1, name = v.blk.14.attn_q.bias, tensor_size=4096, padded_size=4096, offset=361764864
clip_model_load: tensor[235]: n_dims = 2, name = v.blk.14.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=361768960
clip_model_load: tensor[236]: n_dims = 1, name = v.blk.14.attn_out.bias, tensor_size=4096, padded_size=4096, offset=363866112
clip_model_load: tensor[237]: n_dims = 1, name = v.blk.14.ln1.weight, tensor_size=4096, padded_size=4096, offset=363870208
clip_model_load: tensor[238]: n_dims = 1, name = v.blk.14.ln1.bias, tensor_size=4096, padded_size=4096, offset=363874304
clip_model_load: tensor[239]: n_dims = 2, name = v.blk.14.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=363878400
clip_model_load: tensor[240]: n_dims = 1, name = v.blk.14.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=372267008
clip_model_load: tensor[241]: n_dims = 2, name = v.blk.14.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=372283392
clip_model_load: tensor[242]: n_dims = 1, name = v.blk.14.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=380672000
clip_model_load: tensor[243]: n_dims = 1, name = v.blk.14.ln2.weight, tensor_size=4096, padded_size=4096, offset=380676096
clip_model_load: tensor[244]: n_dims = 1, name = v.blk.14.ln2.bias, tensor_size=4096, padded_size=4096, offset=380680192
clip_model_load: tensor[245]: n_dims = 2, name = v.blk.15.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=380684288
clip_model_load: tensor[246]: n_dims = 1, name = v.blk.15.attn_k.bias, tensor_size=4096, padded_size=4096, offset=382781440
clip_model_load: tensor[247]: n_dims = 2, name = v.blk.15.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=382785536
clip_model_load: tensor[248]: n_dims = 1, name = v.blk.15.attn_v.bias, tensor_size=4096, padded_size=4096, offset=384882688
clip_model_load: tensor[249]: n_dims = 2, name = v.blk.15.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=384886784
clip_model_load: tensor[250]: n_dims = 1, name = v.blk.15.attn_q.bias, tensor_size=4096, padded_size=4096, offset=386983936
clip_model_load: tensor[251]: n_dims = 2, name = v.blk.15.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=386988032
clip_model_load: tensor[252]: n_dims = 1, name = v.blk.15.attn_out.bias, tensor_size=4096, padded_size=4096, offset=389085184
clip_model_load: tensor[253]: n_dims = 1, name = v.blk.15.ln1.weight, tensor_size=4096, padded_size=4096, offset=389089280
clip_model_load: tensor[254]: n_dims = 1, name = v.blk.15.ln1.bias, tensor_size=4096, padded_size=4096, offset=389093376
clip_model_load: tensor[255]: n_dims = 2, name = v.blk.15.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=389097472
clip_model_load: tensor[256]: n_dims = 1, name = v.blk.15.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=397486080
clip_model_load: tensor[257]: n_dims = 2, name = v.blk.15.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=397502464
clip_model_load: tensor[258]: n_dims = 1, name = v.blk.15.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=405891072
clip_model_load: tensor[259]: n_dims = 1, name = v.blk.15.ln2.weight, tensor_size=4096, padded_size=4096, offset=405895168
clip_model_load: tensor[260]: n_dims = 1, name = v.blk.15.ln2.bias, tensor_size=4096, padded_size=4096, offset=405899264
clip_model_load: tensor[261]: n_dims = 2, name = v.blk.16.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=405903360
clip_model_load: tensor[262]: n_dims = 1, name = v.blk.16.attn_k.bias, tensor_size=4096, padded_size=4096, offset=408000512
clip_model_load: tensor[263]: n_dims = 2, name = v.blk.16.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=408004608
clip_model_load: tensor[264]: n_dims = 1, name = v.blk.16.attn_v.bias, tensor_size=4096, padded_size=4096, offset=410101760
clip_model_load: tensor[265]: n_dims = 2, name = v.blk.16.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=410105856
clip_model_load: tensor[266]: n_dims = 1, name = v.blk.16.attn_q.bias, tensor_size=4096, padded_size=4096, offset=412203008
clip_model_load: tensor[267]: n_dims = 2, name = v.blk.16.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=412207104
clip_model_load: tensor[268]: n_dims = 1, name = v.blk.16.attn_out.bias, tensor_size=4096, padded_size=4096, offset=414304256
clip_model_load: tensor[269]: n_dims = 1, name = v.blk.16.ln1.weight, tensor_size=4096, padded_size=4096, offset=414308352
clip_model_load: tensor[270]: n_dims = 1, name = v.blk.16.ln1.bias, tensor_size=4096, padded_size=4096, offset=414312448
clip_model_load: tensor[271]: n_dims = 2, name = v.blk.16.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=414316544
clip_model_load: tensor[272]: n_dims = 1, name = v.blk.16.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=422705152
clip_model_load: tensor[273]: n_dims = 2, name = v.blk.16.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=422721536
clip_model_load: tensor[274]: n_dims = 1, name = v.blk.16.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=431110144
clip_model_load: tensor[275]: n_dims = 1, name = v.blk.16.ln2.weight, tensor_size=4096, padded_size=4096, offset=431114240
clip_model_load: tensor[276]: n_dims = 1, name = v.blk.16.ln2.bias, tensor_size=4096, padded_size=4096, offset=431118336
clip_model_load: tensor[277]: n_dims = 2, name = v.blk.17.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=431122432
clip_model_load: tensor[278]: n_dims = 1, name = v.blk.17.attn_k.bias, tensor_size=4096, padded_size=4096, offset=433219584
clip_model_load: tensor[279]: n_dims = 2, name = v.blk.17.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=433223680
clip_model_load: tensor[280]: n_dims = 1, name = v.blk.17.attn_v.bias, tensor_size=4096, padded_size=4096, offset=435320832
clip_model_load: tensor[281]: n_dims = 2, name = v.blk.17.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=435324928
clip_model_load: tensor[282]: n_dims = 1, name = v.blk.17.attn_q.bias, tensor_size=4096, padded_size=4096, offset=437422080
clip_model_load: tensor[283]: n_dims = 2, name = v.blk.17.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=437426176
clip_model_load: tensor[284]: n_dims = 1, name = v.blk.17.attn_out.bias, tensor_size=4096, padded_size=4096, offset=439523328
clip_model_load: tensor[285]: n_dims = 1, name = v.blk.17.ln1.weight, tensor_size=4096, padded_size=4096, offset=439527424
clip_model_load: tensor[286]: n_dims = 1, name = v.blk.17.ln1.bias, tensor_size=4096, padded_size=4096, offset=439531520
clip_model_load: tensor[287]: n_dims = 2, name = v.blk.17.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=439535616
clip_model_load: tensor[288]: n_dims = 1, name = v.blk.17.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=447924224
clip_model_load: tensor[289]: n_dims = 2, name = v.blk.17.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=447940608
clip_model_load: tensor[290]: n_dims = 1, name = v.blk.17.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=456329216
clip_model_load: tensor[291]: n_dims = 1, name = v.blk.17.ln2.weight, tensor_size=4096, padded_size=4096, offset=456333312
clip_model_load: tensor[292]: n_dims = 1, name = v.blk.17.ln2.bias, tensor_size=4096, padded_size=4096, offset=456337408
clip_model_load: tensor[293]: n_dims = 2, name = v.blk.18.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=456341504
clip_model_load: tensor[294]: n_dims = 1, name = v.blk.18.attn_k.bias, tensor_size=4096, padded_size=4096, offset=458438656
clip_model_load: tensor[295]: n_dims = 2, name = v.blk.18.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=458442752
clip_model_load: tensor[296]: n_dims = 1, name = v.blk.18.attn_v.bias, tensor_size=4096, padded_size=4096, offset=460539904
clip_model_load: tensor[297]: n_dims = 2, name = v.blk.18.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=460544000
clip_model_load: tensor[298]: n_dims = 1, name = v.blk.18.attn_q.bias, tensor_size=4096, padded_size=4096, offset=462641152
clip_model_load: tensor[299]: n_dims = 2, name = v.blk.18.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=462645248
clip_model_load: tensor[300]: n_dims = 1, name = v.blk.18.attn_out.bias, tensor_size=4096, padded_size=4096, offset=464742400
clip_model_load: tensor[301]: n_dims = 1, name = v.blk.18.ln1.weight, tensor_size=4096, padded_size=4096, offset=464746496
clip_model_load: tensor[302]: n_dims = 1, name = v.blk.18.ln1.bias, tensor_size=4096, padded_size=4096, offset=464750592
clip_model_load: tensor[303]: n_dims = 2, name = v.blk.18.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=464754688
clip_model_load: tensor[304]: n_dims = 1, name = v.blk.18.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=473143296
clip_model_load: tensor[305]: n_dims = 2, name = v.blk.18.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=473159680
clip_model_load: tensor[306]: n_dims = 1, name = v.blk.18.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=481548288
clip_model_load: tensor[307]: n_dims = 1, name = v.blk.18.ln2.weight, tensor_size=4096, padded_size=4096, offset=481552384
clip_model_load: tensor[308]: n_dims = 1, name = v.blk.18.ln2.bias, tensor_size=4096, padded_size=4096, offset=481556480
clip_model_load: tensor[309]: n_dims = 2, name = v.blk.19.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=481560576
clip_model_load: tensor[310]: n_dims = 1, name = v.blk.19.attn_k.bias, tensor_size=4096, padded_size=4096, offset=483657728
clip_model_load: tensor[311]: n_dims = 2, name = v.blk.19.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=483661824
clip_model_load: tensor[312]: n_dims = 1, name = v.blk.19.attn_v.bias, tensor_size=4096, padded_size=4096, offset=485758976
clip_model_load: tensor[313]: n_dims = 2, name = v.blk.19.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=485763072
clip_model_load: tensor[314]: n_dims = 1, name = v.blk.19.attn_q.bias, tensor_size=4096, padded_size=4096, offset=487860224
clip_model_load: tensor[315]: n_dims = 2, name = v.blk.19.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=487864320
clip_model_load: tensor[316]: n_dims = 1, name = v.blk.19.attn_out.bias, tensor_size=4096, padded_size=4096, offset=489961472
clip_model_load: tensor[317]: n_dims = 1, name = v.blk.19.ln1.weight, tensor_size=4096, padded_size=4096, offset=489965568
clip_model_load: tensor[318]: n_dims = 1, name = v.blk.19.ln1.bias, tensor_size=4096, padded_size=4096, offset=489969664
clip_model_load: tensor[319]: n_dims = 2, name = v.blk.19.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=489973760
clip_model_load: tensor[320]: n_dims = 1, name = v.blk.19.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=498362368
clip_model_load: tensor[321]: n_dims = 2, name = v.blk.19.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=498378752
clip_model_load: tensor[322]: n_dims = 1, name = v.blk.19.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=506767360
clip_model_load: tensor[323]: n_dims = 1, name = v.blk.19.ln2.weight, tensor_size=4096, padded_size=4096, offset=506771456
clip_model_load: tensor[324]: n_dims = 1, name = v.blk.19.ln2.bias, tensor_size=4096, padded_size=4096, offset=506775552
clip_model_load: tensor[325]: n_dims = 2, name = v.blk.20.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=506779648
clip_model_load: tensor[326]: n_dims = 1, name = v.blk.20.attn_k.bias, tensor_size=4096, padded_size=4096, offset=508876800
clip_model_load: tensor[327]: n_dims = 2, name = v.blk.20.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=508880896
clip_model_load: tensor[328]: n_dims = 1, name = v.blk.20.attn_v.bias, tensor_size=4096, padded_size=4096, offset=510978048
clip_model_load: tensor[329]: n_dims = 2, name = v.blk.20.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=510982144
clip_model_load: tensor[330]: n_dims = 1, name = v.blk.20.attn_q.bias, tensor_size=4096, padded_size=4096, offset=513079296
clip_model_load: tensor[331]: n_dims = 2, name = v.blk.20.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=513083392
clip_model_load: tensor[332]: n_dims = 1, name = v.blk.20.attn_out.bias, tensor_size=4096, padded_size=4096, offset=515180544
clip_model_load: tensor[333]: n_dims = 1, name = v.blk.20.ln1.weight, tensor_size=4096, padded_size=4096, offset=515184640
clip_model_load: tensor[334]: n_dims = 1, name = v.blk.20.ln1.bias, tensor_size=4096, padded_size=4096, offset=515188736
clip_model_load: tensor[335]: n_dims = 2, name = v.blk.20.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=515192832
clip_model_load: tensor[336]: n_dims = 1, name = v.blk.20.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=523581440
clip_model_load: tensor[337]: n_dims = 2, name = v.blk.20.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=523597824
clip_model_load: tensor[338]: n_dims = 1, name = v.blk.20.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=531986432
clip_model_load: tensor[339]: n_dims = 1, name = v.blk.20.ln2.weight, tensor_size=4096, padded_size=4096, offset=531990528
clip_model_load: tensor[340]: n_dims = 1, name = v.blk.20.ln2.bias, tensor_size=4096, padded_size=4096, offset=531994624
clip_model_load: tensor[341]: n_dims = 2, name = v.blk.21.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=531998720
clip_model_load: tensor[342]: n_dims = 1, name = v.blk.21.attn_k.bias, tensor_size=4096, padded_size=4096, offset=534095872
clip_model_load: tensor[343]: n_dims = 2, name = v.blk.21.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=534099968
clip_model_load: tensor[344]: n_dims = 1, name = v.blk.21.attn_v.bias, tensor_size=4096, padded_size=4096, offset=536197120
clip_model_load: tensor[345]: n_dims = 2, name = v.blk.21.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=536201216
clip_model_load: tensor[346]: n_dims = 1, name = v.blk.21.attn_q.bias, tensor_size=4096, padded_size=4096, offset=538298368
clip_model_load: tensor[347]: n_dims = 2, name = v.blk.21.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=538302464
clip_model_load: tensor[348]: n_dims = 1, name = v.blk.21.attn_out.bias, tensor_size=4096, padded_size=4096, offset=540399616
clip_model_load: tensor[349]: n_dims = 1, name = v.blk.21.ln1.weight, tensor_size=4096, padded_size=4096, offset=540403712
clip_model_load: tensor[350]: n_dims = 1, name = v.blk.21.ln1.bias, tensor_size=4096, padded_size=4096, offset=540407808
clip_model_load: tensor[351]: n_dims = 2, name = v.blk.21.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=540411904
clip_model_load: tensor[352]: n_dims = 1, name = v.blk.21.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=548800512
clip_model_load: tensor[353]: n_dims = 2, name = v.blk.21.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=548816896
clip_model_load: tensor[354]: n_dims = 1, name = v.blk.21.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=557205504
clip_model_load: tensor[355]: n_dims = 1, name = v.blk.21.ln2.weight, tensor_size=4096, padded_size=4096, offset=557209600
clip_model_load: tensor[356]: n_dims = 1, name = v.blk.21.ln2.bias, tensor_size=4096, padded_size=4096, offset=557213696
clip_model_load: tensor[357]: n_dims = 2, name = v.blk.22.attn_k.weight, tensor_size=2097152, padded_size=2097152, offset=557217792
clip_model_load: tensor[358]: n_dims = 1, name = v.blk.22.attn_k.bias, tensor_size=4096, padded_size=4096, offset=559314944
clip_model_load: tensor[359]: n_dims = 2, name = v.blk.22.attn_v.weight, tensor_size=2097152, padded_size=2097152, offset=559319040
clip_model_load: tensor[360]: n_dims = 1, name = v.blk.22.attn_v.bias, tensor_size=4096, padded_size=4096, offset=561416192
clip_model_load: tensor[361]: n_dims = 2, name = v.blk.22.attn_q.weight, tensor_size=2097152, padded_size=2097152, offset=561420288
clip_model_load: tensor[362]: n_dims = 1, name = v.blk.22.attn_q.bias, tensor_size=4096, padded_size=4096, offset=563517440
clip_model_load: tensor[363]: n_dims = 2, name = v.blk.22.attn_out.weight, tensor_size=2097152, padded_size=2097152, offset=563521536
clip_model_load: tensor[364]: n_dims = 1, name = v.blk.22.attn_out.bias, tensor_size=4096, padded_size=4096, offset=565618688
clip_model_load: tensor[365]: n_dims = 1, name = v.blk.22.ln1.weight, tensor_size=4096, padded_size=4096, offset=565622784
clip_model_load: tensor[366]: n_dims = 1, name = v.blk.22.ln1.bias, tensor_size=4096, padded_size=4096, offset=565626880
clip_model_load: tensor[367]: n_dims = 2, name = v.blk.22.ffn_down.weight, tensor_size=8388608, padded_size=8388608, offset=565630976
clip_model_load: tensor[368]: n_dims = 1, name = v.blk.22.ffn_down.bias, tensor_size=16384, padded_size=16384, offset=574019584
clip_model_load: tensor[369]: n_dims = 2, name = v.blk.22.ffn_up.weight, tensor_size=8388608, padded_size=8388608, offset=574035968
clip_model_load: tensor[370]: n_dims = 1, name = v.blk.22.ffn_up.bias, tensor_size=4096, padded_size=4096, offset=582424576
clip_model_load: tensor[371]: n_dims = 1, name = v.blk.22.ln2.weight, tensor_size=4096, padded_size=4096, offset=582428672
clip_model_load: tensor[372]: n_dims = 1, name = v.blk.22.ln2.bias, tensor_size=4096, padded_size=4096, offset=582432768
clip_model_load: text_encoder:   0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector:  1
clip_model_load: model size:     555.59 MB
clip_model_load: metadata size:  0.14 MB

clip_model_load: vision model hparams
image_size         336
patch_size         14
v_hidden_size      1024
v_n_intermediate   4096
v_projection_dim   768
v_n_head           16
v_n_layer          23
terminate called after throwing an instance of 'std::runtime_error'
  what():  get_tensor: unable to find tensor mm.0.weight

Aborted
(base) [root@k8s-0 llama.cpp]# 

@itsPreto
Copy link
Author

@winer632

The only thing I can suggest is to make sure you're following the instructions exactly as written; there still seem to be a lot of untested edge cases in the LLaVA implementation.

Also, make sure you're using the modified convert-image-encoder-to-gguf script:

python ./examples/llava/convert-image-encoder-to-gguf.py -m models/ShareGPT4V-7B_Pretrained_vit-large336-l12/ --llava-projector models/shareGPT4V-7B/llava.projector --output-dir models/shareGPT4V-7B --clip_model_is_vision
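For reference, here is a minimal sketch of the full conversion sequence, assuming both Hugging Face repos were cloned under models/ with the folder names used in this thread (adjust paths to your layout):

# 1. split the multimodal projector out of the checkpoint
python ./examples/llava/llava-surgery.py -m models/shareGPT4V-7B
# 2. convert the vision tower + projector to GGUF (note --clip_model_is_vision)
python ./examples/llava/convert-image-encoder-to-gguf.py -m models/ShareGPT4V-7B_Pretrained_vit-large336-l12/ --llava-projector models/shareGPT4V-7B/llava.projector --output-dir models/shareGPT4V-7B --clip_model_is_vision
# 3. convert the language model itself
python convert.py models/shareGPT4V-7B
# 4. run it
./llava-cli -m models/shareGPT4V-7B/ggml-model-f16.gguf --mmproj models/shareGPT4V-7B/mmproj-model-f16.gguf --image path/to/image.png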

@Galunid
Copy link
Collaborator

Galunid commented Nov 28, 2023

@winer632 You failed to correctly convert the model.
I uploaded my working version to Hugging Face: https://huggingface.co/Galunid/ShareGPT4V-gguf
You should be able to run or quantize it.
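In case it helps, a rough sketch of quantizing and running those files (the exact file names are an assumption; check the actual names in the Hugging Face repo):

# optional: quantize the f16 language model to 4-bit
./quantize ./models/ShareGPT4V-gguf/ShareGPT4V-f16.gguf ./models/ShareGPT4V-gguf/ShareGPT4V-q4_0.gguf q4_0
# run with the f16 projector
./llava-cli -m ./models/ShareGPT4V-gguf/ShareGPT4V-q4_0.gguf --mmproj ./models/ShareGPT4V-gguf/mmproj-model-f16.gguf --image path/to/image.png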

@winer632
Copy link

@winer632 You failed to correctly convert the model. I uploaded my working version to Hugging Face: https://huggingface.co/Galunid/ShareGPT4V-gguf You should be able to run or quantize it.

It works! Thanks a lot!

@winer632
Copy link

winer632 commented Nov 29, 2023

@winer632

The only thing I can suggest is to make sure you're following the instructions exactly as written; there still seem to be a lot of untested edge cases in the LLaVA implementation.

Also, make sure you're using the modified convert-image-encoder-to-gguf script:

python ./examples/llava/convert-image-encoder-to-gguf.py -m models/ShareGPT4V-7B_Pretrained_vit-large336-l12/ --llava-projector models/shareGPT4V-7B/llava.projector --output-dir models/shareGPT4V-7B --clip_model_is_vision

Yes, I followed the instructions exactly as written. Here are my steps.

Clone the model into the /home/llm/llama.cpp/models/ShareGPT4V-7B/ directory:
git clone https://huggingface.co/Lin-Chen/ShareGPT4V-7B/

Clone the vision encoder into the /home/llm/llama.cpp/models/ShareGPT4V-7B_Pretrained_vit-large336-l12/ directory:
git clone https://huggingface.co/Lin-Chen/ShareGPT4V-7B_Pretrained_vit-large336-l12/


First modify /home/llm/llama.cpp/examples/llava/convert-image-encoder-to-gguf.py according to this PR https://github.com/ggerganov/llama.cpp/pull/4172/files

Then install the following dependencies
pip install transformers torch gguf accelerate fairscale fire sentencepiece


Execute the python ./examples/llava/llava-surgery.py -m models/ShareGPT4V-7B command in the /home/llm/llama.cpp directory

Found that the llava.projector file was generated

Execute python ./examples/llava/convert-image-encoder-to-gguf.py -m models/ShareGPT4V-7B_Pretrained_vit-large336-l12/ --llava-projector models/ShareGPT4V-7B/llava.projector --output-dir models/ShareGPT4V-7B --clip_model_is_vision in the /home/llm/llama.cpp directory


Found that mmproj-model-f16.gguf was generated


Execute python convert.py models/ShareGPT4V-7B in the /home/llm/llama.cpp directory


Found that ggml-model-f16.gguf was generated

Execute make llava-cli in the /home/llm/llama.cpp directory


Execute ./llava-cli -m ./models/ShareGPT4V-7B/ggml-model-f16.gguf --mmproj ./models/ShareGPT4V-7B/mmproj-model-f16.gguf --image ../test_photo/zhuanma.jpeg in the /home/llm/llama.cpp directory

Got an error at this step (error screenshot omitted)

@winer632
Copy link

winer632 commented Nov 29, 2023

@winer632 You failed to correctly convert the model. I uploaded my working version to Hugging Face: https://huggingface.co/Galunid/ShareGPT4V-gguf You should be able to run or quantize it.

@cmp-nct @itsPreto @Galunid
Can I add a prompt to this command to get more control over how the image is processed? If so, how?

./llava-cli -m ./models/ShareGPT4V-gguf/ShareGPT4V-f16.gguf --mmproj ./models/ShareGPT4V-gguf/mmproj-model-f16.gguf --image ../test_photo/zhuanma.jpeg

@Galunid
Copy link
Collaborator

Galunid commented Nov 29, 2023

You can add -p "Describe image in great detail" or whatever you want it to be.
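For example (the prompt text and --temp value below are just illustrative; both -p and --temp are standard llava-cli options):

./llava-cli -m ./models/ShareGPT4V-gguf/ShareGPT4V-f16.gguf --mmproj ./models/ShareGPT4V-gguf/mmproj-model-f16.gguf --image ../test_photo/zhuanma.jpeg -p "Describe the image in great detail" --temp 0.1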

@winer632
Copy link

You can add -p "Describe image in great detail" or whatever you want it to be.

Thank you!

@Galunid
Copy link
Collaborator

Galunid commented Dec 8, 2023

closed in #4172

@Galunid Galunid closed this as completed Dec 8, 2023