Run export script on CPU #78
Conversation
I appreciate the PR, but 🤦♂️ ... do we have to copy-paste and hack the entire model/generation script? :|
This can't be necessary. Surely we can just manually load the checkpoint's state_dict on CPU and export the weights that way, directly?
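Something along these lines should be enough for the single-shard 7B checkpoint (a minimal sketch, assuming PyTorch is installed and the checkpoint path below points at a local download; larger models with multiple shards would need to be merged):

```python
# Minimal sketch: load a Meta llama checkpoint on CPU without touching the
# model/generation code. Assumes a single shard (consolidated.00.pth).
import torch

ckpt_path = "llama-2-7b/consolidated.00.pth"  # hypothetical local path
state_dict = torch.load(ckpt_path, map_location="cpu")  # no GPU required

for name, tensor in state_dict.items():
    # tensors come out as CPU torch.Tensors; from here they can be cast to
    # float32 and written out in whatever order the export format expects
    print(name, tuple(tensor.shape), tensor.dtype)
```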
I guess you're right, I'll look into that
From ChatGPT: "It's too hard" :-)
I don't think it is necessary to parse all the Pickle stuff. I bet we can do something that works just well enough for the use case and get the info out.
A short-term possibility to unblock people could be to use convert.py from the llama.cpp project to get the data into ggml format and then load those:

f42@formica:~/dev/llama$ ../llama.cpp/convert.py llama-2-7b
Loading model file llama-2-7b/consolidated.00.pth
vocabtype: spm
Loading vocab file tokenizer.model
params: n_vocab:32000 n_embd:4096 n_mult:256 n_head:32 n_layer:32
Writing vocab...
[ 1/291] Writing tensor tok_embeddings.weight | size 32000 x 4096 | type UnquantizedDataType(name='F32')
[ 2/291] Writing tensor norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 3/291] Writing tensor output.weight | size 32000 x 4096 | type UnquantizedDataType(name='F32')
[ 4/291] Writing tensor layers.0.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 5/291] Writing tensor layers.0.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 6/291] Writing tensor layers.0.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 7/291] Writing tensor layers.0.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 8/291] Writing tensor layers.0.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 9/291] Writing tensor layers.0.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 10/291] Writing tensor layers.0.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[ 11/291] Writing tensor layers.0.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 12/291] Writing tensor layers.0.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 13/291] Writing tensor layers.1.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 14/291] Writing tensor layers.1.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 15/291] Writing tensor layers.1.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 16/291] Writing tensor layers.1.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 17/291] Writing tensor layers.1.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 18/291] Writing tensor layers.1.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 19/291] Writing tensor layers.1.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[ 20/291] Writing tensor layers.1.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 21/291] Writing tensor layers.1.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 22/291] Writing tensor layers.2.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 23/291] Writing tensor layers.2.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 24/291] Writing tensor layers.2.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 25/291] Writing tensor layers.2.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 26/291] Writing tensor layers.2.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 27/291] Writing tensor layers.2.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 28/291] Writing tensor layers.2.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[ 29/291] Writing tensor layers.2.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 30/291] Writing tensor layers.2.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 31/291] Writing tensor layers.3.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 32/291] Writing tensor layers.3.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 33/291] Writing tensor layers.3.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 34/291] Writing tensor layers.3.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 35/291] Writing tensor layers.3.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 36/291] Writing tensor layers.3.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 37/291] Writing tensor layers.3.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[ 38/291] Writing tensor layers.3.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 39/291] Writing tensor layers.3.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 40/291] Writing tensor layers.4.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 41/291] Writing tensor layers.4.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 42/291] Writing tensor layers.4.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 43/291] Writing tensor layers.4.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 44/291] Writing tensor layers.4.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 45/291] Writing tensor layers.4.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 46/291] Writing tensor layers.4.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[ 47/291] Writing tensor layers.4.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 48/291] Writing tensor layers.4.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 49/291] Writing tensor layers.5.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 50/291] Writing tensor layers.5.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 51/291] Writing tensor layers.5.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 52/291] Writing tensor layers.5.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 53/291] Writing tensor layers.5.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 54/291] Writing tensor layers.5.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 55/291] Writing tensor layers.5.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[ 56/291] Writing tensor layers.5.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 57/291] Writing tensor layers.5.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 58/291] Writing tensor layers.6.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 59/291] Writing tensor layers.6.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 60/291] Writing tensor layers.6.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 61/291] Writing tensor layers.6.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 62/291] Writing tensor layers.6.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 63/291] Writing tensor layers.6.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 64/291] Writing tensor layers.6.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[ 65/291] Writing tensor layers.6.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 66/291] Writing tensor layers.6.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 67/291] Writing tensor layers.7.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 68/291] Writing tensor layers.7.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 69/291] Writing tensor layers.7.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 70/291] Writing tensor layers.7.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 71/291] Writing tensor layers.7.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 72/291] Writing tensor layers.7.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 73/291] Writing tensor layers.7.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[ 74/291] Writing tensor layers.7.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 75/291] Writing tensor layers.7.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 76/291] Writing tensor layers.8.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 77/291] Writing tensor layers.8.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 78/291] Writing tensor layers.8.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 79/291] Writing tensor layers.8.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 80/291] Writing tensor layers.8.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 81/291] Writing tensor layers.8.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 82/291] Writing tensor layers.8.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[ 83/291] Writing tensor layers.8.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 84/291] Writing tensor layers.8.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 85/291] Writing tensor layers.9.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 86/291] Writing tensor layers.9.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 87/291] Writing tensor layers.9.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 88/291] Writing tensor layers.9.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 89/291] Writing tensor layers.9.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 90/291] Writing tensor layers.9.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 91/291] Writing tensor layers.9.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[ 92/291] Writing tensor layers.9.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[ 93/291] Writing tensor layers.9.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 94/291] Writing tensor layers.10.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 95/291] Writing tensor layers.10.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 96/291] Writing tensor layers.10.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 97/291] Writing tensor layers.10.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[ 98/291] Writing tensor layers.10.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[ 99/291] Writing tensor layers.10.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[100/291] Writing tensor layers.10.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[101/291] Writing tensor layers.10.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[102/291] Writing tensor layers.10.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[103/291] Writing tensor layers.11.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[104/291] Writing tensor layers.11.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[105/291] Writing tensor layers.11.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[106/291] Writing tensor layers.11.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[107/291] Writing tensor layers.11.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[108/291] Writing tensor layers.11.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[109/291] Writing tensor layers.11.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[110/291] Writing tensor layers.11.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[111/291] Writing tensor layers.11.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[112/291] Writing tensor layers.12.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[113/291] Writing tensor layers.12.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[114/291] Writing tensor layers.12.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[115/291] Writing tensor layers.12.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[116/291] Writing tensor layers.12.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[117/291] Writing tensor layers.12.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[118/291] Writing tensor layers.12.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[119/291] Writing tensor layers.12.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[120/291] Writing tensor layers.12.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[121/291] Writing tensor layers.13.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[122/291] Writing tensor layers.13.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[123/291] Writing tensor layers.13.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[124/291] Writing tensor layers.13.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[125/291] Writing tensor layers.13.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[126/291] Writing tensor layers.13.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[127/291] Writing tensor layers.13.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[128/291] Writing tensor layers.13.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[129/291] Writing tensor layers.13.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[130/291] Writing tensor layers.14.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[131/291] Writing tensor layers.14.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[132/291] Writing tensor layers.14.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[133/291] Writing tensor layers.14.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[134/291] Writing tensor layers.14.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[135/291] Writing tensor layers.14.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[136/291] Writing tensor layers.14.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[137/291] Writing tensor layers.14.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[138/291] Writing tensor layers.14.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[139/291] Writing tensor layers.15.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[140/291] Writing tensor layers.15.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[141/291] Writing tensor layers.15.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[142/291] Writing tensor layers.15.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[143/291] Writing tensor layers.15.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[144/291] Writing tensor layers.15.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[145/291] Writing tensor layers.15.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[146/291] Writing tensor layers.15.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[147/291] Writing tensor layers.15.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[148/291] Writing tensor layers.16.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[149/291] Writing tensor layers.16.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[150/291] Writing tensor layers.16.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[151/291] Writing tensor layers.16.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[152/291] Writing tensor layers.16.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[153/291] Writing tensor layers.16.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[154/291] Writing tensor layers.16.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[155/291] Writing tensor layers.16.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[156/291] Writing tensor layers.16.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[157/291] Writing tensor layers.17.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[158/291] Writing tensor layers.17.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[159/291] Writing tensor layers.17.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[160/291] Writing tensor layers.17.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[161/291] Writing tensor layers.17.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[162/291] Writing tensor layers.17.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[163/291] Writing tensor layers.17.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[164/291] Writing tensor layers.17.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[165/291] Writing tensor layers.17.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[166/291] Writing tensor layers.18.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[167/291] Writing tensor layers.18.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[168/291] Writing tensor layers.18.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[169/291] Writing tensor layers.18.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[170/291] Writing tensor layers.18.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[171/291] Writing tensor layers.18.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[172/291] Writing tensor layers.18.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[173/291] Writing tensor layers.18.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[174/291] Writing tensor layers.18.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[175/291] Writing tensor layers.19.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[176/291] Writing tensor layers.19.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[177/291] Writing tensor layers.19.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[178/291] Writing tensor layers.19.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[179/291] Writing tensor layers.19.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[180/291] Writing tensor layers.19.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[181/291] Writing tensor layers.19.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[182/291] Writing tensor layers.19.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[183/291] Writing tensor layers.19.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[184/291] Writing tensor layers.20.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[185/291] Writing tensor layers.20.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[186/291] Writing tensor layers.20.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[187/291] Writing tensor layers.20.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[188/291] Writing tensor layers.20.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[189/291] Writing tensor layers.20.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[190/291] Writing tensor layers.20.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[191/291] Writing tensor layers.20.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[192/291] Writing tensor layers.20.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[193/291] Writing tensor layers.21.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[194/291] Writing tensor layers.21.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[195/291] Writing tensor layers.21.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[196/291] Writing tensor layers.21.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[197/291] Writing tensor layers.21.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[198/291] Writing tensor layers.21.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[199/291] Writing tensor layers.21.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[200/291] Writing tensor layers.21.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[201/291] Writing tensor layers.21.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[202/291] Writing tensor layers.22.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[203/291] Writing tensor layers.22.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[204/291] Writing tensor layers.22.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[205/291] Writing tensor layers.22.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[206/291] Writing tensor layers.22.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[207/291] Writing tensor layers.22.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[208/291] Writing tensor layers.22.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[209/291] Writing tensor layers.22.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[210/291] Writing tensor layers.22.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[211/291] Writing tensor layers.23.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[212/291] Writing tensor layers.23.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[213/291] Writing tensor layers.23.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[214/291] Writing tensor layers.23.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[215/291] Writing tensor layers.23.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[216/291] Writing tensor layers.23.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[217/291] Writing tensor layers.23.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[218/291] Writing tensor layers.23.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[219/291] Writing tensor layers.23.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[220/291] Writing tensor layers.24.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[221/291] Writing tensor layers.24.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[222/291] Writing tensor layers.24.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[223/291] Writing tensor layers.24.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[224/291] Writing tensor layers.24.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[225/291] Writing tensor layers.24.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[226/291] Writing tensor layers.24.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[227/291] Writing tensor layers.24.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[228/291] Writing tensor layers.24.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[229/291] Writing tensor layers.25.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[230/291] Writing tensor layers.25.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[231/291] Writing tensor layers.25.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[232/291] Writing tensor layers.25.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[233/291] Writing tensor layers.25.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[234/291] Writing tensor layers.25.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[235/291] Writing tensor layers.25.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[236/291] Writing tensor layers.25.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[237/291] Writing tensor layers.25.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[238/291] Writing tensor layers.26.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[239/291] Writing tensor layers.26.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[240/291] Writing tensor layers.26.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[241/291] Writing tensor layers.26.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[242/291] Writing tensor layers.26.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[243/291] Writing tensor layers.26.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[244/291] Writing tensor layers.26.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[245/291] Writing tensor layers.26.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[246/291] Writing tensor layers.26.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[247/291] Writing tensor layers.27.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[248/291] Writing tensor layers.27.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[249/291] Writing tensor layers.27.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[250/291] Writing tensor layers.27.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[251/291] Writing tensor layers.27.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[252/291] Writing tensor layers.27.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[253/291] Writing tensor layers.27.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[254/291] Writing tensor layers.27.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[255/291] Writing tensor layers.27.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[256/291] Writing tensor layers.28.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[257/291] Writing tensor layers.28.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[258/291] Writing tensor layers.28.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[259/291] Writing tensor layers.28.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[260/291] Writing tensor layers.28.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[261/291] Writing tensor layers.28.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[262/291] Writing tensor layers.28.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[263/291] Writing tensor layers.28.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[264/291] Writing tensor layers.28.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[265/291] Writing tensor layers.29.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[266/291] Writing tensor layers.29.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[267/291] Writing tensor layers.29.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[268/291] Writing tensor layers.29.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[269/291] Writing tensor layers.29.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[270/291] Writing tensor layers.29.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[271/291] Writing tensor layers.29.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[272/291] Writing tensor layers.29.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[273/291] Writing tensor layers.29.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[274/291] Writing tensor layers.30.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[275/291] Writing tensor layers.30.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[276/291] Writing tensor layers.30.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[277/291] Writing tensor layers.30.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[278/291] Writing tensor layers.30.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[279/291] Writing tensor layers.30.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[280/291] Writing tensor layers.30.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[281/291] Writing tensor layers.30.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[282/291] Writing tensor layers.30.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[283/291] Writing tensor layers.31.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[284/291] Writing tensor layers.31.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[285/291] Writing tensor layers.31.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[286/291] Writing tensor layers.31.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F32')
[287/291] Writing tensor layers.31.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
[288/291] Writing tensor layers.31.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[289/291] Writing tensor layers.31.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F32')
[290/291] Writing tensor layers.31.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F32')
[291/291] Writing tensor layers.31.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')
Wrote llama-2-7b/ggml-model-f32.bin
Almost got a working solution.
Updated the export script as @karpathy suggested - no more dependencies on the generation and model scripts. For some reason the md5 differs from the previous conversion; however, the model seems to run correctly.
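For reference, the core of a CPU-only export can be quite small. A rough sketch of the idea (not the exact script from this PR; the header fields and the tensor order here are assumptions and would have to mirror whatever run.c reads):

```python
# Rough sketch: read the checkpoint with torch.load on CPU, then stream
# float32 tensors into a flat binary file. The header layout and the tensor
# order are assumptions and must match the reader in run.c.
import json
import struct
import torch

model_dir = "llama-2-7b"  # hypothetical path
with open(f"{model_dir}/params.json") as fh:
    params = json.load(fh)
sd = torch.load(f"{model_dir}/consolidated.00.pth", map_location="cpu")

def write_tensor(f, t):
    # raw little-endian float32 bytes, ready to be mmapped by a C reader
    f.write(t.detach().to(torch.float32).numpy().tobytes())

with open("llama2_7b.bin", "wb") as f:
    # example header: a few int32 config values (exact fields assumed)
    f.write(struct.pack("iii", params["dim"], params["n_layers"], params["n_heads"]))
    write_tensor(f, sd["tok_embeddings.weight"])
    layer_keys = [  # per-layer tensor names as they appear in the checkpoint
        "attention.wq.weight", "attention.wk.weight",
        "attention.wv.weight", "attention.wo.weight",
        "attention_norm.weight",
        "feed_forward.w1.weight", "feed_forward.w2.weight",
        "feed_forward.w3.weight", "ffn_norm.weight",
    ]
    for i in range(params["n_layers"]):
        for k in layer_keys:  # ordering assumed; reorder to match run.c
            write_tensor(f, sd[f"layers.{i}.{k}"])
    write_tensor(f, sd["norm.weight"])
    write_tensor(f, sd["output.weight"])
```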
w00t - it certainly does! Converting is very slow compared to convert.py in the llama.cpp project. I don't know why that is, but who cares, it works :-D Great stuff!!

f42@formica:~/dev/llama2.c$ ./runomp ./llama2_7b.bin
<s>
Tags: javascript, angularjs, drop-down-menu, angular-ui-bootstrap
Question: How to set default option in angularjs bootstrap modal
I am working on a project. In which when I am using bootstrap modal so it's showing a drop down list with their options after selecting one option I am getting the value from it. I am using this code
\begin{code}
<!-- ng-options = 'item.name for item in allOptions', ng-model = 'item.id' -->
<form name="login_form">
<label class=" col-sm-10 control-label" for="matrixCode">Items</label>
<select type="text" class="form-control" id="matrixCode" ng-options="item.name as item.name for item in allOptions" ng-model="item.id" required="required" >
<option ng-repeat="item in allOptions">{{item.name}}</option>
<option value="">Please Select Item</option>
</select>
<label class=" col-sm-10 control-label" for="itemName
achieved tok/s: 36.790199
A few benchmarks. Seems to use about 30GB of RAM. You definitely start paying a hit when switching over from raw cores to hyper-threads; 12 out of the 16 seems to be the sweet spot. Kewl stuff. (A small thread-sweep sketch follows the machine topology dump below.)

make runfast
f42@formica:~/dev/llama2.c$ ./run llama2_7b.bin
<s>
Live streaming video will begin when Steger Party officially opens at 6:30 PM. Chris can be found on the patio off the back of the parlor. Click on the image to go to the Facebook live link.
11:48PM... It's after midnight and the party at Steger Stables continues... these images were taken on the way out.
[Video Removed] - For Sturgis 2014 click here.
7:23PM... Steger Party is officially underway. Featured auctions include the custom built H-D Street Glide and a special gift from Wille Bruynzeel honoring Chris McKee.
5:32PM... Chris can be found taking pictures of all he loves outside Chris Steger Stables in Welch, WV. The picture to the right was taken in 2009 and was the last time the Steger family threw a party at their stables.
3:08PM... The 2015 Mother's Day Brunch is coming to an end at Welch Civic Center. Guests can now browse the recently completed Ride to the Rock Cumberland stream
achieved tok/s: 0.376868

OMP: 4 Threads
f42@formica:~/dev/llama2.c$ OMP_NUM_THREADS=4 ./runomp llama2_7b.bin
<s>
Director Fred Schepisi, whose “The Sixth Day” miserably falls apart from the start, seems uninterested in what the story could be. The early scenes skim over almost everything important, from the loss of the wife to the separation from the wife to the death of the son. The story begins well enough, with Martin and Jenny (Beau Bridges and Maria Bello) having dinner with their son Jordan (James McAvoy). They all seem happy and in love. A week or so later Jenny tells the son that she and Martin are getting a divorce. Like every young man, Jordan is angry and embarrassed about the whole thing. The mother and son exchange harsh words. Since this is a self-indulgent, emotion-driven film, the mother is solemn, brooding and pensive as she silently and sullenly retreats from her life. Jordan, an aspiring actor, gets a part in a movie and goes off to live in a trailer. He and his father remain estranged.
About a year later, Martin discovers Jenny’s body floating dead in a swimming pool, which has already been emptied for cleaning. Because
achieved tok/s: 23.625499

OMP: 8 Threads
f42@formica:~/dev/llama2.c$ OMP_NUM_THREADS=8 ./runomp llama2_7b.bin
<s>
Rising sea levels trapping 15m people in coastal cities
Environment minister says towns around the world at risk if sea level rises by 20cm, though experts say coastal migration is already happening
Sea levels rise at 3.2mm a year, threatening coastal cities in many poor countries. Photograph: Michael Scott for the Guardian
Bethany Bell in Istanbul
Aaron Banks in Bangkok
Paul Harris in Switzerland
A third of the world's population is facing sea level rises that would leave them trapped in coastal cities without access to food or water, a government report has said.
The global commission on climate and the economy, a body that includes former US vice-president Al Gore, called for a massive reduction of greenhouse gas emissions to stop migration.
With 375 million already residing in coastal cities, researchers warned, the rising sea, its "uncertain future" and an accompanying rise in extreme weather events were now "central threats" to those areas.
But the study, Rising Seas, Consequences of Climate Change for Small Island Developing States, said even "
achieved tok/s: 30.883606

OMP: 12 Threads
f42@formica:~/dev/llama2.c$ OMP_NUM_THREADS=12 ./runomp llama2_7b.bin
<s>
Lieutenant General Eike Afflerbach, Chief of Defence of the Bundeswehr, and Colonel General Jürgen Steinhoff, Supreme Commander of the Bundeswehr, participated in the 11th edition of the NATO Rapid Reaction Corps Land Headquarters Command Post Exercise (CPX11).
Between the 5th and 9th of November, ARRC Land Headquarters CPX11 took place at Pont-à-Mousson. Representatives from nine nations participated in the meeting: Belgium, Bulgaria, Norway, Germany, France, Italy, the Netherlands, the United Kingdom and the United Kingdom. In addition to the NATO CPX11, NATO Land Command selected representatives of ARRC Land A0 and A1 elements to take part in the exercise.
ARRC CPX11 is a training activity aimed at forming a C2 and exercise staff and developing the ability to generate an ARRC in a short time based on the ARRC Land A0 and A1 elements. In addition, the exercise helped to prepare the ARRC CPB (Battle Command Working Group) which was formed for the first time during ARC 16. Commander Jürgen Steinhoff, Supreme Commander
achieved tok/s: 38.697731

OMP: 16 Threads
f42@formica:~/dev/llama2.c$ OMP_NUM_THREADS=16 ./runomp llama2_7b.bin
<s>
As album title suggests, this is the second in what is an unlikely series of themed albums. Melodic, is almost soothing to the ear, but there is a complexity and intricacy that belies that first impression.
The album's highlights are the title track, a symphonic and nostalgic piece that I can just imagine Coldplay performing acoustically at a royal wedding.
Orlando Weeks' vocals are most prominent on the opening track, Melting Ice Caps, which is a song that he had been writing for the album but without the Merseys take on vinyl, but his vocals are more distinguishable throughout some of the other pieces.
It's more of a statement than a song, the vocals are muted and almost submerged for the first two minutes of the composition, before we get an overture of the strings and piano. This is one of the many moments of fine production on the album and whatever takes listeners through the remaining minutes of the composition, which is orchestral, cinematic and truly stunning, before we get a worthy reprise.
Sunday comes across as kind of a country style tune. Imagine Coldplay with a
achieved tok/s: 33.607929

Machine (94GB total)
NUMANode L#0 (P#0 94GB)
Package L#0 + L3 L#0 (16MB) + L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
Package L#1 + L3 L#1 (16MB) + L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
Package L#2 + L3 L#2 (16MB) + L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
Package L#3 + L3 L#3 (16MB) + L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
Package L#4 + L3 L#4 (16MB) + L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
Package L#5 + L3 L#5 (16MB) + L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
Package L#6 + L3 L#6 (16MB) + L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
Package L#7 + L3 L#7 (16MB) + L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
Package L#8 + L3 L#8 (16MB) + L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
Package L#9 + L3 L#9 (16MB) + L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
Package L#10 + L3 L#10 (16MB) + L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
Package L#11 + L3 L#11 (16MB) + L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
Package L#12 + L3 L#12 (16MB) + L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
Package L#13 + L3 L#13 (16MB) + L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
Package L#14 + L3 L#14 (16MB) + L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
Package L#15 + L3 L#15 (16MB) + L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)
HostBridge
PCI 00:01.1 (IDE)
Block "sr0"
PCI 00:02.0 (VGA)
PCI 00:03.0 (Ethernet)
Net "eth0"
PCI 00:04.0 (Ethernet)
Net "ens4"
PCI 00:05.0 (SCSI)
Block "vda"
PCI 00:06.0 (Other)
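The thread sweep above was run by hand; something like this could drive it automatically (a sketch, assuming ./runomp and llama2_7b.bin sit in the current directory and the runner prints an "achieved tok/s" line):

```python
# Sketch: sweep OMP_NUM_THREADS and pull out the "achieved tok/s" line.
import os
import subprocess

for threads in (4, 8, 12, 16):
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    p = subprocess.run(["./runomp", "llama2_7b.bin"],
                       env=env, capture_output=True, text=True)
    # the tok/s line may land on stdout or stderr depending on the build
    lines = (p.stdout + p.stderr).splitlines()
    toks = [l for l in lines if "achieved tok/s" in l]
    print(f"OMP_NUM_THREADS={threads}:", toks[-1] if toks else "no tok/s line")
```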
I ended up merging this one: 5bcd19a, but thank you for the PR. Closing this one.
Kinda dirty hack, but I modified the scripts from the Meta llama repo to use CPU only. The export script now runs on CPU, with no dependency on the Meta llama scripts.
It doesn't need torchrun; just run python export_meta_llama_bin.py
md5sum of the resulting file (someone with a powerful GPU can check against the results of the original scripts):
55d11aa1dff5da98f9296e1a7ae47074 llama2_7b.bin
ff0e2f9a082fe1212f7bb323b241769b llama2_7b.bin - using the dirty variant with the additional scripts